By Jordan Galczynski on April 16, 2026
Updates April 2026
- Added a language selection feature that lets users specify the language(s) appearing in the PDF, which improves OCR accuracy
- Added an accessibility report feature with instructions on how to fully remediate PDFs beyond OCR
In an effort to meet web accessibility standards, HumTech has developed a new AI-powered tool, currently called “Repair PDF,” that rebuilds previously scanned PDFs by adding a searchable text layer while preserving the original layout.
This is an essential first step in making your PDFs fully accessible (see below for more information on PDF remediation). A division-wide Ally report reveals the scale of the challenge: across Humanities courses in BruinLearn, 14.4% of PDFs (over 15,500 files) are completely unreadable, with no OCR applied. An additional 4.5% (about 4,800 files) have been OCR’d, but Ally flagged the quality as inadequate. Altogether, roughly 1 in 5 PDFs has an OCR-related issue, totaling approximately 20,300 files that need attention.
Repair PDF is a hands-off service: rather than asking faculty to fix documents one at a time on their own machines, you send us the files and we run them through the pipeline. The pipeline does more than OCR alone — in a single pass it adds a searchable, screen-reader-accessible text layer without altering the original page images, optimizes the file (often producing smaller sizes with improved contrast and clarity), converts to PDF/A for long-term archival stability, and performs minor structural cleanup on the PDF itself.
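This post does not name the software behind the pipeline. Purely as an illustration, the steps described above resemble options offered by the open-source OCRmyPDF command-line tool. The sketch below assumes such a tool is installed. The helper name `build_ocr_command` and the file names are hypothetical, and the flags shown are OCRmyPDF's, not necessarily what Repair PDF uses.

```python
from typing import List, Sequence


def build_ocr_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
) -> List[str]:
    """Assemble an OCRmyPDF command line (hypothetical helper, not HumTech's code)."""
    if not languages:
        raise ValueError("at least one language code is required")
    return [
        "ocrmypdf",
        # Tesseract language codes joined with "+", e.g. eng+fra
        "--language", "+".join(languages),
        # Write the result as PDF/A for archival stability
        "--output-type", "pdfa",
        # Level 1 applies lossless file-size optimizations
        "--optimize", "1",
        input_pdf,
        output_pdf,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without OCRmyPDF.
    print(" ".join(build_ocr_command("scan.pdf", "scan_ocr.pdf", ["eng", "fra"])))
```

Passing the resulting list to `subprocess.run` would execute the job. Declaring the languages up front matters because the OCR engine relies on language-specific models to recognize text.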
On clearly printed text in supported languages, accuracy is very high — under 1% character error rate in favorable conditions. Results depend on the quality of the original scan, and some material remains genuinely difficult: handwritten manuscripts, heavy mathematical notation, non-Latin scripts with limited language support, and severely faded or skewed scans may produce output that needs review or manual correction.
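Character error rate is commonly defined as the edit distance between the OCR output and a correct transcription, divided by the length of that transcription. The post does not say how its figure was measured, so the following is only a minimal sketch of that standard definition, with illustrative function names.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One misread character ("l" read as "1") in a ten-character word.
    print(character_error_rate("accessible", "accessib1e"))
```

By this measure, an error rate under 1% corresponds to fewer than one wrong character per 100 characters of reference text.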
We hope to make this a self-service tool in the near future. While we work out some details and add improvements, we’re happy to run it on your behalf. To submit your documents for OCR, please email ritc@humnet.ucla.edu. We are limiting requests to five PDFs each, with an expected return time of 48 hours.
This is only the first step in a tool HumTech is designing to aid with the remediation process. Planned future improvements include:
- bulk alt-text description
- Markdown and other web-ready text output, so document content can be embedded directly into course sites and other web pages
- expanded OCR support for harder document types, such as handwritten manuscripts and materials with complex mathematical typesetting
What is Remediation?
Remediation entails not just making a PDF “readable” through Optical Character Recognition (OCR), but also telling a screen reader _how_ to read a PDF. This is known as semantic mapping, which tells a screen reader or other accessibility device the order in which to read a document. The core components of semantic mapping include tagging elements of a document, providing alt-text and descriptive hyperlinks, and ensuring high color contrast.
Tool development and design by
Antti Hiltunen, software developer
Technical Details
Currently supported printed languages include Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Azerbaijani – Cyrillic, Basque, Belarusian, Bengali, Bosnian, Breton, Bulgarian, Burmese, Catalan; Valencian, Cebuano, Central Khmer, Cherokee, Chinese – Simplified, Chinese – Traditional, Corsican, Croatian, Czech, Danish, Danish – Fraktur (contrib), Dutch; Flemish, Dzongkha, English, English, Middle (1100-1500), Esperanto, Estonian, Faroese, Filipino (old – Tagalog), Finnish, French, French, Middle (ca.1400-1600), Galician, Georgian, Georgian – Old, German, German – Fraktur (contrib), German – Fraktur (now deu_latf), German (Fraktur Latin), Greek, Ancient (to 1453) (contrib), Greek, Modern (1453-), Gujarati, Haitian; Haitian Creole, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Inuktitut, Irish, Italian, Italian – Old, Japanese, Javanese, Kannada, Kazakh, Kirghiz; Kyrgyz, Korean, Korean (vertical), Kurdish (Arabic Script), Kurmanji (Kurdish – Latin Script), Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Maltese, Maori, Marathi, Math / equation detection module, Mongolian, Nepali, Norwegian, Occitan (post 1500), Oriya, Orientation and script detection module, Panjabi; Punjabi, Persian, Polish, Portuguese, Pushto; Pashto, Quechua, Romanian; Moldavian; Moldovan, Russian, Sanskrit, Scottish Gaelic, Serbian, Serbian – Latin, Sindhi, Sinhala; Sinhalese, Slovak, Slovak – Fraktur (contrib), Slovenian, Spanish; Castilian, Spanish; Castilian – Old, Sundanese, Swahili, Swedish, Syriac, Tagalog (new – Filipino), Tajik, Tamil, Tatar, Telugu, Thai, Tibetan, Tigrinya, Tonga, Turkish, Uighur; Uyghur, Ukrainian, Urdu, Uzbek, Uzbek – Cyrillic, Vietnamese, Welsh, Western Frisian, Yiddish, Yoruba
Currently supported scripts include Arabic, Armenian, Bengali, Canadian_Aboriginal, Cherokee, Cyrillic, Devanagari, Ethiopic, Fraktur, Georgian, Greek, Gujarati, Gurmukhi, HanS (Han simplified), HanS_vert (Han simplified vertical), HanT (Han traditional), HanT_vert (Han traditional vertical), Hangul, Hangul_vert (Hangul vertical), Hebrew, Japanese, Japanese_vert (Japanese vertical), Kannada, Khmer, Lao, Latin, Malayalam, Myanmar, Oriya (Odia), Sinhala, Syriac, Tamil, Telugu, Thaana, Thai, Tibetan, Vietnamese
Additional resources
UCLA Teaching and Learning Center – Making your Course Accessible
UCLA Humanities Technology – How to make your website Accessible
DTS – Digital Accessibility
