Hello
I work in a library service providing core readings to print-disabled students in alternative (more accessible) formats. We've recently started using Adobe Acrobat Pro DC via the institution to add accessibility features to PDFs received from publishers, or to convert them to Word.
What is the quickest reliable way to determine whether a PDF has already been OCR'd?
Secondly, is there a way to extract the content of a PDF without OCR? When we were using Pro XI, we had some success in determining whether a PDF had been created from a Word document; saving those back to Word resulted in fewer recognition errors.
Any tips gratefully received.
JK
Using Acrobat Pro you can configure a Preflight to assess the PDF.
"Without OCR" - OCR presupposes the PDF's page content is the output of a scanner. That'd be an image.
An image has no text, columns, styles, etc., so none of that is available for export. What can be exported is the image itself (to an image file format Acrobat supports).
To export to TXT, RTF, DOC or DOCX there has to be renderable text (actual font encoding information that maps to Unicode).
For a scanned page you get that via OCR, eh. Note that "outlined" fonts are not fonts per se, just a fancy/pretty graphic rendition of a font's glyphs. Can't OCR those.
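If you need to triage a batch of files outside Acrobat, here is a minimal Python sketch of the same idea: ask each page for its extractable text and see whether anything comes back. It assumes the third-party pypdf package is installed; the function names and the `min_chars` threshold are made up for illustration, not anything Acrobat or the publishers use. Note it only tells you whether a text layer exists, not whether that layer came from OCR or from the original document.

```python
from typing import List


def classify_pages(page_texts: List[str], min_chars: int = 20) -> str:
    """Classify a PDF from the text extracted for each of its pages.

    Returns "text" if every page yields text, "image-only" if no page does,
    "mixed" if only some do, and "empty" if there are no pages.
    min_chars is an arbitrary, hypothetical threshold that ignores stray
    characters such as a lone page number.
    """
    if not page_texts:
        return "empty"
    has_text = [len(text.strip()) >= min_chars for text in page_texts]
    if all(has_text):
        return "text"
    if not any(has_text):
        return "image-only"
    return "mixed"


def extract_page_texts(path: str) -> List[str]:
    """Return the extractable text of each page (assumes pypdf is installed)."""
    # Imported here so classify_pages stays usable without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() returns an empty string for a page that is only an image.
    return [(page.extract_text() or "") for page in reader.pages]
```

Calling `classify_pages(extract_page_texts("reading.pdf"))` (the file name is a placeholder) gives one of the four labels. A result of "image-only" suggests the file still needs OCR; "mixed" suggests checking it page by page.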
If the publishers are sending you PDFs without renderable text then clearly they don't want repurposing of content eh.
Maybe you can get the topics of such content from another source that supports repurpose more readily.
Be well...