
OCR and text extraction

New Here, Jun 24, 2016

Hello

I work in a library service providing core readings to print-disabled students in alternative (more accessible) formats. We've recently started using Adobe Acrobat Pro DC via the institution to add accessibility features to PDFs received from publishers, or to convert them to Word.

What is the quickest reliable way to determine whether a PDF has already been OCRed?

Secondly, is there a way to extract the content of a PDF without OCR? When we were using Acrobat Pro XI, we had some success determining whether a PDF had been created from a Word document, and saving those PDFs back to Word resulted in fewer recognition errors.

Any tips gratefully received.

JK

TOPICS
Acrobat SDK and JavaScript , Windows
LEGEND, Jun 28, 2016

Using Acrobat Pro, you can configure a Preflight profile to assess the PDF.

"Without OCR" - OCR presupposes the PDF's page content is the output of a scanner. That'd be an image.

An image has no text, columns, styles, etc. - so none of that is available for export. What can be exported is the image (to an Acrobat-supported image file format).

To export to TXT, RTF, DOC or DOCX there has to be renderable text (actual font encoding information that maps to Unicode).

For a scanned page, you get that via OCR eh. Note that "outlined" fonts are not fonts per se - just a fancy/pretty graphic rendition of a font's glyphs. Can't OCR those.
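
If you want a rough check for renderable text outside Acrobat, here is a minimal Python sketch using only the standard library. It is not what Preflight does, and the function name `classify_pdf` and its three labels are made up for illustration. It assumes the PDF is unencrypted and its content streams are Flate-compressed (typical, but not guaranteed). It treats the text-showing operators `Tj`/`TJ` as evidence of renderable text, and text rendering mode 3 (`3 Tr`, invisible text) as a hint that the text is a hidden layer, which is how OCR output is commonly laid over a scan. Treat the answer as a heuristic, not a guarantee.

```python
import re
import zlib
from typing import Iterator

# Raw bytes between "stream" and "endstream" (skipping the "stream" in "endstream").
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)
# A text object in a content stream: BT ... ET.
TEXT_OBJECT_RE = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)
# A string or array operand followed by a text-showing operator (Tj or TJ).
SHOW_TEXT_RE = re.compile(rb"[)\]>]\s*T[jJ]\b")
# Text rendering mode 3 = invisible text.
INVISIBLE_RE = re.compile(rb"\b3\s+Tr\b")


def decoded_streams(pdf_bytes: bytes) -> Iterator[bytes]:
    """Yield every stream that inflates with zlib; skip the rest (e.g. JPEG data)."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue


def classify_pdf(pdf_bytes: bytes) -> str:
    """Return 'no-text', 'hidden-text-layer', or 'visible-text' (heuristic labels)."""
    has_text = False
    has_invisible = False
    for data in decoded_streams(pdf_bytes):
        shows_text = any(
            SHOW_TEXT_RE.search(obj.group(0))
            for obj in TEXT_OBJECT_RE.finditer(data)
        )
        if shows_text:
            has_text = True
            if INVISIBLE_RE.search(data):
                has_invisible = True
    if not has_text:
        return "no-text"            # likely image-only: needs OCR before export
    if has_invisible:
        return "hidden-text-layer"  # likely already OCRed
    return "visible-text"           # likely born-digital (e.g. made from Word)
```

You would call it as `classify_pdf(open(path, "rb").read())` with your own `path`. A file that mixes scanned and born-digital pages will only get one label, so checking a few pages by hand (try selecting text with the cursor in Acrobat) is still worthwhile.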

If the publishers are sending you PDFs without renderable text then clearly they don't want repurposing of content eh.

Maybe you can get the topics of such content from another source that supports repurposing more readily.

Be well...
