How to maintain OCR when converting from PDF to Word

Question

Hello everyone,

I have PDFs that contain OCR information from the scanning process. When I open the PDF files in Adobe Acrobat 2017, the text is selectable and I can copy & paste everything. However, when I convert these files to Word using Adobe Acrobat 2017, the resulting Word files often contain images instead of editable text. I assume that this has to do with grey or colored backgrounds in some cases and with poor print quality of the scanned documents in others.

Does Adobe Acrobat 2017 perform a completely new OCR pass when converting to Word instead of using the existing OCR data in the PDF file? If so, is there a way to make it use the existing OCR data?

Also, weirdly, repeating the conversion process leads to different results. Sometimes the result is mostly or partially editable, and sometimes it's mostly images or only one page-sized image.

I have some PDFs that are converted perfectly, and so far they differ in two ways:

1. The file details show a different PDF version: "PDF-Version: 1.7, Adobe Extension Level 5 (Acrobat 9.x)". The files that don't convert properly show "PDF-Version: 1.4 (Acrobat 5.x)".

2. The files have a mostly white background and are of slightly better image quality.
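
To compare the version difference from point 1 across many files without opening each one in Acrobat, the version in the PDF file header can be read directly. This is only a rough sketch: the function names are made up for illustration, and it reads just the `%PDF-x.y` header. A `/Version` entry in the document catalog can override the header, and the Adobe extension level is also stored in the catalog, so neither would show up here.

```python
import re


def pdf_header_version(data: bytes):
    """Return the version from a PDF header such as b'%PDF-1.4', or None.

    Hypothetical helper for illustration; it does not parse the document
    catalog, so a /Version override or extension level is not detected.
    """
    # The header is normally at byte 0, but some files carry a few leading
    # bytes, so search the first 1024 bytes instead of only the start.
    match = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def pdf_file_version(path):
    """Read the start of the file at `path` and return its header version."""
    with open(path, "rb") as f:
        return pdf_header_version(f.read(1024))
```

Running `pdf_file_version` over the files that convert well and the ones that don't would show whether the 1.4 vs. 1.7 split really lines up with the conversion problem, or whether the background and image quality from point 2 are the more likely cause.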

Thanks a lot in advance for any help!

Best regards,

Marcel

Bernd Alheit · Accepted Answer

Try the forum for Adobe Acrobat.
