How to maintain OCR when converting from PDF to Word (moved)

Jul 27, 2023

(moving this from Adobe Reader to Acrobat Reader to Acrobat)

Hello everyone,

I have PDFs that contain OCR information from the scanning process. When I open the PDF files in Adobe Acrobat 2017, the text is selectable and I can copy & paste everything. However, when I convert these files to Word - using Adobe Acrobat 2017 - the resulting Word files often contain images instead of editable text. I assume that this has to do with grey or colored backgrounds in some cases and with poor printing quality (of the scanned documents) in others.

Does Adobe Acrobat 2017 run a completely new OCR pass when converting to Word instead of using the existing OCR data in the PDF file? If so, is there a way to make it use the existing OCR data?

Also, weirdly, repeating the conversion process leads to different results. Sometimes the result is mostly or partially editable, and sometimes it's mostly images or only one page-sized image.

I have some PDFs that are converted perfectly, and so far they differ in two ways:

1. The file details show a different version, "PDF-Version: 1.7, Adobe Extension Level 5 (Acrobat 9.x)", whereas the files that don't convert properly show "PDF-Version: 1.4 (Acrobat 5.x)".

2. The files have a mostly white background and are of slightly better image quality.
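To check the first difference across a larger batch of files without opening each one, the version can be read from the `%PDF-x.y` header at the start of every PDF. The following is only a minimal sketch in Python: the function names and the folder argument are made up for illustration, and it reads the file header only. It will not show the Adobe Extension Level, and a PDF can override the header version in its document catalog, so the values may not always match what Acrobat displays.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

# A PDF file starts with a header such as "%PDF-1.4" or "%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from the %PDF-x.y header, or None if not found."""
    match = _HEADER.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def group_by_version(folder: str) -> Dict[str, List[str]]:
    """Group the *.pdf files in a folder (hypothetical path) by header version."""
    groups: Dict[str, List[str]] = {}
    for path in sorted(Path(folder).glob("*.pdf")):
        with path.open("rb") as fh:
            version = pdf_header_version(fh.read(1024))
        groups.setdefault(version or "unknown", []).append(path.name)
    return groups
```

Calling `group_by_version("scans")` on a folder of scanned files would return a dictionary keyed by version string, which makes it easier to see whether the files that convert badly are consistently the 1.4 ones or whether background and image quality are the more likely cause.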

Thanks a lot in advance for any help!

Best regards,

Marcel

Jul 27, 2023

Hi Marcel,

Thank you for reaching out.

Please note that Acrobat 2017 is an old and unsupported version. For more information, please refer to the following help document: https://helpx.adobe.com/acrobat/kb/end-of-support-acrobat-2017-reader-2017.html.

However, the file should convert to editable text if OCR has been run on the PDF before converting.

If you are experiencing issues with a particular PDF, you may share the PDF with us so we can replicate the behavior on our end.

Thanks,

Meenakshi

Jul 27, 2023

Hello Meenakshi,

Thank you for replying so quickly.

Please use the attached PDF file. You can copy and paste the content - from Adobe Acrobat Pro 2017 to somewhere else - and it will work fine. When I use "export to word" in Adobe Acrobat Pro 2017, I get the result I also attached. It looks like Acrobat is running a new OCR pass instead of using the text data in the PDF file - and the result is bad.

Best regards,

Marcel