Adobe Acrobat reader pro question: retrieve 100% of all text OCR recognized

Question

LS,

I’ve noticed an issue when exporting OCR results from a PDF using Adobe Acrobat Reader Pro: some text appears to be missing from the exported file. The PDF contains both regular text and images with embedded text. When I search within Adobe Acrobat Reader Pro after OCR processing has finished, I can find character strings detected in the images that do not appear in the exported text file. So OCR has recognized these characters, but they have not been exported.

Based on my research, it seems that the only way to extract all characters recognized by Adobe Acrobat’s OCR process is to combine its functionality with a Python script that uses PyMuPDF. Could you please confirm whether this conclusion is correct?

Thank you in advance for your assistance.

Best regards,
Kees Besse

try67 · Answer

That should not be happening. Can you share a sample file? Also, how exactly are you exporting the text from Acrobat? (Acrobat, not Acrobat Reader Pro: there's no such thing, and Reader can't perform OCR.)
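
To pin down which strings are affected before sharing a sample, the PyMuPDF cross-check mentioned in the question could be sketched roughly as follows. This is only an illustration, not a confirmed workaround: it assumes PyMuPDF is installed, that the OCR text layer Acrobat added is readable through `page.get_text()`, and that the export is a plain UTF-8 text file. The helper names `extract_pdf_text` and `missing_tokens` are made up for this example.

```python
import re
from collections import Counter


def extract_pdf_text(pdf_path):
    """Return the text layer of every page in the PDF, joined by newlines."""
    # Imported lazily so the comparison helper below works without PyMuPDF.
    try:
        import pymupdf  # module name in recent PyMuPDF releases
    except ImportError:
        import fitz as pymupdf  # older releases expose the same API as "fitz"

    doc = pymupdf.open(pdf_path)
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()


def missing_tokens(pdf_text, exported_text):
    """List words found in the PDF text layer but absent from the export.

    Words are compared case-insensitively and as a multiset, so a word that
    occurs twice in the PDF but only once in the export is still reported.
    """
    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    remaining = Counter(tokenize(pdf_text)) - Counter(tokenize(exported_text))
    return sorted(remaining)
```

With hypothetical file names, the check would be run as `missing_tokens(extract_pdf_text("scan.pdf"), open("export.txt", encoding="utf-8").read())`. An empty list would suggest the export already contains every word in the text layer; a non-empty list gives concrete strings to report along with the sample file.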
