Dear reader,
I’ve noticed an issue when exporting OCR results from a PDF using Adobe Acrobat Reader Pro: some text is missing from the exported file. The PDF contains both regular text and images with embedded text. When I search within Adobe Acrobat Reader Pro after OCR processing has finished, I can find character strings detected in the images that do not appear in the exported text file. So the OCR step has recognized these characters, but they have not been exported.
Based on my research, it seems that the only way to extract all characters recognized by Adobe Acrobat’s OCR process is to combine its functionality with a Python script that uses PyMuPDF.
Could you please confirm whether this conclusion is correct?
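For reference, the PyMuPDF step I have in mind is roughly the following. It is only a minimal sketch under some assumptions: the PDF has been saved from Acrobat after OCR, so that the recognized text is stored in the file as a text layer, and PyMuPDF can read that layer. The file name scanned_ocr.pdf and the function names join_pages and extract_text_layer are placeholders of my own.

```python
from pathlib import Path


def join_pages(page_texts):
    """Join per-page text with a form feed, dropping trailing whitespace per page."""
    return "\f".join(text.rstrip() for text in page_texts)


def extract_text_layer(pdf_path):
    """Return the text layer of every page in the PDF as one string.

    Assumption: OCR text that Acrobat saved into the file is part of this layer.
    """
    import fitz  # PyMuPDF; imported here so join_pages works without it installed

    with fitz.open(str(pdf_path)) as doc:
        return join_pages(page.get_text("text") for page in doc)


if __name__ == "__main__":
    # Hypothetical file name: the PDF saved from Acrobat after running OCR.
    source = Path("scanned_ocr.pdf")
    if source.exists():
        text = extract_text_layer(source)
        source.with_suffix(".txt").write_text(text, encoding="utf-8")
```

Comparing the resulting .txt file with Acrobat's own export should show whether the strings I can find via search are present in the file's text layer.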
Thank you in advance for your assistance.
Best regards,
Kees Besse