Best method to scan text in PDF
- March 4, 2024
Hi all,
My goal is to scan multiple PDFs so that they are searchable and so that I can use Python to extract information from the images and text within them. I'm facing multiple problems, which mostly revolve around either the text not showing up in a PDF "Ctrl+F" search or Python reporting that the text is not there. I've tried all the methods described below on the attached file without much success (the attached file is the one I downloaded before running any scans). My questions are below; the answers should help me debug the issues and see what is user error on my part:
1. What is the best program to enhance and/or scan PDFs? (Adobe Acrobat Reader, Adobe Scan, Adobe Acrobat DC, etc.)
- Maybe I'm not understanding the difference between these, or whether they are all a part of the 'Adobe Acrobat Pro' package?
2. What is the best method to scan text so it is searchable?
2A - I'm trying to use the 'Scan & OCR' tool in Adobe Acrobat Standard and see you can select 'Enhance' -> 'Scanned Document', or select 'Recognize Text' -> 'In This File'. Should I do one before the other to maximize efficacy?
2B - They both seem to recognize text, so I'm confused about the difference. What are the benefits of one over the other?
2C - Should I use them together if I'm still not getting results?
3. Even after running both of these options in 'Scan & OCR' at 600 dpi, most of this document's text is still not searchable (see attached doc). What more can I do besides running both of these options in 'Scan & OCR'?
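For context, here is a minimal sketch of the kind of check I mean when I say Python reports that the text is not there. It is not my exact script: it assumes the third-party pypdf package for reading the text layer, and the helper names (`extract_page_texts`, `pages_missing_text`) and the `min_chars` threshold are hypothetical, chosen only for illustration.

```python
def extract_page_texts(pdf_path):
    """Return the text layer of each page as a list of strings.

    Assumption: the third-party pypdf package is installed. It is imported
    lazily so the rest of this sketch runs without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can come back empty for image-only pages
    return [page.extract_text() or "" for page in reader.pages]


def pages_missing_text(page_texts, min_chars=20):
    """Return 1-based page numbers with fewer than min_chars visible characters.

    min_chars is an arbitrary illustrative threshold, not an Acrobat setting.
    """
    missing = []
    for number, text in enumerate(page_texts, start=1):
        visible = "".join(text.split())  # drop all whitespace
        if len(visible) < min_chars:
            missing.append(number)
    return missing
```

Something like `pages_missing_text(extract_page_texts("scanned.pdf"))` (the file name is a placeholder) would then list the pages where OCR apparently left no usable text layer, which is roughly what I am seeing for most pages of the attached document.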
Thank you in advance for all the help!
