Hi all,
My goal is to scan multiple PDFs so they are searchable and so I can use Python to extract information from both the images and the text within them. I'm running into several problems, which mostly come down to either the text not showing up in a PDF "Ctrl+F" search, or Python reporting that the text is not there. I've tried all of the methods described below without much success on the attached file (the attached file is the one I downloaded before running any scans). My questions are below and should help me work out which of the issues are user error on my part:
1. What is the best program to enhance and/or scan PDFs? (Adobe Acrobat Reader, Adobe Scan, Adobe Acrobat DC, etc.)
- Maybe I'm not understanding the difference between these, or whether they are all part of the 'Adobe Acrobat Pro' package?
2. What is the best method to scan text so it is searchable?
2A - I'm trying to use the 'Scan & OCR' tool in Adobe Acrobat Standard and see that you can select 'Enhance' -> 'Scanned Document', or 'Recognize Text' -> 'In This File'. Should I run one before the other to get the best results?
2B - They both seem to recognize text, so I'm confused about the difference. What are the benefits of one over the other?
2C - Should I use them together if I'm still not getting results?
3. Even after running both of these options in 'Scan & OCR' at 600 dpi, most of the text in this document still isn't searchable (see attached doc). What more can I do besides running both options in 'Scan & OCR'?
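One way to tell whether a PDF actually gained a text layer after OCR (as opposed to just looking sharper) is to check whether its content streams contain text-drawing operators. The sketch below is a rough stdlib-only heuristic, not Acrobat's or any library's method: it assumes Flate (zlib) or uncompressed streams, and it ignores object streams, other filters, and encryption. The names `count_text_streams` and `decoded_streams` are hypothetical helpers. For real extraction, a PDF library such as pypdf (`PdfReader(path).pages[i].extract_text()`) is more reliable.

```python
import re
import sys
import zlib

# Heuristic only: grabs the body of each "stream ... endstream" in the raw file.
# Object streams, non-Flate filters (LZW, DCT, ...) and encryption are not handled.
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)\r?\n?endstream", re.DOTALL)

# A text object (BT ... ET) that uses a Tj or TJ text-showing operator.
# OCR'd text is normally drawn this way too, just with an invisible render mode.
TEXT_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b", re.DOTALL)


def decoded_streams(pdf_bytes: bytes):
    """Yield each stream body, inflated first if it is zlib/Flate compressed."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj returns what it can decode from a truncated tail
            # instead of raising, but still raises on non-zlib data.
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not zlib data (uncompressed or another filter): scan it as-is.
            yield raw


def count_text_streams(pdf_bytes: bytes) -> int:
    """Count streams that appear to draw text, i.e. could be searchable."""
    return sum(1 for body in decoded_streams(pdf_bytes) if TEXT_RE.search(body))


if __name__ == "__main__":
    # Usage: python check_text_layer.py before.pdf after_ocr.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            n = count_text_streams(f.read())
        status = "has a text layer" if n else "looks image-only (needs OCR)"
        print(f"{path}: {n} text stream(s), {status}")
```

Running it on the file before and after 'Scan & OCR' would show whether OCR added any text objects at all; if the count stays at zero, Ctrl+F and Python extraction will both come up empty no matter how the page looks.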
Thank you in advance for all the help!
Hi there
Hope you are doing well and sorry for the delay.
Acrobat Reader and Adobe Scan are both part of the 'Adobe Acrobat Pro' package. For more information, please check the help page: https://adobe.ly/4cicTii
The Enhance Camera Images feature helps clean up images captured with smartphone cameras. With this feature, you can photograph a document on your mobile device and then create a clear, good-looking, small PDF. It covers ad hoc scanning when you don't have a standard scanner: https://adobe.ly/3vigs7C
Recognize Text is used for OCR in a given language. By default, the OCR language is taken from the default locale. To change the language, click Edit and choose a different language: https://adobe.ly/4cmLnQy
What happens when you try to Scan & OCR the text in the PDF file? Do you get any error message? If so, please share a screenshot for more clarity. Also check whether the issue is with this specific PDF or with all of them.
Also make sure you have the most recent version of the application (24.01.20604) installed: go to Help > Check for updates, then reboot the computer once.
~Amal