I have thousands of scanned documents, and I need to know which ones have optical character recognition (OCR) and which ones don't.
I can't open each document individually to check.
I have ABBYY FineReader 16
Apparently there is such an option, but I couldn't find where to do it.
Is there a way to check all my scanned documents and find out which ones don't have OCR?
Or does any software exist that can do this?
With Adobe Acrobat you can check for searchable text.
Maybe I didn't explain myself clearly. I know that Adobe Acrobat can tell me whether a document has searchable text.
But I need something completely different.
I need to know which of my 10,000 documents have no searchable text.
How can I find that out without opening each document individually?
The same check also tells you which documents don't have searchable text.
The two things are the same: if a document has more than zero words in it, then it has searchable text, and vice versa.
To identify such files you need a script. You can run it as part of an Action (via Tools > Action Wizard) to generate a list of all the files that have no searchable text in them.
Put the following code in an "Execute JavaScript" command in the Action:
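// Count words page by page; stop as soon as any text is found.
var numWords = 0;
for (var p = 0; p < this.numPages; p++) {
    numWords += this.getPageNumWords(p);
    if (numWords > 0) break;
}
// No words on any page: print this file's path to the JavaScript Console.
if (numWords == 0) console.println(this.path);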
When the Action finishes, press Ctrl+J to open the JavaScript Console. It lists the paths of all the processed files that contain no text.
Edited: Fixed small mistake in the code
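The per-file check above can also be written as a small function, which makes its logic easy to test outside Acrobat. This is only a sketch under assumptions: `hasSearchableText` and `makeFakeDoc` are hypothetical names, and the fake documents imitate just the two members the script relies on (`numPages` and `getPageNumWords`), not a real Acrobat document.

```javascript
// Same word-count logic as the Action script, written as a function.
// `doc` is any object with a numPages property and a getPageNumWords(pageIndex)
// method. Inside Acrobat you would pass `this` (the current document).
function hasSearchableText(doc) {
  for (var p = 0; p < doc.numPages; p++) {
    // Stop at the first page that contains at least one word.
    if (doc.getPageNumWords(p) > 0) return true;
  }
  return false;
}

// Hypothetical stand-in for an Acrobat document, built from per-page word counts.
function makeFakeDoc(path, wordsPerPage) {
  return {
    path: path,
    numPages: wordsPerPage.length,
    getPageNumWords: function (p) { return wordsPerPage[p]; }
  };
}

var docs = [
  makeFakeDoc("/scans/ocr_done.pdf", [0, 12, 40]),
  makeFakeDoc("/scans/image_only.pdf", [0, 0, 0]),
  makeFakeDoc("/scans/cover_only.pdf", [5])
];

// Collect the paths that the Action would print to the JavaScript Console.
var noText = docs
  .filter(function (d) { return !hasSearchableText(d); })
  .map(function (d) { return d.path; });

console.log(noText.join("\n"));
```

Inside the Action, the equivalent call would be `if (!hasSearchableText(this)) console.println(this.path);`. Keep in mind that any text at all counts, so a scan with even a few words in its text layer is reported as having text.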
PS. You need Acrobat Pro for this. It can't be done with the free Reader.
Thank you very much for trying to help me.
Perhaps some additional clarification is needed:
I have Acrobat Pro but it didn't work.
This is not done automatically
But at the end of scanning each file you have to order to continue.
And at the end of the whole operation, I got nothing with Ctrl+J
> I have Acrobat Pro but it didn't work.
- Didn't work in what way, exactly?
> This is not done automatically
- No, you have to run it manually, but it will process all the files you selected one after another.
Acrobat isn't designed for fully unattended batch processing. You would need a stand-alone application for a completely automated solution. Such tools are also more robust than Acrobat and can process more files, much faster (without displaying them).
> But at the end of scanning each file you have to order to continue.
- Make sure to untick the "Prompt User" check-box under the "Execute JavaScript" command in the Action.
> And at the end of the whole operation, I got nothing with Ctrl+J
- Then all of your files have (some) text in them.
> I have ABBYY FineReader 16
> Apparently there is such an option, but I couldn't find where to do it.
> By @arye anak
Ask in an ABBYY forum. There is no metadata flag in a PDF file that conveys this information.