Participant
December 21, 2023
Question

Tell me which documents do not have optical character recognition (OCR).

I have thousands of scanned documents, and I need to know which ones have optical character recognition (OCR) and which ones don't.
I can't open each document individually to check.
I have ABBYY FineReader 16.
Apparently there is such an option, but I couldn't find where to do it.
Is there any way to check all my scanned documents and list the ones that do not have OCR?
Or is there any software at all that can do this?

2 replies

Abambo
Community Expert
December 22, 2023
Quote: I have ABBYY FineReader 16
Apparently there is such an option, but I couldn't find where to do it.

By @arye anak

Ask in an ABBYY forum. There is no metadata flag in a PDF file that conveys such information.

ABAMBO | Hard- and Software Engineer | Photographer

Bernd Alheit
Community Expert
December 21, 2023

With Adobe Acrobat you can check for searchable text. 

arye anak (Author)
Participant
December 21, 2023

Maybe I didn't explain myself correctly. I know that I can use Adobe Acrobat to check whether a document has searchable text.
But I need something completely different.
I need to know which of my 10,000 documents have no searchable text.
How can I find this out without opening each document individually to check?

try67
Community Expert
December 21, 2023

The two things are the same... If a document has more than zero words in it, then it has searchable text, and vice versa.

To identify that you would need to use a script. You can run it as a part of an Action (via Tools - Action Wizard) to generate a list of all the files that don't have any searchable text in them.

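You can use the following code for that:

// Count words page by page; stop as soon as any text is found.
var numWords = 0;
for (var p = 0; p < this.numPages; p++) {
	numWords += this.getPageNumWords(p);
	if (numWords > 0) break;
}
// No words on any page: the file has no searchable text, so log its path.
if (numWords == 0) console.println(this.path);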

When the Action finishes executing, press Ctrl+J to open the JavaScript console. It will list the paths of all the processed files that have no text in them.

Edited: Fixed a small mistake in the code.

PS. You need Acrobat Pro to do that. You can't do it with the free Reader.
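
For anyone who wants to try the logic before running it over thousands of files, here is a minimal sketch of the same check wrapped in a function. The helper name hasSearchableText and the two mock objects are hypothetical, made up for illustration. The only things assumed from Acrobat are the numPages property and the getPageNumWords(pageIndex) method already used in the script above.

```javascript
// Sketch only: the same word-count check, wrapped in a reusable function.
// `doc` is assumed to expose numPages and getPageNumWords(pageIndex),
// as the Acrobat document object does. hasSearchableText is a hypothetical name.
function hasSearchableText(doc) {
  for (var p = 0; p < doc.numPages; p++) {
    // Stop at the first page that contains at least one word.
    if (doc.getPageNumWords(p) > 0) return true;
  }
  return false;
}

// Inside an Acrobat Action, `this` is the document being processed, so the
// call would look like:
//   if (!hasSearchableText(this)) console.println(this.path);

// Outside Acrobat, stand-in objects are enough to exercise the logic.
var mockScanned = {
  numPages: 3,
  getPageNumWords: function (p) { return 0; } // image-only pages
};
var mockOcred = {
  numPages: 3,
  getPageNumWords: function (p) { return p === 2 ? 120 : 0; } // text on last page
};

console.log(hasSearchableText(mockScanned)); // false
console.log(hasSearchableText(mockOcred));   // true
```

Note that this treats a document as "having OCR" as soon as a single page has any words, just like the original script. A file where only some pages were recognized would still count as searchable.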