
How can I find out which documents do not have optical character recognition (OCR)?

Community Beginner,
Dec 21, 2023

I have thousands of scanned documents, and I need to know which ones have optical character recognition (OCR) and which ones don't.
I can't open each document individually to check.
I have ABBYY FineReader 16.
Apparently it has such an option, but I couldn't find where it is.
Is there any way to check all my scanned documents and tell me which ones do not have OCR?
Or is there any such software in the world at all?

TOPICS
Edit and convert PDFs , Modern Acrobat , PDF forms
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting.
Community Expert,
Dec 21, 2023

With Adobe Acrobat you can check for searchable text. 

Community Beginner,
Dec 21, 2023

Maybe I didn't explain myself clearly. I know that Adobe Acrobat can tell me whether a document has searchable text.
But I need something completely different.
I need to know which of my 10,000 documents have no searchable text.
How can I find this out without opening each document individually to check?

Community Expert,
Dec 21, 2023

You also get the documents which don't have searchable text.

Community Expert,
Dec 21, 2023

The two things are the same... If a document has more than zero words in it, then it has searchable text, and vice versa.

To identify that you would need to use a script. You can run it as a part of an Action (via Tools - Action Wizard) to generate a list of all the files that don't have any searchable text in them.

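You can use the following code for that:

// Count words page by page, stopping at the first page that has any.
var numWords = 0;
for (var p = 0; p < this.numPages; p++) {
	numWords += this.getPageNumWords(p);
	if (numWords > 0) break;
}
// No words on any page: print the file path to the JavaScript console.
if (numWords == 0) console.println(this.path);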

When the Action finishes executing, press Ctrl+J to open the JavaScript console. It will list the paths of all the processed files that have no text in them.
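
The same check can also be written as a small reusable function, which may be easier to extend later. This is only a sketch: the helper name `hasSearchableText` is hypothetical, and it assumes the object passed in exposes `numPages` and `getPageNumWords(pageIndex)` the way the Acrobat document object does in the script above.

```javascript
// Hypothetical helper (not part of the Acrobat API).
// Returns true as soon as any page reports at least one word.
function hasSearchableText(doc) {
    for (var p = 0; p < doc.numPages; p++) {
        if (doc.getPageNumWords(p) > 0) {
            return true; // stop at the first page that has text
        }
    }
    return false; // no page reported any words
}

// Possible use inside the Action's "Execute JavaScript" step,
// where `this` is the document currently being processed:
// if (!hasSearchableText(this)) console.println(this.path);
```

Returning early keeps the check fast on long documents, since most files with OCR will report words on the first page.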

 

Edited: Fixed small mistake in the code

Community Expert,
Dec 21, 2023

PS. You would have to have Acrobat Pro to do that. You can't do it with the free Reader.

Community Beginner,
Dec 22, 2023

Thank you very much for trying to help me.
Perhaps some additional clarification is needed.

I have Acrobat Pro but it didn't work.
This is not done automatically
But at the end of scanning each file you have to order to continue.
And at the end of the whole operation, I got nothing with Ctrl+J

Community Expert,
Dec 22, 2023

> I have Acrobat Pro but it didn't work.

- Didn't work in what way, exactly?

 

> This is not done automatically

- No, you have to run it manually, but it will process all the files you selected one after another.

Acrobat is not built for anything more automated than that. You would need a stand-alone application for a completely automated solution. Such tools are also more robust than Acrobat and can process more files, much faster, without displaying them.

 

> But at the end of scanning each file you have to order to continue.

- Make sure to untick the "Prompt User" check-box under the "Execute JavaScript" command in the Action.

 

> And at the end of the whole operation, I got nothing with Ctrl+J

- Then all of your files have (some) text in them.
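
An empty console only shows that every file has at least one word somewhere. A file where only some pages were OCR'd would still pass that test. As a rough sketch under the same assumptions as the original script (`numPages` and `getPageNumWords` on the document object), the hypothetical helper `pagesWithoutWords` below lists the pages that report zero words. Note that genuinely blank pages would be listed as well.

```javascript
// Hypothetical helper (not part of the Acrobat API).
// Returns the 1-based numbers of all pages that report zero words.
function pagesWithoutWords(doc) {
    var emptyPages = [];
    for (var p = 0; p < doc.numPages; p++) {
        if (doc.getPageNumWords(p) === 0) {
            emptyPages.push(p + 1); // convert 0-based index to page number
        }
    }
    return emptyPages;
}

// Possible use inside the Action's "Execute JavaScript" step:
// var missing = pagesWithoutWords(this);
// if (missing.length > 0) {
//     console.println(this.path + " - pages without text: " + missing.join(", "));
// }
```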

Community Expert,
Dec 22, 2023

> I have ABBYY FineReader 16
> Apparently there is such an option, but I couldn't find where to do it.

By @arye anak

Ask in an ABBYY forum. There is no metadata flag in a PDF file that conveys such information.

ABAMBO | Hard- and Software Engineer | Photographer