
How to find which documents do not have optical character recognition (OCR)

Community Beginner, Dec 21, 2023

I have thousands of scanned documents, and I need to know which ones have optical character recognition (OCR) and which ones don't.
I can't open each document individually to check.
I have ABBYY FineReader 16
Apparently there is such an option, but I couldn't find where to do it.
Is there any way to check all my scanned documents and list the ones that do not have OCR?
Or does any software exist that can do this?

TOPICS
Edit and convert PDFs, Modern Acrobat, PDF forms

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Dec 21, 2023

With Adobe Acrobat you can check for searchable text. 

Community Beginner, Dec 21, 2023

Maybe I didn't explain myself clearly. I know that Adobe Acrobat can tell me whether a document has searchable text.
But I need something different.
I need to know which of my 10,000 documents have no searchable text.
How can I find this out without opening each document individually?

Community Expert, Dec 21, 2023

You also get the documents which don't have searchable text.

Community Expert, Dec 21, 2023

The two things are the same... If a document has more than zero words in it, then it has searchable text, and vice versa.

To identify that you would need to use a script. You can run it as a part of an Action (via Tools - Action Wizard) to generate a list of all the files that don't have any searchable text in them.

You can use the following code for that:

// Count words page by page and stop at the first page that has any.
var numWords = 0;
for (var p = 0; p < this.numPages; p++) {
	numWords += this.getPageNumWords(p);
	if (numWords > 0) break;
}
// No words on any page: print the file path to the JavaScript console.
if (numWords == 0) console.println(this.path);

When the Action finishes executing, press Ctrl+J to open the JavaScript console. It will list the paths of all the processed files that have no text in them.

 

Edited: Fixed small mistake in the code
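
The script above counts a file as having text as soon as a single page contains a word, so a file where only some pages went through OCR would not be listed. As a rough sketch (not part of the original answer), the same `numPages` and `getPageNumWords` calls can be wrapped in a function that lists the pages without words. The function name `pagesWithoutText` and the `typeof app` guard are assumptions made for illustration; the guard only lets the same code be tried outside Acrobat with a stand-in object.

```javascript
// Sketch: return the 1-based page numbers that contain no searchable words.
// `doc` is any object exposing numPages and getPageNumWords(pageIndex),
// i.e. the same two Doc members used by the script in this thread.
// pagesWithoutText is a hypothetical helper name, not an Acrobat API.
function pagesWithoutText(doc) {
    var emptyPages = [];
    for (var p = 0; p < doc.numPages; p++) {
        if (doc.getPageNumWords(p) === 0) {
            emptyPages.push(p + 1); // page indexes are 0-based, report 1-based
        }
    }
    return emptyPages;
}

// Only runs inside Acrobat, where the global `app` object exists.
// In an Action, `this` is the document currently being processed.
if (typeof app !== "undefined") {
    var missing = pagesWithoutText(this);
    if (missing.length > 0) {
        console.println(this.path + " -> pages without text: " + missing.join(", "));
    }
}
```

Used in the "Execute JavaScript" step of an Action, this would print one console line per file that has at least one page without words, which covers both fully unrecognized and partially recognized files.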

Community Expert, Dec 21, 2023

PS. You would have to have Acrobat Pro to do that. You can't do it with the free Reader.

Community Beginner, Dec 22, 2023

Thank you very much for trying to help me.
Perhaps additional clarification is needed.

I have Acrobat Pro but it didn't work.
This is not done automatically.
At the end of scanning each file, I have to confirm before it continues.
And at the end of the whole operation, I got nothing with Ctrl+J.

Community Expert, Dec 22, 2023

> I have Acrobat Pro but it didn't work.

- Didn't work in what way, exactly?

 

> This is not done automatically

- No, you have to run it manually, but it will process all the files you selected one after another.

Acrobat is not built for anything else. You would need a stand-alone application for a completely automated solution. Such tools are also more robust than Acrobat and can process more files (without displaying them), much faster.

 

> But at the end of scanning each file I have to confirm before it continues.

- Make sure to untick the "Prompt User" check-box under the "Execute JavaScript" command in the Action.

 

> And at the end of the whole operation, I got nothing with Ctrl+J

- Then all of your files have (some) text in them.
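
If the console stays empty even though some files were clearly never recognized, one possible explanation is that those files carry a few real words, such as a stamp or a header, on top of the scanned image. A minimal sketch of a stricter check, under that assumption, compares the average number of words per page against a threshold. The threshold value and the names `MIN_WORDS_PER_PAGE`, `averageWordsPerPage` and `looksUnrecognized` are hypothetical and would need tuning for real scans; only `numPages`, `getPageNumWords`, `console.println` and `path` come from the script earlier in the thread.

```javascript
// Hypothetical threshold: files averaging fewer words per page than this
// are reported as probably not recognized. Tune it for your own documents.
var MIN_WORDS_PER_PAGE = 5;

// Average word count per page; returns 0 for a document with no pages.
function averageWordsPerPage(doc) {
    var total = 0;
    for (var p = 0; p < doc.numPages; p++) {
        total += doc.getPageNumWords(p);
    }
    return doc.numPages > 0 ? total / doc.numPages : 0;
}

// True when the document has too little text to count as recognized.
function looksUnrecognized(doc, minWordsPerPage) {
    return averageWordsPerPage(doc) < minWordsPerPage;
}

// Only runs inside Acrobat, where the global `app` object exists.
if (typeof app !== "undefined") {
    if (looksUnrecognized(this, MIN_WORDS_PER_PAGE)) {
        console.println(this.path);
    }
}
```

Unlike the original script, this reads every page instead of stopping at the first word, so it would be slower on long documents.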

Community Expert, Dec 22, 2023

> I have ABBYY FineReader 16
> Apparently there is such an option, but I couldn't find where to do it.
>
> By @arye anak

Ask in an ABBYY forum. There is no metadata flag in a PDF file that conveys such information.

ABAMBO | Hard- and Software Engineer | Photographer
