Batch process to check hundreds of pdfs to see if they are not "image-only" PDFs

New Here, Apr 03, 2021

How can I check all PDFs in several directories/folders to ensure that they are not just "scanned or image-only" PDFs and, if any are found, create a report (an HTML file or a text file) with the results?

I understand how to do this for a single PDF that I open in Adobe DC, but I have no idea how to do it with actions, scripting, etc.

TOPICS
How to, Standards and accessibility


Correct answer

Community Expert, Apr 03, 2021

You can do it using JavaScript by counting the number of words in the file. If the count is zero, the file contains no extractable text, which means it hasn't been OCRed.

The easiest way to output these results is to the JS Console, which you can then open at the end of the process to see the names of the files that were detected.

You can do that by having your Action execute this code:

// Count words page by page; stop at the first page that has any text.
var totalNumWords = 0;
for (var p = 0; p < this.numPages; p++) {
	totalNumWords += this.getPageNumWords(p);
	if (totalNumWords > 0) break;
}
// No extractable words on any page: print the file's path to the JS Console.
if (totalNumWords == 0) console.println(this.path);
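
The script above stops at the first page that contains words, so a file with one text page and many scanned pages is not reported. If that matters, a per-page variant might look like the sketch below. It is not part of the original answer: the function name `findPagesWithoutText` is made up for illustration, and the `typeof app` guard is only an assumed way to keep the last lines from running outside Acrobat.

```javascript
// Returns the 1-based numbers of pages that contain no extractable words.
// `doc` is any object exposing numPages and getPageNumWords(pageIndex),
// such as `this` inside an Acrobat Action.
function findPagesWithoutText(doc) {
    var emptyPages = [];
    for (var p = 0; p < doc.numPages; p++) {
        if (doc.getPageNumWords(p) === 0) {
            emptyPages.push(p + 1); // report page numbers as a reader sees them
        }
    }
    return emptyPages;
}

// Inside Acrobat, `app` is defined and `this` is the document being processed.
// Elsewhere `app` is undefined, so this part is skipped.
if (typeof app !== "undefined") {
    var pagesWithoutText = findPagesWithoutText(this);
    if (pagesWithoutText.length > 0) {
        console.println(this.path + ": no text on page(s) " + pagesWithoutText.join(", "));
    }
}
```

Note that a genuinely blank page also has zero words, so this variant can flag files that are fine.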

New Here, Apr 03, 2021

Thank you. I guess what I am trying to do can be done; now I just need to understand how to actually do it with the information you provided (I have never created an Action or even used JavaScript in Adobe DC).

Is there anywhere that explains, step by step, how to do something like what you recommended?

BTW, thank you for replying.

Community Expert, Apr 03, 2021

See: https://helpx.adobe.com/acrobat/using/action-wizard-acrobat-pro.html

If you need more specific help with this, post here again.

New Here, Apr 03, 2021

I seem to be stumbling along with this.

I found the post below, which more or less made sense to me. The end result is that it reports zero words for the one PDF that I opened, which is the correct result:

http://phlogtastic.blogspot.com/2016/08/count-number-of-words-using-javascript.html

I then did basically the same thing, but with your script instead. Now I'm stumped on where or how to see the results for the 3 files I have in the same directory (for testing purposes, I have 2 PDFs that should have no words and one PDF that does have words).

Community Expert, Apr 03, 2021

You need to create a new Action via Tools > Action Wizard, add an Execute JavaScript command to it, and then paste the code into that command.

Untick the "Prompt User" check-box under this command, save the Action, and then run it on your files or folder.

When it's done, press Ctrl+J to open the JS Console; the output (if any) will be visible there.
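
Because the script prints only the files it flags, an empty console can mean either "no image-only PDFs" or "nothing ran". One possible variation, sketched below, prints a line for every processed file, so the console contents can be copied into a text file as the report. The function name `buildReportLine` and the `HAS-TEXT` / `IMAGE-ONLY` labels are invented for this example, and the `typeof app` guard is an assumption used to keep the last lines inert outside Acrobat.

```javascript
// Builds one tab-separated report line for a document.
// `doc` needs only path, numPages and getPageNumWords(pageIndex).
function buildReportLine(doc) {
    var hasText = false;
    for (var p = 0; p < doc.numPages; p++) {
        if (doc.getPageNumWords(p) > 0) {
            hasText = true;
            break; // one page with words is enough
        }
    }
    return (hasText ? "HAS-TEXT" : "IMAGE-ONLY") + "\t" + doc.path;
}

// Inside Acrobat, `app` is defined and `this` is the document being processed.
if (typeof app !== "undefined") {
    console.println(buildReportLine(this));
}
```

Pasted into the Execute JavaScript command, this should leave one line per file in the JS Console after the Action finishes.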

New Here, Apr 03, 2021

Wow, and thank you. It works just like you said.

One last beginner question: how would I modify the JavaScript so that it only processes PDF files and excludes all other file types/extensions? We have lots of other file types in the various folders/directories.

Community Expert, Apr 03, 2021

Unfortunately, that's not possible. It used to be possible to select which file-types an Action will process, but not any longer. So any non-PDF file you select will be automatically converted to PDF by the Action, before executing the rest of the commands on it.

You will need to select only the PDF files in your folders if you want to process just them, or copy them to a different location and process that.

New Here, Apr 03, 2021

Thanks for all the info and for the heads-up about any non-pdf files being automatically converted to pdf.

New Here, Apr 03, 2021

I'm not sure how to edit my post above, so I just wanted to add that when I pick the PDFs via adding files, I can search for *.pdf and then add them (which avoids having to worry about other file types). This seems to work, too.

Community Expert, Apr 03, 2021

Sure, but the benefit of selecting a folder is that the Action automatically processes all the sub-folders under it, too, which you can't do when selecting individual files.

New Here, Apr 03, 2021

I think that if I am at the root level and search for all PDFs (*.pdf), Windows Explorer lists them all, and I can then select them all regardless of the folder they are in. I could be wrong, though, and will need to test again to be sure.

Community Expert, Apr 03, 2021

Windows Explorer only shows the contents of one folder at a time, but a search can be used to locate multiple files. I guess you can search for just PDF files and then drag them into the file selection section of the Action, though. That should work...
