Participating Frequently
April 3, 2021
Solved

Batch process to check hundreds of pdfs to see if they are not "image-only" PDFs

  • April 3, 2021
  • 1 reply
  • 3514 views

How can I check all PDFs in several directories/folders to ensure that they are not just "scanned or image-only" PDFs and, if any are found, create a report (an HTML file or a text file) with the results?

I understand how to do this with a PDF that I open in Adobe DC, but I have no idea how to do it with actions, scripting, etc.

This topic has been closed for replies.
Best answer by try67 (Community Expert), April 3, 2021

You can do it using JavaScript by counting the number of words in the file. If the count is zero, the file has no recognizable text, which for a scanned document means it hasn't been OCRed.

The easiest way to output these results is to the JS Console, which you can open at the end of the process to see the names of the files that were detected.

You can do that by having your Action execute this code:

// Count words page by page; stop at the first page that contains any text.
var totalNumWords = 0;
for (var p = 0; p < this.numPages; p++) {
	totalNumWords += this.getPageNumWords(p);
	if (totalNumWords > 0) break;
}
// No words on any page: print the file's path to the JS Console.
if (totalNumWords == 0) console.println(this.path);
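
Since the original question also asks for a text report, here is a rough sketch of how the same check could be wrapped in a function and its results turned into report text. The names isImageOnly and formatReport are hypothetical helpers, not part of the Acrobat API, and the sketch only assumes that the document object exposes numPages and getPageNumWords(pageIndex), as in the code above. How the flagged paths are collected across files within an Action, and how the text is saved, is left open here.

```javascript
// Hypothetical helper: returns true when no page of `doc` reports any words.
// Assumes `doc` exposes numPages and getPageNumWords(pageIndex).
function isImageOnly(doc) {
    for (var p = 0; p < doc.numPages; p++) {
        if (doc.getPageNumWords(p) > 0) {
            return false; // found text, no need to scan further
        }
    }
    return true;
}

// Hypothetical helper: turns a list of flagged file paths
// into a plain-text report, one numbered path per line.
function formatReport(paths) {
    var lines = ["Image-only PDFs found: " + paths.length];
    for (var i = 0; i < paths.length; i++) {
        lines.push((i + 1) + ". " + paths[i]);
    }
    return lines.join("\n");
}
```

Inside an Action, the check would presumably be called as isImageOnly(this), with this.path printed or stored whenever it returns true.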

Jerry5EFC (Author)
Participating Frequently
April 3, 2021

Thank you. I guess what I am trying to do can be done; now I just need to understand how to actually do it with the information you provided (I have never created an action before, or even used JavaScript in Adobe DC).

Is there any place that explains, step by step, how to do something like what you recommended?

BTW, thank you for replying.

Jerry5EFC (Author)
Participating Frequently
April 3, 2021

See: https://helpx.adobe.com/acrobat/using/action-wizard-acrobat-pro.html

If you need more specific help with this, post here again.


I sure seem to be stumbling along with this....

I found this info that sort of made sense to me, and the end result is that it reports zero words found for the one PDF that I opened (the correct result):

http://phlogtastic.blogspot.com/2016/08/count-number-of-words-using-javascript.html

I then basically did the same thing as above but used your script instead. Now I'm stumped: where and how do I see the results for the 3 files that I have in the same directory (for testing purposes, I have 2 PDFs that should have no words and one PDF that does have words)?