How do I count words in an English PDF document?
There's no built-in word count tool. In Adobe Acrobat you can run this script from the JavaScript console:
var cnt = 0;
for (var p = 0; p < this.numPages; p++) cnt += this.getPageNumWords(p);
console.println("There are " + cnt + " words in this file.");
When I use this script:
var cnt = 0;
for (var p = 0; p < this.numPages; p++) cnt += this.getPageNumWords(p);
app.alert("There are " + cnt + " words in this file.");
I keep getting a word count of 0 on all PDFs.
Any ideas?
The document is probably scanned and was not OCR-ed, so it doesn't contain
any actual words in it, just images with text on them...
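One way to check for this is to list the pages on which Acrobat finds no words at all. The following is only a sketch: findPagesWithoutText is a hypothetical helper name, and it assumes a page with a word count of 0 is an image-only page, which may not hold for pages that are simply blank.

```javascript
// Sketch: collect the pages for which getPageNumWords reports zero words.
// findPagesWithoutText is a hypothetical helper, not part of the Acrobat API.
function findPagesWithoutText(doc) {
    var emptyPages = [];
    for (var p = 0; p < doc.numPages; p++) {
        if (doc.getPageNumWords(p) === 0) {
            emptyPages.push(p + 1); // report 1-based page numbers
        }
    }
    return emptyPages;
}

// In the Acrobat console, "this" is the open document:
// console.println("Pages without text: " + findPagesWithoutText(this).join(", "));
```

If every page is listed, the file most likely needs OCR before any word count will work.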
The PDF is full text. When I copy and paste it into Word, the count is 1,052 words. I'm just wondering whether I need to edit the script at all?
The script I am using:
var cnt = 0;
for (var p = 0; p < this.numPages; p++) cnt += this.getPageNumWords(p);
app.alert("There are " + cnt + " words in this file.");
No, that script comes from the JS API Reference and should work. Can you
share the file?
On Tue, Nov 19, 2013 at 10:15 AM, tobywilmington
Hi Gilad,
It is happening with any PDF. I've just tried the iPad manual: http://manuals.info.apple.com/MANUALS/1000/MA1595/en_US/ipad_user_guide.pdf
I now have this set up so that when I open a PDF it runs the JavaScript automatically, but it just doesn't seem to pick up any words.
Cheers
Toby
What do you mean, exactly? From where are you running this code?
In Adobe Acrobat Pro. I'm using this JS as a saved script; I think it runs through the debugger?
So you're running it from the console directly?
Also, are you selecting all of the code when you run it?
I'm attempting to use the same JS code in Acrobat Pro version 10 and receive the "no words found" response. How did you finally resolve your issue, if you don't mind sharing?
Thanks
That usually means that your file contains only images, no real text...
I ran it on the iPad Manual file and the result was:
When I run it directly in the console, the console replies as below.
Sorry to be a pain!
Yes, that's what I thought... You have to select all of the code (with the mouse) before running it.
So sorry! Rookie mistake!
Massive thanks.
Out of interest, does this script have the potential to count images? That would be a huge help to me.
It's a common mistake to make...
No, JS has no access to the images in the file, at all. Only to the textual content, and even that's limited.
Or, you could copy/paste into a Word document and do the word count there.
You can also create an Action in the Action Wizard with Dave's script. If you change "console.println" to "app.alert", a dialog box will pop up with the word count.
Actions are not available in Reader, only in Acrobat Pro.
And using an alert in an Action is counter-productive as it means you have to sit in front of it, clicking OK, OK, OK...
It's better to print out the result to the console, preceded by the name of the file that's being processed.
Then when the process is done you open the console and see all the results in one glance.
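As a rough sketch of that approach, the script in the Action could build one line per file, with the file name first. The helper name wordCountLine is hypothetical; documentFileName and getPageNumWords are the Doc members it relies on.

```javascript
// Sketch: build one report line per document, file name first.
// wordCountLine is a hypothetical helper, not part of the Acrobat API.
function wordCountLine(doc) {
    var cnt = 0;
    for (var p = 0; p < doc.numPages; p++) {
        cnt += doc.getPageNumWords(p);
    }
    return doc.documentFileName + ": " + cnt + " words";
}

// In the Action's JavaScript step, "this" is the file being processed:
// console.println(wordCountLine(this));
```

Each processed file then adds a single line to the console, so nothing has to be clicked while the Action runs.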
On Windows you can use AnyCount Word Count and Character Count Software.
It supports word counts in 38 formats, including PDF, and it works even on a PC without Adobe Acrobat installed.
I have used the JavaScript below, and it works well. Does anyone know whether it is possible to write a script that counts characters, including spaces, instead of words?
var cnt = 0;
for (var p = 0; p < this.numPages; p++) cnt += this.getPageNumWords(p);
console.println("There are " + cnt + " words in this file.");
You could write a script that loops through the words, gets the number of characters in each one, and adds them together. You can't get spaces or punctuation; the spaces often aren't even there (just gaps). You could add 1 for each word, which would be approximate but not (for example) a solid basis for payment.
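A minimal sketch of that approximation, assuming one space per word is close enough for your purposes. The helper name approxCharCount is hypothetical; getPageNumWords and getPageNthWord are the Doc methods it relies on.

```javascript
// Sketch: approximate character count = length of every word + 1 per word.
// approxCharCount is a hypothetical helper, not part of the Acrobat API.
function approxCharCount(doc) {
    var total = 0;
    for (var p = 0; p < doc.numPages; p++) {
        var numWords = doc.getPageNumWords(p);
        for (var i = 0; i < numWords; i++) {
            // bStrip = true strips punctuation and white space from the word
            var word = doc.getPageNthWord(p, i, true);
            total += word.length + 1; // the extra 1 stands in for a space
        }
    }
    return total;
}

// In the Acrobat console, "this" is the open document:
// console.println("Roughly " + approxCharCount(this) + " characters.");
```

The result overcounts by one for the last word and ignores punctuation, so treat it as an estimate only.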
Actually, you can get spaces and punctuation by setting the third parameter (bStrip) of getPageNthWord to false.
This code will count all the characters in a document:
var charCount = 0;
for (var p = 0; p < this.numPages; p++) {
    var numWords = this.getPageNumWords(p);
    for (var i = 0; i < numWords; i++) {
        // bStrip = false keeps the punctuation and white space attached to each word
        var word = this.getPageNthWord(p, i, false);
        charCount += word.length;
    }
}
// The second argument (3) selects the status icon for the alert
app.alert("There are " + charCount + " characters in this file.", 3);