
Word Count

New Here, Dec 05, 2012

How do I count words in an English PDF document?

LEGEND, Dec 05, 2012

Acrobat has no built-in word count tool, but you can run this JavaScript from the console:

// Sum the word count of every page in the current document.
var cnt = 0;
for (var p = 0; p < this.numPages; p++) cnt += this.getPageNumWords(p);
console.println("There are " + cnt + " words in this file.");

Community Beginner, Nov 18, 2013

When I use this script:

var cnt=0;

for (var p = 0; p < this.numPages; p++) cnt += getPageNumWords(p);

app.alert("There are " + cnt + " words in this file.");

[screenshot: js.png]

I keep getting a word count of 0 on all PDFs.

Any ideas?

Community Expert, Nov 18, 2013

The document is probably scanned and was not OCR-ed, so it doesn't contain any actual words, just images with text on them...
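One way to check for this is to list the pages on which Acrobat reports no words at all. The sketch below is only an illustration: `findEmptyPages` is a hypothetical helper name, and it assumes it is handed a Doc object (in the console, `this`) exposing `numPages` and `getPageNumWords`, the same members the script above relies on.

```javascript
// Hypothetical helper: returns the 1-based numbers of pages with no words.
// Assumes `doc` exposes numPages and getPageNumWords(), as the Acrobat Doc object does.
function findEmptyPages(doc) {
  var empty = [];
  for (var p = 0; p < doc.numPages; p++) {
    if (doc.getPageNumWords(p) === 0) empty.push(p + 1);
  }
  return empty;
}

// In the Acrobat console (select all lines before running):
// console.println("Pages without text: " + findEmptyPages(this).join(", "));
```

If every page shows up in the list, the file probably needs OCR before any word count will work.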

Community Beginner, Nov 19, 2013

The PDF contains real text: when I copy and paste it into Word, the count is 1,052 words. I'm just wondering whether I need to edit the script at all?

The script I am using:

var cnt=0;

for (var p = 0; p < this.numPages; p++) cnt += getPageNumWords(p);

app.alert("There are " + cnt + " words in this file.");

Community Expert, Nov 19, 2013

No, that script comes from the JS API Reference and should work. Can you share the file?

Community Beginner, Nov 21, 2013

Hi Gilad,

It is happening with any PDF. I've just tried the iPad manual: http://manuals.info.apple.com/MANUALS/1000/MA1595/en_US/ipad_user_guide.pdf

I now have this set up so that the JavaScript runs automatically when I open a PDF, but it just doesn't seem to pick up any words.

Cheers

Toby

Community Expert, Nov 21, 2013

What do you mean, exactly? From where are you running this code?

Community Beginner, Nov 21, 2013

In Adobe Acrobat Pro. I'm using this JS as a saved script; I think it runs through the debugger?

Community Expert, Nov 21, 2013

So you're running it from the console directly?

Community Expert, Nov 21, 2013

Also, are you selecting all of the code when you run it?

Guest, Sep 12, 2016

I'm attempting to use the same JS code in Adobe Acrobat Pro version 10 and I get the "no words found" response. How did you finally resolve your issue, if you don't mind sharing?

Thanks

Community Expert, Sep 12, 2016

That usually means that your file contains only images, no real text...

Community Expert, Nov 21, 2013

I ran it on the iPad manual file and the result was:

[screenshot: Snap1.png]

Community Beginner, Nov 21, 2013

When I run it directly in the console, the console replies as shown below.

Sorry to be a pain!

[screenshot: Untitled.png]

Community Expert, Nov 21, 2013

Yes, that's what I thought... You have to select all of the code (with the mouse) before running it. Otherwise the console executes only the line the cursor is on.

Community Beginner, Nov 21, 2013

So sorry! Rookie!

Massive thanks

Out of interest, could this script also count images? That would be a huge help to me.

Community Expert, Nov 21, 2013

It's a common mistake to make...

No, JS has no access to the images in the file, at all. Only to the textual content, and even that's limited.

Engaged, Sep 15, 2022

Does anyone know why the Acrobat console reports such wildly different word counts from other tools (e.g. Word)?

What is this script counting in addition to word breaks? I get differences of a couple hundred to over a thousand extra "words."

Community Expert, Sep 15, 2022

For example, Acrobat splits hyphenated words into two, so "right-handed" will count as two words, while Word counts it as one. If you can share a sample file that demonstrates this issue we can look more closely into it, but if Word behaves the way you're looking for, just export the PDF to Word and do the count there. That's the easiest solution.
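To see how a single rule like this changes the total, here is a small sketch comparing two simple counting rules. Neither function is the actual algorithm used by Acrobat or Word; both names are hypothetical and the rules are deliberately simplified for illustration.

```javascript
// Illustration only: two simplified counting rules, not the real
// algorithms used by Acrobat or Word.

// Rule 1: a word is any run of non-whitespace characters.
function countWhitespaceWords(text) {
  var tokens = text.trim().split(/\s+/);
  return tokens[0] === "" ? 0 : tokens.length;
}

// Rule 2: hyphens also act as word breaks.
function countHyphenSplitWords(text) {
  var tokens = text.split(/[\s-]+/).filter(function (t) { return t.length > 0; });
  return tokens.length;
}

var sample = "A right-handed, well-known author";
console.log(countWhitespaceWords(sample));  // 4
console.log(countHyphenSplitWords(sample)); // 6
```

In a long text with many hyphenated compounds, a difference like this alone could add up to a noticeable number of extra "words".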

Engaged, Sep 15, 2022

Try this PDF.

40,911 words per Acrobat

39,735 words when copied and pasted into Word

I'm very interested to find out where the discrepancy comes from, because then it can be corrected for.
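One possible way to investigate is to print the individual words Acrobat reports for a page and compare them with the pasted text. The sketch below is a rough illustration under assumptions: `listPageWords` is a hypothetical helper name, and it assumes a Doc object exposing `getPageNumWords` and `getPageNthWord`, with the third argument of `getPageNthWord` set to `false` so that punctuation and whitespace stay attached.

```javascript
// Hypothetical helper: collects every word Acrobat reports on one page.
// Assumes `doc` exposes getPageNumWords() and getPageNthWord(),
// as the Acrobat Doc object does.
function listPageWords(doc, pageIndex) {
  var words = [];
  var n = doc.getPageNumWords(pageIndex);
  for (var i = 0; i < n; i++) {
    // Third argument false: keep punctuation and whitespace on the word.
    words.push(doc.getPageNthWord(pageIndex, i, false));
  }
  return words;
}

// In the Acrobat console (select all lines before running):
// console.println(listPageWords(this, 0).join("|"));
```

Joining with a visible separator such as `|` makes the word boundaries easy to spot, for example where a hyphenated word has been split in two.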

Engaged, Sep 15, 2022

Initially I thought it might be the metadata. The metadata of the above PDF, saved from Acrobat, comes to 37 "words" in Word. It comes to 157 when copied and pasted from Word into Acrobat.

Big difference, but nowhere near the total difference.

Community Expert, Sep 15, 2022

Metadata info is not included in the word count.

Community Expert, Sep 15, 2022

Not the easiest file to work with... More than 12K words just on page 1! Can you find a smaller file that demonstrates this issue? Also, notice that the text is cut off at the end of the page, so the last line is duplicated on both pages, which might help explain the differences you're getting in the counts.

Engaged, Sep 16, 2022

We work with full-length novels, so that word count is not unusual. I grabbed that file because it's in the public domain; I cannot share the full text of the books we work with.

I obtained that file from the Project Gutenberg website. It was HTML text that I saved as a PDF. I removed the headers and footers, as they weren't part of the actual text. I then did Select All > Copy on whatever text is visible in the PDF and pasted it into Word, so the text is the same (or should be; let me know if you want the Word file that was created).

To your point, you could take _any_ PDF upwards of 1,000 words, run the word count in Acrobat, then paste the text into Word and run a word count there.

In my two examples (1,000 words vs 46,000 words), the longer title had a larger number of "extra" words but a smaller percentage difference, while the shorter one had fewer "extra" words (naturally) but a higher percentage difference.

Specifically: 1,000 words in Word vs 1,200 in Acrobat = 200 extra words, or 20% variation; vs 46,000/47,000, or 1,000 extra words, with only about 2% variation.
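For anyone wanting to check these figures, the percentages can be recomputed with a small helper. This is only a sketch: `extraWordsPercent` is a hypothetical name, and it takes the Word count as the baseline, which is an assumption about how the variation is measured. The last line uses the 40,911 and 39,735 counts reported earlier in the thread.

```javascript
// Extra words reported by Acrobat, as a percentage of the Word count.
// Assumption: the Word count is treated as the baseline.
function extraWordsPercent(acrobatCount, wordCount) {
  return ((acrobatCount - wordCount) / wordCount) * 100;
}

console.log(extraWordsPercent(1200, 1000).toFixed(1));   // "20.0"
console.log(extraWordsPercent(47000, 46000).toFixed(1)); // "2.2"
console.log(extraWordsPercent(40911, 39735).toFixed(1)); // "3.0"
```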

 

The only reason I included the stats about the metadata is because the results are so wildly different: 37 vs 157! That's not a lot of text for such a large variation.

 

Perhaps that could be a clue as to what it is about the text that's causing it to be read so differently.
