Word Count

New Here, Dec 05, 2012

How do I count words in an English PDF document?

LEGEND, Dec 05, 2012

There's no built-in word count tool, but in Adobe Acrobat you can run this script from the JavaScript Console:

var cnt = 0;
// Add up the words Acrobat detects on each page of the active document (this)
for (var p = 0; p < this.numPages; p++) cnt += this.getPageNumWords(p);
console.println("There are " + cnt + " words in this file.");
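
The same loop can be wrapped in a function so it is easier to reuse from a saved script or an action. This is a minimal sketch: the function name countDocWords is hypothetical, and it relies only on the numPages property and the getPageNumWords() method used above. In Acrobat you would pass the active document, this.

```javascript
// Hypothetical helper: total the words Acrobat detects on every page of `doc`.
// `doc` is expected to be an Acrobat Doc object (or anything exposing
// numPages and getPageNumWords(pageIndex)).
function countDocWords(doc) {
  var total = 0;
  for (var p = 0; p < doc.numPages; p++) {
    total += doc.getPageNumWords(p); // page indices are 0-based
  }
  return total;
}

// Usage in the JavaScript Console (select all of the lines before running them):
// console.println("There are " + countDocWords(this) + " words in this file.");
```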

New Here, Nov 18, 2013

When I use the script:

var cnt = 0;
for (var p = 0; p < this.numPages; p++) cnt += getPageNumWords(p);
app.alert("There are " + cnt + " words in this file.");

[Screenshot: js.png]

I keep getting a word count of 0 on all PDFs.

Any ideas?

Community Expert, Nov 18, 2013

The document is probably scanned and was not OCR-ed, so it doesn't contain any actual words, just images with text on them...
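
To check whether a file has any real text at all, a per-page version of the same loop can list the pages on which Acrobat finds no words. This is a minimal sketch and pagesWithoutText is a hypothetical name. Genuinely blank pages will also be listed, so an image-only (non-OCR-ed) file is most likely when every page shows up.

```javascript
// Hypothetical helper: return the 1-based numbers of the pages on which
// getPageNumWords() reports zero words (image-only or blank pages).
function pagesWithoutText(doc) {
  var empty = [];
  for (var p = 0; p < doc.numPages; p++) {
    if (doc.getPageNumWords(p) === 0) empty.push(p + 1); // 1-based, as in the UI
  }
  return empty;
}

// Usage in the JavaScript Console:
// var noText = pagesWithoutText(this);
// if (noText.length === this.numPages) console.println("No text found: probably image-only, try OCR.");
// else console.println("Pages without text: " + noText.join(", "));
```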

New Here, Nov 19, 2013

The PDF is full text: when I copy and paste it into Word, the count is 1,052 words. I'm just wondering whether I need to edit the script at all?

Script I am using:

var cnt = 0;
for (var p = 0; p < this.numPages; p++) cnt += getPageNumWords(p);
app.alert("There are " + cnt + " words in this file.");

Community Expert, Nov 19, 2013

No, that script comes from the JS API Reference and should work. Can you share the file?

New Here, Nov 21, 2013

Hi Gilad,

It is happening with any PDF; I've just tried the iPad manual: http://manuals.info.apple.com/MANUALS/1000/MA1595/en_US/ipad_user_guide.pdf

I now have this set up so that the JavaScript runs automatically when I open a PDF, but it just doesn't seem to pick up any words.

Cheers

Toby

Community Expert, Nov 21, 2013

What do you mean, exactly? From where are you running this code?

New Here, Nov 21, 2013

In Adobe Acrobat Pro. I'm using this JS as a saved script; I think it runs through the debugger?

Community Expert, Nov 21, 2013

So you're running it from the console directly?

Community Expert, Nov 21, 2013

Also, are you selecting all of the code when you run it?

Guest, Sep 12, 2016

I'm attempting to use the same JS code in Adobe Acrobat Pro version 10 and get the "no words found" response. How did you finally resolve the issue, if you don't mind sharing?

Thanks

Community Expert, Sep 12, 2016

That usually means that your file contains only images, no real text...

Community Expert, Nov 21, 2013

I ran it on the iPad Manual file and the result was:

[Screenshot: Snap1.png]

New Here, Nov 21, 2013

When I run it directly in the console, the console replies as below.

Sorry to be a pain!

[Screenshot: Untitled.png]

Community Expert, Nov 21, 2013

Yes, that's what I thought... You have to select all of the code (with the mouse) before running it.

New Here, Nov 21, 2013

So sorry! Rookie!

Massive thanks

Out of interest, could this script also count images? That would be a huge help to me.

Community Expert, Nov 21, 2013

It's a common mistake to make...

No, JS has no access to the images in the file, at all. Only to the textual content, and even that's limited.

Engaged, Sep 15, 2022

Does anyone know why the Acrobat console reports such wildly different word counts from other tools (e.g. Word)?

What is this script counting in addition to word breaks? I get differences of a couple hundred to over a thousand extra "words."

Community Expert, Sep 15, 2022

For example, Acrobat splits hyphenated words into two, so "right-handed" will count as two words, while Word counts it as one. If you can share a sample file that demonstrates this issue we can look more closely into it, but if Word behaves the way you're looking for, just export the PDF to Word and do the count there. That's the easiest solution.
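
To see how much a single rule like this can move the total, here is a toy counter that counts the same text with and without treating hyphens as word breaks. It is only an illustration of that one rule, not Acrobat's or Word's actual tokenizer, and countWords is a hypothetical name.

```javascript
// Toy word counter for illustration only (not Acrobat's or Word's algorithm):
// splits on whitespace and, optionally, on hyphens as well.
function countWords(text, splitOnHyphens) {
  var separator = splitOnHyphens ? /[\s-]+/ : /\s+/;
  return text.split(separator).filter(function (token) {
    return token.length > 0; // drop empty strings from leading/trailing breaks
  }).length;
}

countWords("a right-handed pitcher", false); // 3: "right-handed" is one word
countWords("a right-handed pitcher", true);  // 4: the hyphen acts as a break
```

In a novel with many hyphenated compounds, every one of them adds a word under the second rule, which is one plausible contributor to the gap.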

Engaged, Sep 15, 2022

Try this PDF:

40,911 words per Acrobat

39,735 copy/pasted into Word

 

I'm very interested to find out where the discrepancy comes from, because then it can be corrected against.

Engaged, Sep 15, 2022

Initially I thought it might be the metadata. The metadata of the above PDF, saved from Acrobat, comes to 37 "words" in Word, but 157 when copied from Word and pasted into Acrobat.

 

A big difference, but nowhere near enough to explain the total difference.

Community Expert, Sep 15, 2022

Metadata info is not included in the word count.

Community Expert, Sep 15, 2022

Not the easiest file to work with... More than 12K words just on page 1! Can you find a smaller file that demonstrates this issue? Also, notice that the text is cut off at the bottom of the page: the last line appears on both pages, which might help explain the differences you're getting in the counts.
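
If a line really is repeated across each page break, the repeat can be estimated from inside Acrobat with getPageNthWord(), whose third argument (bStrip) controls whether punctuation is stripped. The sketch below uses hypothetical names (overlapAtBreak, duplicatedBoundaryWords; maxWords is an arbitrary search limit) and finds, for each page break, the longest run of words that ends one page and starts the next. Short matches, such as a single "the", can be coincidental, so treat the result as a rough estimate.

```javascript
// Hypothetical helpers. `doc` is an Acrobat Doc object (or anything exposing
// numPages, getPageNumWords(page) and getPageNthWord(page, index, bStrip)).

// Length of the longest run of words (at most maxWords) that ends page p
// and is repeated at the start of page p + 1.
function overlapAtBreak(doc, p, maxWords) {
  var endCount = doc.getPageNumWords(p);
  var startCount = doc.getPageNumWords(p + 1);
  var limit = Math.min(maxWords, endCount, startCount);
  for (var k = limit; k > 0; k--) {
    var match = true;
    for (var i = 0; i < k && match; i++) {
      // bStrip = true drops punctuation, so "end." and "end" compare equal
      match = doc.getPageNthWord(p, endCount - k + i, true) ===
              doc.getPageNthWord(p + 1, i, true);
    }
    if (match) return k;
  }
  return 0;
}

// Total number of words that appear to be repeated across all page breaks.
function duplicatedBoundaryWords(doc, maxWords) {
  var extra = 0;
  for (var p = 0; p + 1 < doc.numPages; p++) {
    extra += overlapAtBreak(doc, p, maxWords);
  }
  return extra;
}

// Usage in the JavaScript Console (30 is an arbitrary limit):
// console.println(duplicatedBoundaryWords(this, 30) + " words look repeated across page breaks.");
```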

Engaged, Sep 16, 2022

We work with full-length novels. The word count is not unusual. I grabbed that file because it's in the public domain. I cannot share the full text of the books we work with.

 

I obtained that file from the Project Gutenberg website. It was HTML text that I saved as a PDF. I took off the headers and footers, as they weren't part of the actual text. For whatever text is visible in the PDF, I did Select All > Copy, then pasted it into Word, so the text is the same (or should be; let me know if you want the Word file that was created).

 

To your point, you could take _any_ PDF upwards of 1,000 words, run the word count in Acrobat, then paste the text into Word and run a word count there.

 

In my two examples (1,000 words vs 46,000 words), the title with the higher word count had a larger number of "extra" words, though they were a smaller percentage of the total, while the title with the lower word count had fewer "extra" words (naturally) but a higher percentage difference.

 

Specifically: 1,000 words in Word vs 1,200 in Acrobat = 200 extra words, or 20% variation; vs 46,000 in Word and 47,000 in Acrobat = 1,000 extra words, or only about 2% variation.
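
The percentages above are the extra words divided by the Word count. A small helper (hypothetical name extraWordsPercent) makes the comparison repeatable for any pair of counts:

```javascript
// Hypothetical helper: extra words reported by Acrobat, as a percentage of
// the count Word gives for the same text.
function extraWordsPercent(acrobatCount, wordCount) {
  return (acrobatCount - wordCount) * 100 / wordCount;
}

extraWordsPercent(1200, 1000);   // 20
extraWordsPercent(47000, 46000); // about 2.17
extraWordsPercent(40911, 39735); // about 2.96 (the sample file posted earlier)
```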

 

The only reason I included the metadata stats is that the results are so wildly different: 37 vs 157! That's a large variation for so little text.

 

Perhaps that could be a clue as to what it is about the text that's causing it to be read so differently.
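
One way to investigate is to look at the tokens Acrobat counts that contain no letters or digits once punctuation is stripped, such as a stand-alone dash or quotation mark. Whether such tokens account for the gap is an unconfirmed guess. The sketch below (hypothetical name punctuationOnlyWords) just tallies them for one page, calling getPageNthWord() with bStrip set to true to test the token and to false to report it as Acrobat sees it. It only checks for the ASCII letters A to Z, which assumes English text.

```javascript
// Hypothetical helper: tally, for one page (0-based index), the tokens that
// getPageNthWord() returns which contain no ASCII letters or digits once
// punctuation is stripped. Keys are the raw (unstripped) tokens.
function punctuationOnlyWords(doc, page) {
  var counts = {};
  var n = doc.getPageNumWords(page);
  for (var i = 0; i < n; i++) {
    var stripped = doc.getPageNthWord(page, i, true); // punctuation removed
    if (!/[A-Za-z0-9]/.test(stripped)) {
      var raw = doc.getPageNthWord(page, i, false);   // token as Acrobat sees it
      counts[raw] = (counts[raw] || 0) + 1;
    }
  }
  return counts;
}

// Usage in the JavaScript Console, for the first page:
// var found = punctuationOnlyWords(this, 0);
// for (var t in found) console.println("[" + t + "] x " + found[t]);
```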
