Search for missing word on a page within a document

Community Beginner
Nov 21, 2019

Hi all, I'd like to know whether there is a script or an existing method to find the pages in a document that do not contain a certain word or phrase. I have a 400-page document, and each page should contain at least one instance of a specific word or phrase entered at a user prompt. Ideally the output would be a simple list of the page numbers that are missing this word. Thanks in advance for any assistance.

Correct answer by Test Screen Name | Most Valuable Participant

OK, you may or may not find this fairly advanced. The method document.getPageNthWord is the root of all text extraction and searching in Acrobat JavaScript. You would step through your pages and look at each word in turn, in a loop. You can apply whatever tests you like, such as "the string does not match any of the words on a page", and take the action you want. Producing output is also something of a challenge, because JavaScript in Acrobat is strongly restricted for security reasons.
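A minimal sketch of the word-by-word scan described above, assuming only the Acrobat Doc members `numPages`, `getPageNumWords(p)` and `getPageNthWord(p, n)`. The function name `findPagesMissingWord`, the case-insensitive comparison and the `mockDoc` stand-in are hypothetical additions so the logic can be tried outside Acrobat; inside Acrobat you would pass `this` as the document and print the result with `console.println`.

```javascript
// Hypothetical helper: returns the 1-based numbers of pages on which no word
// equals `target` (compared case-insensitively). `doc` is expected to expose
// numPages, getPageNumWords(p) and getPageNthWord(p, n) like an Acrobat Doc.
function findPagesMissingWord(doc, target) {
  var wanted = String(target).toLowerCase();
  var missing = [];
  for (var p = 0; p < doc.numPages; p++) {
    var numWords = doc.getPageNumWords(p); // cache: the count is fixed per page
    var found = false;
    for (var n = 0; n < numWords; n++) {
      // In Acrobat, getPageNthWord strips punctuation by default
      if (String(doc.getPageNthWord(p, n)).toLowerCase() === wanted) {
        found = true;
        break; // one hit is enough for this page
      }
    }
    if (!found) missing.push(p + 1); // the API is 0-based; page labels start at 1
  }
  return missing;
}

// Hypothetical stand-in document for trying the logic outside Acrobat.
var mockDoc = {
  pages: [["Invoice", "total", "due"], ["Notes", "only"], ["invoice", "copy"]],
  get numPages() { return this.pages.length; },
  getPageNumWords: function (p) { return this.pages[p].length; },
  getPageNthWord: function (p, n) { return this.pages[p][n]; }
};

// In Acrobat: console.println(findPagesMissingWord(this, "Invoice").join(", "));
console.log(findPagesMissingWord(mockDoc, "Invoice").join(", ")); // prints "2"
```

Stopping at the first match on each page keeps the work per page small, which matters on a long document.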

TOPICS
Acrobat SDK and JavaScript, How to

Most Valuable Participant
Nov 21, 2019

Are you a JavaScript programmer?

Community Beginner
Nov 21, 2019

I'm in the process of learning JavaScript and how it can be integrated into an Action Wizard action.

Most Valuable Participant
Nov 21, 2019

In theory this is possible with a script, although in my experience 400 pages is pushing the limit of what a script in Acrobat can handle. The alternative is a stand-alone tool, which is more complicated to develop but much more robust.

Community Beginner
Nov 27, 2019

Thanks for the advice and input, greatly appreciated! Sorry for the questions, but the first time I saw JavaScript was last week ;) A follow-up question: how does the code receive the variable input from the user prompt in the first step of the Adobe Action (Search & Remove Text)? Here is the start of my initial code, which will go under the Execute JavaScript step:

// Looks over all pages for a given string and
// displays the page numbers that do not contain it

var stringToFind = "WORD"; // placeholder: how does the variable link here from the user prompt in the Action?
var missingPages = [];

for (var p = 0; p < this.numPages; p++) {
    var found = false;
    // iterate over all words on page p
    for (var n = 0; n < this.getPageNumWords(p); n++) {
        if (this.getPageNthWord(p, n) == stringToFind) {
            found = true;
            break;
        }
    }
    // page indexes are 0-based, so add 1 for the displayed page number
    if (!found) missingPages.push(p + 1);
}

console.println("Pages missing \"" + stringToFind + "\": " + missingPages.join(", "));
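
Comparing one word at a time, as in the loop above, can never match a multi-word phrase, because `getPageNthWord` returns a single word. One possible workaround, sketched here under the assumption that rebuilding each page's text by joining its words with single spaces is close enough to the visible text, is to search that rebuilt string instead. `findPagesMissingPhrase` and `mockDoc` are hypothetical names, and punctuation inside the phrase may not match, since `getPageNthWord` strips punctuation by default.

```javascript
// Hypothetical helper: returns 1-based page numbers whose rebuilt text does
// not contain `phrase` as whole words (case-insensitive). Each page's words are
// joined with single spaces; the phrase's whitespace is collapsed the same way.
function findPagesMissingPhrase(doc, phrase) {
  var wanted = " " + String(phrase).replace(/\s+/g, " ").trim().toLowerCase() + " ";
  var missing = [];
  for (var p = 0; p < doc.numPages; p++) {
    var words = [];
    var numWords = doc.getPageNumWords(p);
    for (var n = 0; n < numWords; n++) {
      words.push(String(doc.getPageNthWord(p, n)));
    }
    // Pad with spaces so "total due" does not match inside "subtotal due"
    var pageText = " " + words.join(" ").replace(/\s+/g, " ").toLowerCase() + " ";
    if (pageText.indexOf(wanted) === -1) missing.push(p + 1);
  }
  return missing;
}

// Hypothetical stand-in document for trying the logic outside Acrobat.
var mockDoc = {
  pages: [["Amount", "Total", "Due", "now"], ["Subtotal", "due"], ["total"]],
  get numPages() { return this.pages.length; },
  getPageNumWords: function (p) { return this.pages[p].length; },
  getPageNthWord: function (p, n) { return this.pages[p][n]; }
};

console.log(findPagesMissingPhrase(mockDoc, "total  due").join(", ")); // prints "2, 3"
```

Unlike the single-word loop, this reads every word on every page before testing, so it will be slower on a 400-page document.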

try67
Most Valuable Participant
Nov 27, 2019

It doesn't. If you use the Search & Remove Text command, then your approach needs to be completely different.

That command creates Redaction annotations over the matching terms. You then need to look for those annotations in your script, and based on their locations you can work out which pages don't contain the text you searched for.
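A minimal sketch of that annotation-based approach, assuming the Search & Remove Text step has already marked the matches but the redactions have not been applied, that `getAnnots()` with no arguments returns the annotations on all pages (or nothing when there are none), and that redaction marks report `type` as `"Redact"` together with a 0-based `page`. The helper name `pagesWithoutRedactMarks` and the `mockDoc` stand-in are hypothetical.

```javascript
// Hypothetical helper: lists the 1-based numbers of pages that carry no
// redaction mark, i.e. pages where the searched text was not found.
// Assumes the marks from Search & Remove Text have not been applied yet.
function pagesWithoutRedactMarks(doc) {
  doc.syncAnnotScan(); // in Acrobat, make sure every page's annotations are scanned
  var annots = doc.getAnnots() || []; // nothing is returned when there are no annotations
  var marked = {};
  for (var i = 0; i < annots.length; i++) {
    if (annots[i].type === "Redact") marked[annots[i].page] = true;
  }
  var missing = [];
  for (var p = 0; p < doc.numPages; p++) {
    if (!marked[p]) missing.push(p + 1); // convert 0-based index to page number
  }
  return missing;
}

// Hypothetical stand-in document for trying the logic outside Acrobat.
var mockDoc = {
  numPages: 4,
  annots: [{ type: "Redact", page: 0 }, { type: "Highlight", page: 1 }, { type: "Redact", page: 3 }],
  syncAnnotScan: function () {},
  getAnnots: function () { return this.annots.length ? this.annots : null; }
};

// In Acrobat: console.println(pagesWithoutRedactMarks(this).join(", "));
console.log(pagesWithoutRedactMarks(mockDoc).join(", ")); // prints "2, 3"
```

Here "location" is reduced to the page number; the rectangle of each mark is not needed to decide whether a page is missing the text.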
