Hello. I found a script that extracts pages based on content. I am trying to extract pages based on "Page 1 of 1" and "Page 1 of 2 & Page 2 of 2". I cannot figure out what to put in the search line. (“page”, “1”, “of”, and “1”) doesn't work. Any help would be appreciated. I really don't have much programming experience. I'm researching the JavaScript documentation, but it's really not much help. I'm so close...
// Iterates over all pages and find a given string and extracts all
// pages on which that string is found to a new file.
var pageArray = [];
var stringToSearchFor = "page\s1\sof\s1";
for (var p = 0; p < this.numPages; p++) {
// iterate over all words
for (var n = 0; n < this.getPageNumWords(p); n++) {
if (this.getPageNthWord(p, n) == stringToSearchFor) {
pageArray.push(p);
break;
}
}
}
if (pageArray.length > 0) {
// extract all pages that contain the string into a new document
var d = app.newDoc(); // this will add a blank page - we need to remove that once we are done
for (var n = 0; n < pageArray.length; n++) {
d.insertPages( {
nPage: d.numPages-1,
cPath: this.path,
nStart: pageArray[n],
nEnd: pageArray[n]
} );
}
// remove the first page
d.deletePages(0);
}
Have you tried checking what your search script is finding and testing by adding a "console.println" to display each word as the script searches the page?
As I understand it, "this.getPageNthWord(p, n)" returns the "n"th word on page "p". It appears you are looking for the four words "Page", "1", "of", "1". In my experience you need to search for all four words, including the three separating spaces between them. Please review the Acrobat JavaScript documentation for the "getPageNthWord" method.
// Iterates over all pages and find a given string and extracts all
// pages on which that string is found to a new file.
var pageArray = [];
var stringToSearchFor = "page\s1\sof\s1";
for (var p = 0; p < this.numPages; p++) {
// iterate over all words
for (var n = 0; n < this.getPageNumWords(p); n++) {
if (this.getPageNthWord(p, n) == stringToSearchFor) {
pageArray.push(p);
break;
}
}
}
console.println
if (pageArray.length > 0) {
// extract all pages that contain the string into a new document
var d = app.newDoc(); // this will add a blank page - we need to remove that once we are done
for (var n = 0; n < pageArray.length; n++) {
d.insertPages( {
nPage: d.numPages-1,
cPath: this.path,
nStart: pageArray[n],
nEnd: pageArray[n]
} );
}
// remove the first page
d.deletePages(0);
}
Was that okay for inserting console.println?
I would have added the statement inside the first loop, so that it displays each word found in the document as it is compared to the string of words "page 1 of 1".
console.clear();
var pageArray = [];
var stringToSearchFor = "page\s1\sof\s1";
for (var p = 0; p < this.numPages; p++) {
// iterate over all words
for (var n = 0; n < this.getPageNumWords(p); n++) {
console.println("Page: " + p + " word " + n + " is " + this.getPageNthWord(p, n));
if (this.getPageNthWord(p, n) == stringToSearchFor) {
console.println("Match found");
pageArray.push(p);
break;
}
}
}
Now my results list one word at a time, so no single word will ever match your string of 4 words.
You need to build a string of 4 words in a row, including the word separator after each of the first 3 words, for the comparison to work.
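The point above can be sketched as follows: instead of comparing one word to the whole phrase, compare a run of consecutive words to the words of the phrase. This is only a minimal sketch under some assumptions. The helper names `pageHasPhrase` and `findPagesWithPhrase` are hypothetical, and it assumes the PDF's text comes back as the four separate words "Page", "1", "of", "1" (the console output from the loop above should confirm or refute that). Note also that inside a JavaScript string literal "\s" is just the letter "s", so "page\s1\sof\s1" is the literal text "pages1sofs1" rather than a pattern.

```javascript
// Hypothetical helper: true if the words in phraseWords (lowercase)
// appear consecutively somewhere on page p of doc.
function pageHasPhrase(doc, p, phraseWords) {
    var total = doc.getPageNumWords(p);
    // slide a window of phraseWords.length words across the page
    for (var n = 0; n + phraseWords.length <= total; n++) {
        var matched = true;
        for (var k = 0; k < phraseWords.length; k++) {
            // compare case-insensitively, one word at a time
            var word = String(doc.getPageNthWord(p, n + k)).toLowerCase();
            if (word !== phraseWords[k]) {
                matched = false;
                break;
            }
        }
        if (matched) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: zero-based page numbers that contain the phrase.
function findPagesWithPhrase(doc, phraseWords) {
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {
        if (pageHasPhrase(doc, p, phraseWords)) {
            pages.push(p);
        }
    }
    return pages;
}
```

In the Acrobat console this could be used as `var pageArray = findPagesWithPhrase(this, ["page", "1", "of", "1"]);` in place of the first loop, leaving the extraction part of the script unchanged.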
GKaiseril is correct. The function that acquires page text only returns one word at a time. If you want to detect phrases you'll need to collect all the words on a page into a single string and search it for the phrase.
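As a rough illustration of collecting the words into one string, here is a minimal sketch. The helper names `getPageText` and `findPagesMatching` are hypothetical, and joining with a single space is an assumption about how the phrase should be normalized, since `getPageNthWord` strips whitespace and punctuation by default. The search uses a regular expression literal, where `\b` and character classes do work, unlike in a quoted string.

```javascript
// Hypothetical helper: all words on page p joined with single spaces.
function getPageText(doc, p) {
    var words = [];
    var total = doc.getPageNumWords(p);
    for (var n = 0; n < total; n++) {
        words.push(doc.getPageNthWord(p, n));
    }
    return words.join(" ");
}

// Hypothetical helper: zero-based page numbers whose text matches pattern.
// Pass a RegExp without the "g" flag, because test() on a global
// RegExp keeps state between calls.
function findPagesMatching(doc, pattern) {
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {
        if (pattern.test(getPageText(doc, p))) {
            pages.push(p);
        }
    }
    return pages;
}
```

For example, `findPagesMatching(this, /\bpage 1 of 1\b/i)` would collect the single-page items, and `/\bpage [12] of 2\b/i` the two-page ones; the trailing `\b` keeps "Page 1 of 12" from matching.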
Or, a much simpler and more efficient solution is to use the Redact find tool to mark the phrases with a redact annotation, then extract the pages that contain the annotations, and finally delete the annotations.
In fact, I created exactly this type of solution for the free search and highlight Action here:
https://acrobatusers.com/actions-exchange
Also on this page you'll find the "Extract Commented Pages" Action. If you run these two Actions back to back, you get exactly what you want. And if you can program, then you can extract and combine the scripts into a single tool.