Participant
December 27, 2020
Question

Extracting Pages Based on Matching Strings in Acrobat Pro DC Java Script


I need to extract pages from a PDF document based on matching strings, i.e. have Acrobat create a new file containing every page on which it finds any of the strings I keep in a CSV or XLSX file.

I only need pages having the following two strings...

macros
salesperson
I found the following code while googling, but it searches for only one string and creates a new file from the pages matching that string. I need to search for multiple strings and end up with a single file. Any ideas, please?

// Iterates over all pages and find a given string and extracts all
// pages on which that string is found to a new file.

var pageArray = [];

var stringToSearchFor = "Test";

for (var p = 0; p < this.numPages; p++) {
    // iterate over all words
    for (var n = 0; n < this.getPageNumWords(p); n++) {
        if (this.getPageNthWord(p, n) == stringToSearchFor) {
            pageArray.push(p);
            break;
        }
    }
}

if (pageArray.length > 0) {
    // extract all pages that contain the string into a new document
    var d = app.newDoc(); // this will add a blank page - we need to remove that once we are done
    for (var n = 0; n < pageArray.length; n++) {
        d.insertPages( {
            nPage: d.numPages-1,
            cPath: this.path,
            nStart: pageArray[n],
            nEnd: pageArray[n],
        } );
    }

    // remove the first page
    d.deletePages(0);
}

 

I assume some code needs to be added to load the CSV/XLSX file, plus a FOR/WHILE loop that searches the PDF for every string, stores the matching page numbers, and then creates a new file from those pages.
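
The term-loading step could look roughly like the sketch below. It only covers turning CSV text into a list of search strings; `parseSearchTerms` is a hypothetical helper name, and it assumes a simple CSV without quoted fields that contain commas or line breaks. Getting the text out of the file is left open here (an XLSX file would most likely have to be saved as CSV first).

```javascript
// Hypothetical helper: turns CSV text into a flat list of unique search terms.
// Assumption: simple CSV, no quoted fields containing commas or line breaks.
function parseSearchTerms(csvText) {
    var terms = [];
    var lines = csvText.split(/\r?\n/);
    for (var i = 0; i < lines.length; i++) {
        var cells = lines[i].split(",");
        for (var j = 0; j < cells.length; j++) {
            // trim surrounding whitespace, then one pair of surrounding double quotes
            var term = cells[j].replace(/^\s+|\s+$/g, "").replace(/^"|"$/g, "");
            // skip empty cells and terms that were already collected
            if (term !== "" && terms.indexOf(term) === -1) {
                terms.push(term);
            }
        }
    }
    return terms;
}

var exampleTerms = parseSearchTerms("macros\nsalesperson\n");
// exampleTerms is ["macros", "salesperson"]
```

The resulting array can then take the place of the single `stringToSearchFor` value in the script above.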


1 reply

try67
Community Expert
December 27, 2020

Change this line:

var stringToSearchFor = "Test";

To:

var stringsToSearchFor = ["macros", "salesperson"];

 

And this line:

if (this.getPageNthWord(p, n) == stringToSearchFor) {

To:

if (stringsToSearchFor.indexOf(this.getPageNthWord(p, n)) != -1) {
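
Put together, the search loop with both changes can be sketched as a small function. This is only an illustration: `findMatchingPages` and `mockDoc` are made-up names, and the mock object stands in for an Acrobat document so the logic can be tried outside Acrobat. The function relies only on `numPages`, `getPageNumWords` and `getPageNthWord`, which the original script already uses.

```javascript
// Returns the 0-based numbers of all pages on which at least one of the
// terms occurs as a whole word (exact, case-sensitive comparison).
// `doc` is expected to provide numPages, getPageNumWords and getPageNthWord.
function findMatchingPages(doc, terms) {
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {
        var wordCount = doc.getPageNumWords(p);
        for (var n = 0; n < wordCount; n++) {
            if (terms.indexOf(doc.getPageNthWord(p, n)) !== -1) {
                pages.push(p);
                break; // one hit is enough, move on to the next page
            }
        }
    }
    return pages;
}

// Stand-in for a PDF, with hypothetical data, for testing outside Acrobat.
var mockDoc = {
    pageWords: [
        ["sales", "report"],
        ["macros", "enabled"],
        ["the", "salesperson", "macros"],
        []
    ],
    numPages: 4,
    getPageNumWords: function (p) { return this.pageWords[p].length; },
    getPageNthWord: function (p, n) { return this.pageWords[p][n]; }
};

var matches = findMatchingPages(mockDoc, ["macros", "salesperson"]);
// matches is [1, 2]
```

Inside Acrobat, something like `var pageArray = findMatchingPages(this, stringsToSearchFor);` should be able to replace the search loop, with the extraction part of the script left unchanged. Because each page is added at most once, a page containing both strings still ends up in the new file only once. Note that the comparison is exact, so "Macros" would not match "macros".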