September 16, 2016
Question

Extract certain pages from a document based on key words

Hi everyone,

I am trying to extract pages from a large document based on certain keywords. If a keyword is found on a page, that page number is pushed to an array, and the array is then used to create a new document. However, my script is very inconsistent and does not seem to be able to create multiple new documents. Please note: I found almost all of this script online, written by someone else, and I am trying to adapt it to my purposes.

// Iterates over all pages, finds a given string, and extracts all
// pages on which that string is found to a new file.
var pageArray = [];
var pageA = [];
var stringToSearchFor = "keyword1";
var stringToSearch = "keyword2";

for (var p = 0; p < this.numPages; p++) {
  // iterate over all words
  for (var n = 0; n < this.getPageNumWords(p); n++) {
    if (this.getPageNthWord(p, n) == stringToSearchFor) {
      pageArray.push(p);
      break;
    }
    else if (this.getPageNthWord(p, n) == stringToSearch) {
      pageA.push(p);
      break;
    }
  }
}

console.println("Test 2 of pageArray " + pageArray);

if (pageArray.length > 0) {
  // extract all pages that contain the string into a new document
  var d = app.newDoc();    // this will add a blank page - we need to remove that once we are done
  for (var n = 0; n < pageArray.length; n++) {
    d.insertPages( {
      nPage: d.numPages-1,
      cPath: this.path,
      nStart: pageArray,
      nEnd: pageArray,
    } );
    console.println(n + " pageArray " + pageArray);
  }
  // remove the first page
  d.deletePages(0);
}

if (pageA.length > 0) {
  // extract all pages that contain the string into a new document
  var q = app.newDoc();    // this will add a blank page - we need to remove that once we are done
  for (var n = 0; n < pageA.length; n++) {
    q.insertPages( {
      nPage: q.numPages-1,
      cPath: this.path,
      nStart: pageA,
      nEnd: pageA,
    } );
    console.println(n + " pageA " + pageA)
  }
  console.println(pageA)
  // remove the first page
}

Thanks!

-Forrest


try67
Adobe Expert
September 16, 2016

Is the issue that some pages that contain both words only appear in one of the final files?

By the way, you're missing the command to delete the first page of the second file, after generating it.
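
Another thing that may be worth checking, assuming the listing is exactly what was run: nStart and nEnd are given the whole array rather than a single page number. Below is a minimal sketch of the extraction step that passes one page per call and deletes the blank first page at the end. The function name extractPagesToNewDoc and its parameters are hypothetical, not part of the Acrobat API; in Acrobat, sourceDoc would be `this` and application would be `app`.

```javascript
// Sketch only. extractPagesToNewDoc is a hypothetical helper name.
// sourceDoc:   the open document (`this` in the Acrobat console)
// pageNumbers: array of zero-based page numbers to copy
// application: Acrobat's global `app` object
function extractPagesToNewDoc(sourceDoc, pageNumbers, application) {
  if (pageNumbers.length === 0) {
    return null; // nothing matched, so do not create an empty file
  }
  var target = application.newDoc(); // starts with one blank page
  for (var i = 0; i < pageNumbers.length; i++) {
    target.insertPages({
      nPage: target.numPages - 1, // append after the current last page
      cPath: sourceDoc.path,
      nStart: pageNumbers[i],     // a single page number, not the whole array
      nEnd: pageNumbers[i]
    });
  }
  target.deletePages(0); // drop the blank page that newDoc() created
  return target;
}
```

In the console this might be called once per keyword, for example extractPagesToNewDoc(this, pageArray, app) and then extractPagesToNewDoc(this, pageA, app), so the delete step cannot be forgotten for the second file.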

September 16, 2016

Thanks for the quick response. Unfortunately, no. I am using this script as part of a way to sort invoices, so the keyword I am searching for is the vendor's name, and two vendors' names will not appear on the same page.

And thanks for pointing that out - I had done that as a troubleshooting step. Oddly enough, the script seems to work for certain words but not others, even though I can find both words by searching (cmd + f) the document. Very confusing.
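
A possible explanation for words that the Find bar locates but the script misses: getPageNthWord returns one word at a time and by default strips punctuation and whitespace (its optional third parameter, bStrip), and the == comparison is case-sensitive, whereas Find ignores case unless told otherwise. A vendor name made of several words would therefore never equal a single returned word. The sketch below joins the words of a page into one lower-case string and looks for the name inside it. The helper name pageContainsPhrase is hypothetical, and because this is a plain substring test it would also match the name inside a longer word.

```javascript
// Sketch only. pageContainsPhrase is a hypothetical helper name.
// doc: the open document (`this` in the Acrobat console)
function pageContainsPhrase(doc, pageIndex, phrase) {
  var words = [];
  var count = doc.getPageNumWords(pageIndex);
  for (var n = 0; n < count; n++) {
    // third argument true = strip punctuation and whitespace (the default)
    words.push(doc.getPageNthWord(pageIndex, n, true));
  }
  var pageText = words.join(" ").toLowerCase();
  // normalise the search phrase the same way: lower case, single spaces
  var wanted = phrase.toLowerCase()
    .replace(/\s+/g, " ")
    .replace(/^ | $/g, "");
  return pageText.indexOf(wanted) !== -1;
}
```

Since punctuation is stripped from the page words, a name such as one containing "&" or a trailing "." would need the same characters removed from the search phrase before comparing.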

September 16, 2016

I should also point out that I added the console.println() calls to check that the arrays have values, which both of them do. So I think the issue may have something to do with the newDoc creation?
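
To make that kind of check easier with more than two vendors, the page lists could be collected in one pass and inspected before any new document is created. This is only a sketch: collectPagesByKeyword is a hypothetical helper name, and it keeps the exact single-word comparison of the original script. Unlike the break in the original loop, a page is recorded under every keyword it contains.

```javascript
// Sketch only. collectPagesByKeyword is a hypothetical helper name.
// Returns an object mapping each keyword to the zero-based page numbers
// on which it occurs as a single word (exact, case-sensitive match).
function collectPagesByKeyword(doc, keywords) {
  var result = {};
  for (var k = 0; k < keywords.length; k++) {
    result[keywords[k]] = [];
  }
  for (var p = 0; p < doc.numPages; p++) {
    var count = doc.getPageNumWords(p);
    for (var n = 0; n < count; n++) {
      var word = doc.getPageNthWord(p, n);
      if (Object.prototype.hasOwnProperty.call(result, word)) {
        var pageList = result[word];
        // record each page only once per keyword
        if (pageList[pageList.length - 1] !== p) {
          pageList.push(p);
        }
      }
    }
  }
  return result;
}
```

Each list in the returned object can be printed with console.println to confirm it holds individual page numbers, and a keyword with an empty list can simply be skipped.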