
Extracting Pages Based on Matching Strings in Acrobat Pro DC JavaScript

New Here,
Dec 23, 2020

I need to extract pages from a PDF document based on matching strings, i.e. have Acrobat create a new file containing every page on which it finds any of the strings listed in a CSV or XLSX file.

This is a sample PDF file from which I only need the pages containing the following two strings:

  1. macros
  2. salesperson

I found the following code here while googling around, but it searches for only one string and creates a new file from the pages matching that string, whereas I need to search for multiple strings and still end up with only one file. Any ideas, please?

 

// Iterates over all pages and find a given string and extracts all 
// pages on which that string is found to a new file.

var pageArray = [];

var stringToSearchFor = "Test";

for (var p = 0; p < this.numPages; p++) {
    // iterate over all words (count the words once per page, not on every pass)
    var numWords = this.getPageNumWords(p);
    for (var n = 0; n < numWords; n++) {
        if (this.getPageNthWord(p, n) == stringToSearchFor) {
            pageArray.push(p);
            break;
        }
    }
}

if (pageArray.length > 0) {
    // extract all pages that contain the string into a new document
    var d = app.newDoc();    // this will add a blank page - we need to remove that once we are done
    for (var n = 0; n < pageArray.length; n++) {
        d.insertPages( {
            nPage: d.numPages-1,
            cPath: this.path,
            nStart: pageArray[n],
            nEnd: pageArray[n],
        } );
    }

    // remove the first page
    d.deletePages(0);
    
}
 

I assume some code needs to be added to load the CSV/XLSX file, plus a FOR/WHILE loop that searches the PDF for every string, stores the matching page numbers, and then creates a new file from all of those pages.

TOPICS
How to , JavaScript
Community Expert,
Dec 23, 2020

This is possible, but you can't read the contents of an XLSX file with a script. You can read a CSV file, though.

You can use the readFileIntoStream method of the util object to do that. Then parse the contents of the file and convert to an array of search terms, and add another loop (or use the indexOf method of the array object) to match each word against this array of search terms. The rest of the code can stay pretty much the same.
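
As a rough sketch of the steps above (not a tested, drop-in solution): the helper names parseSearchTerms, findMatchingPages and extractMatchingPages are made up for illustration, and the CSV is assumed to be simple, with no quoted fields containing commas or line breaks. Matching here is case-insensitive and word by word, which is an assumption on my part, so a multi-word search term would not match. The last function uses Acrobat-only calls that, as far as I know, need a privileged context such as the JavaScript console or an Action.

```javascript
// Turn CSV text into a flat list of unique, lower-cased search terms.
// Assumption: simple CSV, no quoted fields with embedded commas or newlines.
function parseSearchTerms(csvText) {
    var terms = [];
    var cells = csvText.split(/[\r\n,]+/);
    for (var i = 0; i < cells.length; i++) {
        var term = cells[i].replace(/^\s+|\s+$/g, "").toLowerCase();
        if (term !== "" && terms.indexOf(term) === -1) {
            terms.push(term);
        }
    }
    return terms;
}

// Return the zero-based numbers of all pages containing at least one term.
// "doc" is any object with numPages, getPageNumWords and getPageNthWord.
function findMatchingPages(doc, terms) {
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {
        var numWords = doc.getPageNumWords(p);
        for (var n = 0; n < numWords; n++) {
            var word = String(doc.getPageNthWord(p, n)).toLowerCase();
            if (terms.indexOf(word) !== -1) {
                pages.push(p);
                break; // one hit is enough for this page
            }
        }
    }
    return pages;
}

// Acrobat-only part: read the CSV, then copy the matching pages to a new file.
function extractMatchingPages(doc, csvPath) {
    var stream = util.readFileIntoStream(csvPath);
    var terms = parseSearchTerms(util.stringFromStream(stream, "utf-8"));
    var pages = findMatchingPages(doc, terms);
    if (pages.length === 0) {
        return;
    }
    var d = app.newDoc(); // starts with one blank page, removed below
    for (var i = 0; i < pages.length; i++) {
        d.insertPages({
            nPage: d.numPages - 1,
            cPath: doc.path,
            nStart: pages[i],
            nEnd: pages[i]
        });
    }
    d.deletePages(0);
}
```

From the console you could then run something like extractMatchingPages(this, "/c/temp/terms.csv"), where the path is only a placeholder for wherever your CSV file lives.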

New Here,
Dec 31, 2020

Hi @try67

Thanks for your reply. I understand the logic behind it but don't know enough JS to make it work. Still, I found a solution while googling around.

New Here,
Dec 31, 2020

Hi all,

I found a solution to this here. I downloaded this Adobe Action file, which does what I described above. It also does one extra thing, highlighting the text it finds in the file, which does not bother me.
