Split PDF based on content, and save into different pdfs with custom file name

New Here, Jun 13, 2018

Hello. I'm a beginner in JavaScript, and I have Adobe Acrobat X Pro.

I want to be able to search for a specific string within the PDF, and then save the sequence of numbers that comes after that string to use as my file name.

For example, the text would look like "EA 224400". I would search for the string EA and save 224400 in a separate variable to use later as the file name. This kind of text appears multiple times in my PDF, on different pages. So I want to search for EA, save the number sequence after it, and split the document at that point: save the pages from the current page (typically 5 pages, though not always) up to the page before the next instance of "EA", then save an individual PDF for each number sequence, using the number sequence as the file name.

I found a different discussion with almost exactly the same concept (Split large pdf on repeated text pattern, and save new pdf with custom filename). The only difference is that I want to search for "EA" in the document, rather than the user already knowing the location of "EA".

This is the code I tried to use, based on the other discussion, but it has a syntax error:

var numSeq="";
var finalpage=0;
for (var p = 0; p < this.numPages; p++) {
    for(var n = 0; n<this.getPageNumWords(p); n++{
       if(this.getPageNthWord(p,n)=="EA"){
            numSeq=this.getPageNthWord(p,n+1)
            finalpage=p;
            break;
       }
    }

    for(var p2=p+1; p2<this.numPages; p2++){
        for(var n2=0; n2<this.getPageNumWords(p2); n2++){
            if(this.getPageNthWord(p2,n2)=="EA"){
                this.extractPages({
                    nStart: p,
                    nEnd: p2-1,
                    cPath: numSeq+".pdf"});
                break;
            }
        }
       console.println("Extracted " + numSeq + " pp " + p + " to " + p2)
       break;
    }
}
this.extractPages({
    nStart: finalpage,
    nEnd: this.numPages - 1,
    cPath: numSeq + ".pdf"
});
console.println("Extracted" + numSeq + " pp " + finalpage + " to " + (this.numPages - 1))

Topics: Acrobat SDK and JavaScript, Windows

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more

Correct answer: Engaged, Jun 13, 2018 (the full script is in that reply below).

LEGEND, Jun 13, 2018

Have you checked what string "getPageNthWord" actually returns?

For your example, I would expect it to be "EA 224400", in which case you never get a value of exactly "EA", and that means no match. You could check whether the first 2 characters returned are "EA" and then do your save. Or you could use a RegExp object to test whether the returned value has the form "EA" followed by a number.
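
A comparison that tolerates either case might look like the sketch below. It assumes the marker is the literal "EA" and the value is a run of digits. `parseEaNumber` is a hypothetical helper written for this thread, not part of the Acrobat API.

```javascript
// Hypothetical helper: returns the digits that follow "EA", or null.
// word     - the value returned by getPageNthWord(p, n)
// nextWord - the value returned by getPageNthWord(p, n + 1); may be undefined
function parseEaNumber(word, nextWord) {
    // Case 1: marker and number arrive in one string, e.g. "EA 224400".
    var joined = /^\s*EA[\s:]*(\d+)\s*$/.exec(String(word));
    if (joined) {
        return joined[1];
    }
    // Case 2: the marker is its own word and the number is the next word.
    if (/^\s*EA[\s:]*$/.test(String(word)) && nextWord !== undefined) {
        var digits = /^\s*(\d+)\s*$/.exec(String(nextWord));
        if (digits) {
            return digits[1];
        }
    }
    return null;
}
```

Inside the word loop this could be called as `parseEaNumber(this.getPageNthWord(p, n), this.getPageNthWord(p, n + 1))`, ideally only when `n + 1 < this.getPageNumWords(p)`. Printing a few words with `console.println` first would show which of the two cases actually applies to your file.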

Engaged, Jun 13, 2018

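var numSeq = "";    // number following the most recent "EA"; used as the file name
var finalpage = 0;  // page on which the most recent "EA" was found

for (var p = 0; p < this.numPages; p++) {
    // Look for "EA" on page p and remember the word that follows it.
    for (var n = 0; n < this.getPageNumWords(p); n++) {
        if (this.getPageNthWord(p, n) == "EA") {
            numSeq = this.getPageNthWord(p, n + 1);
            finalpage = p;
            break;
        }
    }

    // Look for "EA" on the next page; if it is there, extract up to the page before it.
    for (var p2 = p + 1; p2 < this.numPages; p2++) {
        for (var n2 = 0; n2 < this.getPageNumWords(p2); n2++) {
            if (this.getPageNthWord(p2, n2) == "EA") {
                this.extractPages({
                    nStart: p,
                    nEnd: p2 - 1,
                    cPath: numSeq + ".pdf"});
                break;
            }
        }
        console.println("Extracted " + numSeq + " pp " + p + " to " + p2);
        break;
    }
}

// Extract the last section, from the final "EA" page to the end of the document.
this.extractPages({
    nStart: finalpage,
    nEnd: this.numPages - 1,
    cPath: numSeq + ".pdf"
});
console.println("Extracted " + numSeq + " pp " + finalpage + " to " + (this.numPages - 1));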
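
Reading the script above, the p2 loop appears to exit after checking only page p+1 (because of its final break), so the start page of each extracted range may not be the page where the marker was found. An alternative worth considering is to record every marker page first and derive the ranges afterwards. The sketch below is written under those assumptions. `findSplitPoints`, `toRanges` and `splitByMarker` are hypothetical names, and `doc` is any object exposing `numPages`, `getPageNumWords`, `getPageNthWord` and `extractPages` (in Acrobat, pass `this`).

```javascript
// Pass 1: record the page and the following word for every marker found.
function findSplitPoints(doc, marker) {
    var points = [];
    for (var p = 0; p < doc.numPages; p++) {
        var count = doc.getPageNumWords(p);
        for (var n = 0; n + 1 < count; n++) {
            if (doc.getPageNthWord(p, n) == marker) {
                points.push({ page: p, name: doc.getPageNthWord(p, n + 1) });
                break; // only the first marker on a page is used
            }
        }
    }
    return points;
}

// Pass 2: turn the split points into inclusive, 0-based page ranges.
function toRanges(points, numPages) {
    var ranges = [];
    for (var i = 0; i < points.length; i++) {
        var last = (i + 1 < points.length) ? points[i + 1].page - 1 : numPages - 1;
        ranges.push({
            nStart: points[i].page,
            nEnd: last,
            cPath: points[i].name + ".pdf"
        });
    }
    return ranges;
}

// Pass 3: extract each range into its own file.
function splitByMarker(doc, marker) {
    var ranges = toRanges(findSplitPoints(doc, marker), doc.numPages);
    for (var i = 0; i < ranges.length; i++) {
        doc.extractPages(ranges[i]);
    }
    return ranges;
}
```

From the JavaScript console this would be run as `splitByMarker(this, "EA");`. Pages that come before the first marker are not extracted, and two sections with the same number would write to the same file name, so both cases may need extra handling.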

New Here, Jun 13, 2018

This answer is right, except that for some reason I have to use p-1 for the first nStart.

Thanks though

Engaged, Jun 13, 2018

To extract the numbers, you could loop through all the fields of the document, use the split() method of the String object to separate the words into an array, then use the indexOf() method to find the occurrence of "EA" and read the element at the next index.

Is there any chance you might encounter EA twice in the same field?
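
A rough sketch of the split()/indexOf() idea follows. It returns every value that comes directly after the marker, so it also covers the case where "EA" appears twice in the same text. `valuesAfterMarker` is a hypothetical helper name, and the sketch assumes the words are separated by whitespace.

```javascript
// Hypothetical helper: returns every word that directly follows `marker`
// in a whitespace-separated string (an empty array if there is none).
function valuesAfterMarker(text, marker) {
    var words = String(text).split(/\s+/);
    var found = [];
    var at = words.indexOf(marker);
    while (at !== -1) {
        if (at + 1 < words.length) {
            found.push(words[at + 1]);
        }
        // Continue the search after the current hit.
        at = words.indexOf(marker, at + 1);
    }
    return found;
}
```

For a form field this might be called as `valuesAfterMarker(this.getField("Summary").value, "EA")`, where "Summary" is only a placeholder field name.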

New Here, Jun 14, 2018

Is it possible to change the code a little so that it works for a layout like this:

EA:                                 ID:

12456                             889955

where it would detect EA and somehow read the number underneath it?
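
One possible approach, sketched under a strong assumption: that the page words come back row by row, so the labels ("EA", "ID") are followed by their values in the same left-to-right order. As far as I know, getPageNthWord strips punctuation by default, so "EA:" should arrive as "EA", but that is worth confirming by printing the words to the console. `pageWords` and `valuesBelowLabels` are hypothetical helper names.

```javascript
// Collect the words of page p in the order the document reports them.
function pageWords(doc, p) {
    var words = [];
    for (var n = 0; n < doc.getPageNumWords(p); n++) {
        words.push(doc.getPageNthWord(p, n));
    }
    return words;
}

// words:  page words, e.g. from pageWords(this, p)
// labels: header labels in left-to-right order, e.g. ["EA", "ID"]
// Returns an object mapping each label to the value beneath it, or null
// if the label row is not found.
function valuesBelowLabels(words, labels) {
    for (var i = 0; i + 2 * labels.length <= words.length; i++) {
        var match = true;
        for (var k = 0; k < labels.length; k++) {
            if (words[i + k] != labels[k]) {
                match = false;
                break;
            }
        }
        if (match) {
            var result = {};
            for (var j = 0; j < labels.length; j++) {
                // The j-th value sits one full row after the j-th label.
                result[labels[j]] = words[i + labels.length + j];
            }
            return result;
        }
    }
    return null;
}
```

With this, `valuesBelowLabels(pageWords(this, p), ["EA", "ID"])` would give an object whose `EA` property holds the number to use in the file name. If the words turn out to come back column by column instead ("EA", "12456", "ID", "889955"), the original n+1 lookup already works. If the order is unpredictable, comparing word positions with getPageNthWordQuads may be the more robust route.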

Engaged, Jun 18, 2018

If the values are in different fields, and you used a naming convention for those fields, it is possible.
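
To illustrate what a naming convention could allow, here is a minimal sketch. It assumes, purely for illustration, that the numbers live in form fields named "EA_1", "EA_2", and so on. Those field names and the helper `collectFieldValues` are hypothetical; `doc` is the document object (`this` in Acrobat).

```javascript
// Hypothetical convention: every field holding an EA number has a name
// that starts with the given prefix, e.g. "EA_1", "EA_2".
function collectFieldValues(doc, prefix) {
    var values = [];
    for (var i = 0; i < doc.numFields; i++) {
        var name = doc.getNthFieldName(i);
        if (name.indexOf(prefix) === 0) {
            var field = doc.getField(name);
            values.push({
                name: name,
                page: field.page,          // page (or pages) the field sits on
                value: String(field.value) // number to use in the file name
            });
        }
    }
    return values;
}
```

Calling `collectFieldValues(this, "EA_")` would return one entry per matching field, and the `page` and `value` of each entry could then serve as the split point and file name. This only applies if the PDF really contains form fields, not plain page text.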

New Here, Jun 18, 2018

Could you elaborate, please? Sorry, I'm new to JavaScript.
