Participating Frequently
July 5, 2018
Answered

Extract PDF Pages Based on Content multiple times

  • July 5, 2018
  • 2 replies
  • 3475 views

Hello. I'm a beginner in JavaScript, and I have Adobe Acrobat X Pro.

I want to search for a specific string within the PDF and save the sequence of numbers that comes after that string to use as my file name. Then I want to check whether the following pages have that exact same sequence, and if they do, extract all the pages with that sequence into one PDF. However, I also want to keep looking for new number sequences after I have finished extracting the pages with the first one.

For example,

page 1    NO: 0158K

page 2    NO: 0158K

page 3    NO: 0158K

page 4    NO: 9090V

page 5    NO: 223M

page 6    NO: 223M

Using this example, pages 1, 2, and 3 would be extracted into one pdf together. Page 4 would be extracted by itself, and pages 5 and 6 would be extracted into one pdf.

I kind of have an idea of how to do this, but I'm not quite sure how to implement it, or combine some of the code that I found.

So far I think I have to use an array to hold all the pages with the same number sequence, and then, once all of the pages with that number sequence are located, extract them. I was thinking of using something similar to the code from this forum thread, https://forums.adobe.com/message/7931552#7931552, with a few modifications, such as adding an if statement in the nested for loop to look for the number sequence.

So far this is what I have...

var pageArray=[];
for (var p = 0; p < this.numPages; p++) {
    for(var n = 0; n<this.getPageNumWords(p); n++){
       if(this.getPageNthWord(p,n)=="PPNO"){
            dataCode=this.getPageNthWord(p,n+1)
            pageArray.push(p);
            break;
       }
    }
    for (var p2=p+1; p2 < this.numPages; p2++){
        for (var n2=0; n2<this.getPageNumWords(p2); n2++){
            if(this.getPageNthWord(p2, n2)=="PPNO"){
                if(this.getPageNthWord(p2, n2+1)==dataCode){
                    repeat++;
                    break;
                }
                else{
                    if (pageArray.length > 0) {
                        var d = app.newDoc();
                        for (var x=0; x<pageArray.length; x++){
                              d.insertPages(  {
                                  nPage:d.numPages-1,
                                  cPath: dataCode + ".pdf",
                                  nStart: pageArray,
                                  nEnd:pageArray,
                              }  );
                         }
                         d.deletePages(0);
                    }
                }
            }
        }
        break;
    }
}

But after I ran this code, all I got was a new PDF with a blank page.


2 replies

BarlaeDC
Community Expert
Correct answer
July 7, 2018

Hi,

Using the following code I am able to get 2 documents created.

PPNO: 0158K

PPNO: 9090V

are both created as separate files.

// Using the active document (I only have one document open, which made testing easier)
var curDoc = app.activeDocs[0];
var pageArray = [];
var repeat = 0;
var dataCode = "";

for (var p = 0; p < curDoc.numPages; p++)
{
    // Find the code that follows "PPNO" on the current page.
    for (var n = 0; n < curDoc.getPageNumWords(p); n++)
    {
        if (curDoc.getPageNthWord(p, n) == "PPNO")
        {
            dataCode = curDoc.getPageNthWord(p, n + 1);
            pageArray.push(p);
            break;
        }
    }
    // Compare with the next page (the break at the end stops after one page).
    for (var p2 = p + 1; p2 < curDoc.numPages; p2++)
    {
        for (var n2 = 0; n2 < curDoc.getPageNumWords(p2); n2++)
        {
            if (curDoc.getPageNthWord(p2, n2) == "PPNO")
            {
                // This if is why we only get two files as a result,
                // because we can only get to the else if we don't match, but for the last
                // number in the document we will never have a page that doesn't match
                if (curDoc.getPageNthWord(p2, n2 + 1) == dataCode)
                {
                    repeat++;
                    break;
                }
                else
                {
                    if (pageArray.length > 0)
                    {
                        var d = app.newDoc();
                        for (var x = 0; x < pageArray.length; x++)
                        {
                            d.insertPages(
                            {
                                nPage: d.numPages - 1,
                                // changed to use the curDoc
                                cPath: curDoc.path,
                                // as we are importing 1 page at a time.
                                nStart: pageArray[x]
                            });
                        }
                        // remove the blank page that app.newDoc() created
                        d.deletePages(0);
                    }
                    // reset so we get only the new pages.
                    pageArray = [];
                }
            }
        }
        break;
    }
}

There are a couple of changes to the code. The main ones were the changes I mentioned; the other is to reset the page array so that the pages found on the first run through the loop are not included on the second.
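The limitation noted in the code comments (the last number in the document never reaches the else branch) could be avoided by making a single pass over the pages and closing the open group once more after the loop ends. The following is only a minimal sketch of that idea, not a tested Acrobat script. It assumes the label word is "PPNO", that the code is the word immediately after it, and that pages without the label simply end the current group. findCode and groupPages are hypothetical helper names, not part of the Acrobat API; they only rely on numPages, getPageNumWords and getPageNthWord.

```javascript
// Hypothetical helper: returns the word that follows `label` on page p,
// or null if the label is not found on that page.
function findCode(doc, p, label) {
    var count = doc.getPageNumWords(p);
    for (var n = 0; n < count - 1; n++) {
        if (doc.getPageNthWord(p, n) == label) {
            return doc.getPageNthWord(p, n + 1);
        }
    }
    return null;
}

// Hypothetical helper: groups consecutive pages that share the same code.
// Returns an array of { code, start, end } with 0-based, inclusive pages.
function groupPages(doc, label) {
    var groups = [];
    var current = null;
    for (var p = 0; p < doc.numPages; p++) {
        var code = findCode(doc, p, label);
        if (current !== null && code === current.code) {
            current.end = p;                    // same code: extend the open group
        } else {
            if (current !== null) {
                groups.push(current);           // code changed: close the open group
            }
            current = (code === null) ? null : { code: code, start: p, end: p };
        }
    }
    if (current !== null) {
        groups.push(current);                   // close the last group as well
    }
    return groups;
}
```

For a six-page document whose pages carry 0158K, 0158K, 0158K, 9090V, 223M, 223M, this gives three groups covering pages 0 to 2, page 3, and pages 4 to 5. Each group's start and end could then be passed as nStart and nEnd when copying the pages into a new document.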

Hope this helps.

Malcolm

Participating Frequently
July 10, 2018

Thanks, the code helped a lot. The only thing I'm still having a problem with is that the new PDFs don't save; they only show up as temporary PDFs, so their names have "temp" at the end. Also, I can't figure out how to customize the names of the new PDFs. The main reason I put cPath: dataCode + ".pdf" in my original post was that I wanted the name of each new PDF to be its dataCode, so the new PDFs would be named 0158K and 9090V.

BarlaeDC
Community Expert
July 10, 2018

Hi,

You can just call

d.saveAs ( "/path/to/save/location/" + dataCode + ".pdf");

just after the d.deletePages(0); line.
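A small sketch of how the save path could be built from dataCode, in case the extracted code ever contains characters that are awkward in a file name. buildSavePath is a hypothetical helper, not an Acrobat function, and replacing unexpected characters with an underscore is an assumption made here for illustration. Note also that, as far as I know, saveAs needs a device-independent path and only runs from a privileged context such as the JavaScript console, a batch sequence, or a trusted function.

```javascript
// Hypothetical helper: builds a device-independent save path
// from a target folder and the code found on the page.
function buildSavePath(folder, code) {
    // Keep only letters, digits, hyphens and underscores in the file name.
    var safeName = String(code).replace(/[^A-Za-z0-9_-]/g, "_");
    // Make sure exactly one "/" separates the folder from the file name.
    var base = folder.replace(/\/+$/, "");
    return base + "/" + safeName + ".pdf";
}

// In Acrobat, just after d.deletePages(0):
// d.saveAs(buildSavePath("/path/to/save/location/", dataCode));
```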

Hope this helps

Malcolm

BarlaeDC
Community Expert
July 5, 2018

Hi,

I haven't had a chance to properly test your code, but looking at it, there are a couple of things that don't look right. I will list them so you can see if you agree, make the changes, and then see where we stand.

1. In the document you have the text "NO" and in the code you compare that to "PPNO". I'm guessing that is just a typo from when you made the forum post, but thought I should mention it.

2. When you go to add the pages you use the following code

for (var x=0; x<pageArray.length; x++){
    d.insertPages(  {
    nPage:d.numPages-1,
    cPath: dataCode + ".pdf",
    nStart: pageArray,
    nEnd:pageArray,
    });
}

There are a couple of issues. cPath is set to dataCode + ".pdf", but cPath should be the device-independent path to the file you want to get the pages from, not the file you are placing the pages into, so this should be the full path to the original file.

Also, you are passing nStart and nEnd as the same page. This is not necessary: if you just want one page, pass only nStart and that will be the only page included.
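To illustrate both points, here is a rough sketch of how the insertPages argument object could be put together. buildInsertArgs is a hypothetical helper, and the sketch assumes pageArray is non-empty and holds consecutive page numbers, so the whole run can be imported in one call: nStart on its own for a single page, nStart plus nEnd for a range.

```javascript
// Hypothetical helper: builds the argument object for one insertPages call
// that imports a run of consecutive pages from the source file.
function buildInsertArgs(sourcePath, pages, insertAfter) {
    var args = {
        nPage: insertAfter,      // 0-based page of the new document to insert after
        cPath: sourcePath,       // path of the file the pages come FROM
        nStart: pages[0]         // first source page (0-based)
    };
    // nEnd is only needed when more than one page is imported.
    if (pages.length > 1) {
        args.nEnd = pages[pages.length - 1];
    }
    return args;
}

// In Acrobat (assumes pageArray holds consecutive page numbers):
// var d = app.newDoc();
// d.insertPages(buildInsertArgs(this.path, pageArray, d.numPages - 1));
// d.deletePages(0);   // remove the blank page that app.newDoc() created
```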

Hope this helps

Malcolm

Participating Frequently
July 6, 2018

Hello. Yeah, the "NO" is a typo that I made when I posted to the forum.

As for the second part, should that section of code end up being like this then?

for (var x=0; x<pageArray.length; x++){
    d.insertPages(  {
        nPage:d.numPages-1,
        cPath: this.path,
        nStart: pageArray,
    }  );
}

However, I ran the code with this corrected portion, and I didn't get a new pdf at all.

Bernd Alheit
Community Expert
July 6, 2018

You mean after using

var d = app.newDoc();

you can't see the new document?