Extract PDF Pages Based on Content multiple times

Question

Hello. I'm a beginner in JavaScript, and I have Adobe Acrobat X Pro.

I want to be able to search for a specific string within the PDF, and then save the sequence of numbers that comes after that string to use as my file name. Then I want to check whether the following pages have that exact same sequence of numbers, and if they do, I want to extract all the pages with that number sequence into one PDF. However, I want to be able to keep looking for new number sequences after I have finished extracting the pages with the first number sequence.

For example,

page 1 NO: 0158K

page 2 NO: 0158K

page 3 NO: 0158K

page 4 NO: 9090V

page 5 NO: 223M

page 6 NO: 223M

Using this example, pages 1, 2, and 3 would be extracted together into one PDF. Page 4 would be extracted by itself, and pages 5 and 6 would be extracted together into another PDF.
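
The grouping in this example amounts to splitting the page list into runs of consecutive pages that share a code. A minimal sketch in plain JavaScript, assuming the code for each page has already been read into an array; the function name groupConsecutive and the shape of the result are illustrative only and are not part of the Acrobat API:

```javascript
// Hypothetical helper: split a list of per-page codes into runs of
// consecutive pages that share the same code. Page numbers are 0-based.
function groupConsecutive(codes) {
    var groups = [];
    for (var p = 0; p < codes.length; p++) {
        var last = groups[groups.length - 1];
        if (last && last.code === codes[p]) {
            last.end = p;   // same code as the previous page: extend the run
        } else {
            groups.push({ code: codes[p], start: p, end: p });   // start a new run
        }
    }
    return groups;
}

var groups = groupConsecutive(["0158K", "0158K", "0158K", "9090V", "223M", "223M"]);
// groups has three entries: pages 0-2 (0158K), page 3 (9090V), pages 4-5 (223M)
```

Each entry's start and end could then serve as the page range to extract, with the code as the file name.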

I kind of have an idea of how to do this, but I'm not quite sure how to implement it, or combine some of the code that I found.

So far I think I have to use an array to hold all the pages with the same number sequence, and then, once all of the pages with that number sequence are located, extract them. I was thinking of using something similar to the code from this forum https://forums.adobe.com/message/7931552#7931552 with a few modifications, like having if statements in the nested for loop to look for the number sequence.

So far this is what I have...

var pageArray=[];

for (var p = 0; p < this.numPages; p++) {
    for(var n = 0; n<this.getPageNumWords(p); n++){
        if(this.getPageNthWord(p,n)=="PPNO"){
            dataCode=this.getPageNthWord(p,n+1)
            pageArray.push(p);
            break;
        }
        for (var p2=p+1; p2 < this.numPages; p2++){
            for (var n2=0; n2<this.getPageNumWords(p2); n2++){
                if(this.getPageNthWord(p2, n2)=="PPNO"){
                    if(this.getPageNthWord(p2, n2+1)==dataCode){
                        repeat++;
                        break;
                    }
                    else{
                        if (pageArray.length > 0) {
                            var d = app.newDoc();
                            for (var x=0; x<pageArray.length; x++){
                                d.insertPages( {
                                    nPage:d.numPages-1,
                                    cPath: dataCode + ".pdf",
                                    nStart: pageArray,
                                    nEnd:pageArray,
                                } );
                            }
                            d.deletePages(0);
                        }
                        break;
                    }

But after I ran this code, all I got was a new PDF with a blank page.

BarlaeDC · Accepted Answer

Hi,

Using the following code I am able to get 2 documents created.

PPNO: 0158K
PPNO: 9090V

are both created as separate files.

    // Using the active document (I only have one document open, made testing easier)
    var curDoc = app.activeDocs[0];
    var pageArray=[];
    var repeat = 0;
    var dataCode = "";
    for (var p = 0; p < curDoc.numPages; p++) {
        for(var n = 0; n< curDoc.getPageNumWords(p); n++) {
            if(curDoc.getPageNthWord(p,n)=="PPNO") {
                dataCode=curDoc.getPageNthWord(p,n+1) ;
                pageArray.push(p);
                break;
            }
        }
        for (var p2=p+1; p2 < curDoc.numPages; p2++) {
            for (var n2=0; n2<curDoc.getPageNumWords(p2); n2++) {
                if(curDoc.getPageNthWord(p2, n2)=="PPNO") {
                    // This if is why we only get two files as a result,
                    // because we can only get to the else if we don't match, but for the last
                    // number in the document we will never have a page that doesn't match
                    if(curDoc.getPageNthWord(p2, n2+1)==dataCode) {
                        repeat++;
                        break;
                    }
                    else {
                        if (pageArray.length > 0) {
                            var d = app.newDoc();
                            for (var x=0; x<pageArray.length; x++) {
                                d.insertPages( {
                                    nPage:d.numPages-1,
                                    // changed to use the curDoc
                                    cPath: curDoc.path,
                                    // as we are importing 1 page at a time.
                                    nStart: pageArray,
                                });
                            }
                            d.deletePages(0);
                        }
                        // reset so we get only the new pages.
                        pageArray = [];
                    }
                }
            }
            break;
        }
    }

There are a couple of changes to the code. The main ones were the changes I mentioned; the other is to make sure we reset the page array, so that we don't include the pages we found on the first run through of the loop on the second loop.

Hope this helps.

Malcolm
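
The limitation noted in the code comment (the last code in the document never reaches the else branch) could be avoided by scanning the pages once and flushing the current group both when the code changes and once more after the loop ends. The following is a rough sketch under stated assumptions and has not been tested in Acrobat X. The names splitByCode, findCode, and marker are hypothetical; numPages, getPageNumWords, getPageNthWord, and extractPages are Acrobat Doc members. It assumes the code is the single word right after the marker, and that the script runs somewhere extractPages is allowed to save to cPath (for example the JavaScript console or a batch sequence).

```javascript
// Hypothetical helper: return the word that follows `marker` on page p, or null.
function findCode(doc, p, marker) {
    var count = doc.getPageNumWords(p);
    for (var n = 0; n < count - 1; n++) {
        if (doc.getPageNthWord(p, n) === marker) {
            return doc.getPageNthWord(p, n + 1);
        }
    }
    return null;
}

// Hypothetical helper: extract each run of consecutive pages that share a code
// into its own file named after the code. Returns the ranges it extracted.
function splitByCode(doc, marker) {
    var extracted = [];
    var current = null;   // code of the run in progress (null = no run)
    var start = 0;        // first page of that run

    function flush(end) {
        if (current !== null) {
            doc.extractPages({ nStart: start, nEnd: end, cPath: current + ".pdf" });
            extracted.push({ code: current, start: start, end: end });
        }
    }

    for (var p = 0; p < doc.numPages; p++) {
        var code = findCode(doc, p, marker);
        if (code !== current) {
            flush(p - 1);     // close the previous run, if there was one
            current = code;
            start = p;
        }
    }
    flush(doc.numPages - 1);  // the final run has no differing page after it
    return extracted;
}
```

In Acrobat this would be called as splitByCode(this, "PPNO"), or with "NO" if that is the label the pages actually carry. Pages without the marker are skipped, and if the same code shows up again later in a non-adjacent run, the second file would be written to the same name as the first, so a real script may want to add a counter to the file name.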

BarlaeDC · Answer

Hi,

I haven't had a chance to properly test your code, but looking at it there are a couple of things that don't look right. I will list them so you can see if you agree, make the changes, and then we can see where we stand.

1. In the document you have the text "NO", and in the code you compare that to "PPNO". I'm guessing that is just a typo from when you made the forum post, but thought I should mention it.

2. When you go to add the pages you use the following code:

    for (var x=0; x<pageArray.length; x++){
        d.insertPages( {
            nPage:d.numPages-1,
            cPath: dataCode + ".pdf",
            nStart: pageArray,
            nEnd:pageArray,
        });
    }

There are a couple of issues. cPath is set to dataCode.pdf, but cPath should be the device-independent path to the file you want to get the pages from, not the file you are placing the pages into, so this should be the full path to the original file.

Also, you are passing nStart and nEnd as the same page. This is not necessary: if you just want one page, just pass nStart and that will be the only page that is included.

Hope this helps

Malcolm
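
To make point 2 concrete, here is a rough sketch of the copy loop with those two changes applied. The name copyPages and its parameters are made up for illustration; insertPages, numPages, and deletePages are the same Doc members used in the question. It assumes sourcePath is the device-independent path of the original file (for example this.path) and that target starts out as the one-blank-page document returned by app.newDoc(). It also passes a single page number as nStart, where the question passes the whole array.

```javascript
// Hypothetical helper: append the listed pages of the source file to `target`.
function copyPages(target, sourcePath, pageArray) {
    for (var x = 0; x < pageArray.length; x++) {
        target.insertPages({
            nPage: target.numPages - 1,   // insert after the current last page
            cPath: sourcePath,            // the file the pages come FROM
            nStart: pageArray[x]          // one page number; nEnd is left out
        });
    }
    if (pageArray.length > 0) {
        target.deletePages(0);            // drop the blank page from app.newDoc()
    }
}
```

In Acrobat this might be called as copyPages(app.newDoc(), this.path, pageArray) once the pages for one code have been collected.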
