Extract PDF Pages Based on Content, Multiple Times
Hello. I'm a beginner in JavaScript, and I have Adobe Acrobat X Pro.
I want to search for a specific string within the PDF and save the number sequence that comes right after that string to use as my file name. Then I want to check whether the following pages have that exact same number sequence, and if they do, extract all the pages with that sequence into one PDF. After I have finished extracting the pages for the first number sequence, I want to keep looking for new number sequences and do the same for each of them.
For example,
page 1 NO: 0158K
page 2 NO: 0158K
page 3 NO: 0158K
page 4 NO: 9090V
page 5 NO: 223M
page 6 NO: 223M
Using this example, pages 1, 2, and 3 would be extracted together into one PDF. Page 4 would be extracted by itself, and pages 5 and 6 would be extracted together into another PDF.
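The grouping in this example can be sketched in plain JavaScript, separately from any Acrobat calls. This is only a sketch under one assumption: the code for each page has already been read into an array (index 0 is page 1). The function name groupPages is made up for illustration.

```javascript
// Group consecutive pages that share the same code.
// codes[i] is the code found on page i (0-based).
function groupPages(codes) {
    var groups = [];
    var current = null;
    for (var p = 0; p < codes.length; p++) {
        if (current !== null && codes[p] === current.code) {
            // Same code as the previous page: extend the current run.
            current.end = p;
        } else {
            // Different code (or first page): start a new run.
            current = { code: codes[p], start: p, end: p };
            groups.push(current);
        }
    }
    return groups;
}

// The six pages from the example above.
var groups = groupPages(["0158K", "0158K", "0158K", "9090V", "223M", "223M"]);
```

Each entry in groups holds a code plus the first and last 0-based page index of its run, which is the same form of page range that Acrobat's page methods take.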
I kind of have an idea of how to do this, but I'm not quite sure how to implement it or how to combine some of the code that I found.
So far I think I have to use an array to hold all the pages with the same number sequence, and once all of the pages with that number sequence are located, I have to extract them. I was thinking of using something similar to the code from this forum thread, https://forums.adobe.com/message/7931552#7931552, with a few modifications, such as an if statement in the nested for loop to look for the number sequence.
So far this is what I have...
var pageArray = [];
var dataCode;
var repeat = 0;
for (var p = 0; p < this.numPages; p++) {
    // Look for the marker word on page p and remember the word after it.
    for (var n = 0; n < this.getPageNumWords(p); n++) {
        if (this.getPageNthWord(p, n) == "PPNO") {
            dataCode = this.getPageNthWord(p, n + 1);
            pageArray.push(p);
            break;
        }
    }
    // Compare the following pages against that code.
    for (var p2 = p + 1; p2 < this.numPages; p2++) {
        for (var n2 = 0; n2 < this.getPageNumWords(p2); n2++) {
            if (this.getPageNthWord(p2, n2) == "PPNO") {
                if (this.getPageNthWord(p2, n2 + 1) == dataCode) {
                    repeat++;
                    break;
                }
                else {
                    // Different code: try to build a new PDF from the collected pages.
                    if (pageArray.length > 0) {
                        var d = app.newDoc();
                        for (var x = 0; x < pageArray.length; x++) {
                            d.insertPages({
                                nPage: d.numPages - 1,
                                cPath: dataCode + ".pdf",
                                nStart: pageArray,
                                nEnd: pageArray
                            });
                        }
                        d.deletePages(0);
                    }
                }
            }
        }
        break;
    }
}
But after I ran this code, all I got was a new PDF with a blank page.
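One direction that might avoid the blank page is Doc.extractPages, which copies a page range out of the open document into a new file, so no app.newDoc or insertPages call is needed. Below is a minimal sketch, not a tested solution. It assumes that the marker word is "PPNO", that the code is the single word right after it, and that each code is usable as a file name. The helper names findCode and splitByCode are made up for illustration, and the document is passed in as a parameter so the logic does not depend on what this refers to.

```javascript
// Return the word that follows `marker` on page p, or null if the marker is absent.
function findCode(doc, p, marker) {
    var count = doc.getPageNumWords(p);
    for (var n = 0; n < count - 1; n++) {
        if (doc.getPageNthWord(p, n) === marker) {
            return doc.getPageNthWord(p, n + 1);
        }
    }
    return null;
}

// Walk the pages once; whenever the code changes, save the finished run of pages.
function splitByCode(doc, marker) {
    var saved = [];
    var start = 0;
    var code = doc.numPages > 0 ? findCode(doc, 0, marker) : null;
    for (var p = 1; p <= doc.numPages; p++) {
        var next = p < doc.numPages ? findCode(doc, p, marker) : null;
        if (p === doc.numPages || next !== code) {
            // Pages start..p-1 share one code; runs without a marker are skipped.
            if (code !== null) {
                doc.extractPages({ nStart: start, nEnd: p - 1, cPath: code + ".pdf" });
                saved.push(code + ".pdf");
            }
            start = p;
            code = next;
        }
    }
    return saved;
}

// In Acrobat's JavaScript console this would be called as:
// splitByCode(this, "PPNO");
```

Two caveats with this sketch: as far as I understand, extractPages with a cPath is only permitted from the console or a batch/Action context, and if the same code shows up again later in the document after a different code, the second run would be saved under the same file name as the first.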
