New Participant
January 16, 2023
Answered

extracting pages (splitting file) based on partial word match

Hello,

I have no programming knowledge and I am going through this website to write a script. I have Acrobat DC.

I have a PDF file with over 400 pages. The first page of each case is a cover page, followed by the supporting documents. The number of pages varies from case to case, but the first page of a case is always the cover page, and the cover page is the only place that mentions the reference number. I would like to split the file just before each later appearance of a reference number (it is a certain combination of letters and digits), in other words extract each case as a separate file. The reference numbers look like N30111XXXXX, where XXXXX is 5 digits that are unique and identify each case. I would like the script to ask the user for the first 6 characters of the reference number, go through the file, extract all the cases, and save each one under its full reference number, e.g. N3011103547. I tried to modify the code below, which I got from "Split PDF based on content, and save into different pdfs with custom file name".

The only issue is that it searches for the whole word only and does not match a partial word. So if the user inputs N30111, nothing is matched, but when the user inputs N3011103547, it matches only that one page.
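
The difference between the two comparisons can be illustrated in plain JavaScript, outside Acrobat. This is only a sketch: `matchesPrefix` is a hypothetical helper name, not part of the Acrobat API, and it assumes the word read from the cover page is the full reference number.

```javascript
// Hypothetical helper: true when `word` begins with `prefix`.
// An empty prefix is rejected so that it cannot match every word.
function matchesPrefix(word, prefix) {
    return prefix.length > 0 && word.substring(0, prefix.length) === prefix;
}

var word = "N3011103547";                    // a word as it might appear on a cover page
console.log(word == "N30111");               // false: whole-word comparison
console.log(matchesPrefix(word, "N30111"));  // true: only the first 6 characters are compared
```

Comparing against `prefix.length` rather than a fixed 6 keeps the check valid if the user types a shorter or longer prefix.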

Any help would be greatly appreciated.

var numSeq = "";
var finalpage = 0;
var val = app.response("Enter a value");
for (var p = 0; p < this.numPages; p++) {
    for (var n = 0; n < this.getPageNumWords(p); n++) {
        if (this.getPageNthWord(p, n) == val) {
            numSeq = this.getPageNthWord(p, n + 1)
            finalpage = p;
            break;
        }
    }

    for (var p2 = p + 1; p2 < this.numPages; p2++) {
        for (var n2 = 0; n2 < this.getPageNumWords(p2); n2++) {
            if (this.getPageNthWord(p2, n2) == val) {
                this.extractPages({
                    nStart: finalpage,
                    nEnd: p2 - 1,
                    cPath: val + (p + 3000) + ".pdf"});
                break;
            }
        }
        console.println("Extracted " + numSeq + " pp " + p + " to " + p2)
        break;
    }
}
this.extractPages({
    nStart: finalpage,
    nEnd: this.numPages - 1,
    cPath: val + (finalpage + 3000) + ".pdf"
});
console.println("Extracted" + numSeq + " pp " + finalpage + " to " + (this.numPages - 1))

1 reply

Ahsanbhai (Author, Correct answer)
New Participant
January 18, 2023

I was able to find the answer for this one. Posting it here for anyone else looking:

 

// Ask for the leading characters of the reference number, e.g. N30111.
var val = app.response("Enter a value");
if (val) {
    var starts = [];  // page index of each cover page
    var seqs = [];    // the 5 characters that follow the prefix on that cover page
    for (var p = 0; p < this.numPages; p++) {
        for (var n = 0; n < this.getPageNumWords(p); n++) {
            var word = this.getPageNthWord(p, n);
            // Partial match: compare only the start of the word with the input.
            if (word.substring(0, val.length) == val) {
                starts.push(p);
                seqs.push(word.substring(val.length, val.length + 5));
                break;
            }
        }
    }
    // Each case runs from its cover page to the page before the next cover page;
    // the last case runs to the end of the document.
    for (var i = 0; i < starts.length; i++) {
        var last = (i + 1 < starts.length) ? starts[i + 1] - 1 : this.numPages - 1;
        this.extractPages({
            nStart: starts[i],
            nEnd: last,
            cPath: val + seqs[i] + ".pdf"
        });
        console.println("Extracted " + seqs[i] + " pp " + starts[i] + " to " + last);
    }
}
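
For anyone who wants to check the page-range logic without opening Acrobat, here is a minimal sketch in plain JavaScript. It is an illustration under stated assumptions, not the script itself: `planCaseRanges` and `samplePages` are hypothetical names, and each page is modelled as a simple array of words instead of being read with `getPageNthWord`.

```javascript
// Hypothetical helper, not an Acrobat API.
// pages: array of pages, each page an array of the words on it.
// Returns one { ref, start, end } entry per case (0-based, inclusive page indexes).
function planCaseRanges(pages, prefix) {
    var covers = [];
    for (var p = 0; p < pages.length; p++) {
        for (var n = 0; n < pages[p].length; n++) {
            var word = pages[p][n];
            // A page counts as a cover page when one of its words starts with the prefix.
            if (prefix.length > 0 && word.substring(0, prefix.length) === prefix) {
                covers.push({ start: p, ref: word });
                break;
            }
        }
    }
    var ranges = [];
    for (var i = 0; i < covers.length; i++) {
        // A case ends on the page before the next cover page, or on the last page.
        var end = (i + 1 < covers.length) ? covers[i + 1].start - 1 : pages.length - 1;
        ranges.push({ ref: covers[i].ref, start: covers[i].start, end: end });
    }
    return ranges;
}

// Made-up sample document: two cases, of 3 pages and 2 pages.
var samplePages = [
    ["Case", "N3011103547"],
    ["supporting", "document"],
    ["more", "pages"],
    ["Case", "N3011100012"],
    ["attachment"]
];
console.log(JSON.stringify(planCaseRanges(samplePages, "N30111")));
// [{"ref":"N3011103547","start":0,"end":2},{"ref":"N3011100012","start":3,"end":4}]
```

In Acrobat, each entry would then be passed to `this.extractPages` as `nStart`, `nEnd` and a `cPath` built from `ref`.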

New Participant
February 5, 2023

Hello,

I see you posted the correct script that you found, but could you please also share how to execute it?

I have a PDF document with more than 200 pages and need to save every page as a separate file, named using a text from the file.

Is this achievable with your code?

Thanks in advance.