Hi all, I am using Adobe Acrobat Pro 2017 and I am trying to extract every two pages of a multi-page PDF. Each two-page set has an ID that can be found by searching once all the words on the page are put into a string. I have created similar code that works, but it is designed to extract every single page and name it after the first 8-digit code it can find using regular expressions. Take a look at the code below and let me know what you think. Thanks!
/* Extract 2-page funding notice */
// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;
// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");
for (var i = 0; (i * 2) < this.numPages; i++) { // Loop through the entire document
numWords = this.getPageNumWords(i); // Find out how many words are on the page
var WordString = ""; // Prepare a string
for (var j = 0; j < numWords; j++) // Put all the words on the page into a string
{WordString = WordString + " " + this.getPageNthWord(i, j);}
ID = WordString.match(/\b\d{8}\b/); // Search for the 8 digit ID control # in the string
this.extractPages({
nStart: i * 2,
nEnd: (i * 2) + 1,
cPath: "/J/myfilepath/" + "SBIC_" + ID +"-Fnew.pdf"
});
}
This code does run, but not the way I want it to. It pulls the first 8-digit ID in the string and the last two pages of the document.
Hello, shortly after posting this question I found the answer, and it was pretty simple. Just update this line of code to the one below and you will be golden!
{WordString = WordString + " " + this.getPageNthWord((i*2), j);}
This doesn't look right. Why are you multiplying the value of i by 2? If you want to skip a page, change the increment part of the for-loop to i+=2.
Also, since what you're looking for is a single word I don't see the need to add up all the text in the page. You can just test each word on its own.
You're also missing an if-condition checking that ID is not null, in case no matches are found, and a break command to stop the (inner) loop once the code has been identified and the pages extracted.
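A minimal sketch of the per-word test, null check, and early exit described above, written as plain JavaScript so it can be tried outside Acrobat. The helper name findFirstId and the sample word list are hypothetical; inside Acrobat the words would come from this.getPageNthWord(page, j) instead.

```javascript
// Hypothetical helper: returns the first word that is exactly 8 digits,
// or null when no word on the page matches.
function findFirstId(words) {
    for (var j = 0; j < words.length; j++) {
        if (/^\d{8}$/.test(words[j])) {
            return words[j]; // stop at the first match, like a break
        }
    }
    return null;
}

// Made-up sample words standing in for one page of text.
var sampleWords = ["Funding", "Notice", "12345678", "Amount", "87654321"];
var id = findFirstId(sampleWords);
if (id !== null) {
    console.log("Found ID: " + id);
} else {
    console.log("No ID found on this page");
}
```

Returning from the function plays the role of the break command, and the null check keeps a page without an ID from producing a file named after "null".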
Hi try67, thanks for reaching out. I am multiplying the value of i by 2 because I need to extract every two pages from the document.
How would I go about testing each word on its own? When I tried researching ways to search for text, this was the only way that worked for me.
I have included the code snippet below: an if-condition that tests the variable and groups the rest of the code. I'm not sure how to enter a break command to stop the inner loop, though.
if (WordString.match(/\b\d{8}\b/)) { // Search for the word 8 digit SBA Control ID in the string
search.matchWholeWord = true; // If we got here, we'll search for the 8 digit SBA Control ID in the document
This is what I meant:
pagesLoop:
for (var i = 0; i < this.numPages; i += 2) { // Visit the first page of each two-page set
    var numWords = this.getPageNumWords(i); // Find out how many words are on the page
    for (var j = 0; j < numWords; j++) { // Test each word on its own
        var WordString = this.getPageNthWord(i, j);
        if (/^\d{8}$/.test(WordString)) { // Is this word the 8 digit ID control #?
            this.extractPages({
                nStart: i,
                nEnd: i + 1,
                cPath: "/J/myfilepath/" + "SBIC_" + WordString + "-Fnew.pdf"
            });
            continue pagesLoop; // ID found and pages extracted, move on to the next pair
        }
    }
    // Only reached when no word on the page matched
    console.println("ERROR! Could not find the ID on page " + (i + 1));
}
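For anyone who wants to check the pairing logic before touching real files, here is a rough sketch that separates the planning from the Acrobat calls. The names planExtractions, getWords, and samplePages are hypothetical and the sample data is made up. It also clamps nEnd, on the assumption that a document with an odd page count should still get its last single page.

```javascript
// Hypothetical helper: decides which page ranges to extract and under what ID.
// getWords(page) must return an array of the words on that 0-based page.
function planExtractions(numPages, getWords) {
    var plan = [];
    for (var i = 0; i < numPages; i += 2) {
        var words = getWords(i);
        var id = null;
        for (var j = 0; j < words.length; j++) {
            if (/^\d{8}$/.test(words[j])) {
                id = words[j];
                break; // first match wins
            }
        }
        plan.push({
            nStart: i,
            nEnd: Math.min(i + 1, numPages - 1), // clamp for odd page counts
            id: id // null means no ID was found on page i
        });
    }
    return plan;
}

// Made-up sample standing in for a 5-page document.
var samplePages = [
    ["Notice", "11112222"], ["continued"],
    ["Notice", "no", "id"], ["continued"],
    ["Notice", "33334444"]
];
var plan = planExtractions(samplePages.length, function (p) {
    return samplePages[p];
});
console.log(JSON.stringify(plan));
```

Inside Acrobat, getWords could be built from this.getPageNumWords and this.getPageNthWord, and each plan entry with a non-null id could then be passed to this.extractPages.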
Another way to do this that is much more efficient is to use the Redaction pattern search. This search places a redact annot over all matching text, and it does so very quickly compared to JS word searches. The script then only needs to get the locations of the redact annots, which can be deleted after collecting the naming data. To do this, though, you'll need to create a custom search pattern. Custom patterns are defined in this file:
C:\Users\<user name>\AppData\Roaming\Adobe\Acrobat\DC\Redaction\ENU\SearchRedactPatterns.xml
The redaction search can be used in a batch process as the first step, then the extraction script as the second step.