Rookie here! I would love some help. Every week I sort through hundreds of PDF pages and combine them based on shared routing numbers, and there has to be a better way to do this. I've thought of writing code to either reorganize the pages or extract the pages that contain routing numbers matching the ones I keep in an xlsx file on Windows. My version is Adobe Acrobat Standard 2020. Acrobat only takes JavaScript, which I'm unfamiliar with, but I've tried to piece together code from other pages into something that might work. Help?
// Iterates over all pages, looks for a given string, and extracts all
// pages on which that string is found to a new file.
var pageArray = [];
var stringsToSearchFor = ["routingnumber"];
for (var p = 0; p < this.numPages; p++) {
// iterate over all words
for (var n = 0; n < this.getPageNumWords(p); n++) {
if (this.getPageNthWord(p, n)!=-1) {
pageArray.push(p);
break;
}
}
}
if (pageArray.length > 0) {
// extract all pages that contain the string into a new document
var d = app.newDoc(); // this will add a blank page - we need to remove that once we are done
for (var n = 0; n < pageArray.length; n++) {
d.insertPages( {
nPage: d.numPages-1,
cPath: this.path,
nStart: pageArray[n],
nEnd: pageArray[n],
} );
}
// remove the first page
d.deletePages(0);
}
The structure of the code is good, but it doesn't actually collect the page words or search for anything.
Here's a change to the portion of the code that iterates over the pages and words.
var pageArray = [], cPageText;
var stringsToSearchFor = ["routingnumber"];
for (var p = 0; p < this.numPages; p++) {
// Collect all words on page
cPageText = "";
for (var n = 0; n < this.getPageNumWords(p); n++) {
cPageText += this.getPageNthWord(p, n);
}
if(stringsToSearchFor.some(function(cTest){return (cPageText.indexOf(cTest) != -1);}))
pageArray.push(p);
}
This code collects all the words on the page into one string, and then searches that string for any of the routing numbers in the array.
The reason for doing it this way is that Acrobat splits words at non-word characters. So if the routing number contains punctuation, it will be divided into several words. If this is not the case, and the routing number is a single continuous string of alphanumeric characters, then the code can be made more efficient by searching individual words.
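As a rough sketch of the word-by-word variant, assuming each routing number appears on the page as one unbroken word: the function name `findPagesByWord` and its `doc` parameter are made up for illustration, and `doc` is assumed to be the Acrobat document object (`this` when run from the console).

```javascript
// Hypothetical helper: returns the 0-based page numbers on which at least
// one word is exactly equal to one of the target strings.
// `doc` is assumed to expose numPages, getPageNumWords and getPageNthWord,
// as the Acrobat Doc object does.
function findPagesByWord(doc, targets) {
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {
        var numWords = doc.getPageNumWords(p); // look this up once per page
        for (var n = 0; n < numWords; n++) {
            // Compare the whole word against every target string
            if (targets.indexOf(doc.getPageNthWord(p, n)) !== -1) {
                pages.push(p);
                break; // one hit is enough, move on to the next page
            }
        }
    }
    return pages;
}
```

In the console this would be called as `var pageArray = findPagesByWord(this, stringsToSearchFor);`. It stops reading a page at the first match, which is where the speedup comes from.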
Another method would be to search for a pattern using a regular expression, if that is suitable.
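Here is a minimal sketch of the regular expression idea. The names `getPageText` and `findPagesByPattern` are invented for this example, and `doc` is again assumed to be the Acrobat document object. Words are joined with a space so that a pattern cannot accidentally match across two neighboring words.

```javascript
// Hypothetical helper: builds one string from all words on a page,
// separated by single spaces.
function getPageText(doc, p) {
    var words = [];
    var numWords = doc.getPageNumWords(p);
    for (var n = 0; n < numWords; n++) {
        words.push(doc.getPageNthWord(p, n));
    }
    return words.join(" ");
}

// Hypothetical helper: returns the 0-based page numbers whose text
// matches the given regular expression.
function findPagesByPattern(doc, pattern) {
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {
        pattern.lastIndex = 0; // keeps a "g"-flagged pattern from carrying state between pages
        if (pattern.test(getPageText(doc, p))) {
            pages.push(p);
        }
    }
    return pages;
}
```

For example, if the routing numbers happen to be standalone nine-digit numbers (an assumption, adjust to your data), `findPagesByPattern(this, /\b\d{9}\b/)` would pick up every page that has one.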
A variation on the technique is to collect the page numbers for each routing number separately, instead of mixing them in a single array. This can be done with an object.
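A sketch of the per-routing-number variation might look like the following. The function name `groupPagesByString` is hypothetical, `doc` is assumed to be the Acrobat document object, and the page text is built the same way as in the code above (words concatenated with no separator).

```javascript
// Hypothetical helper: returns an object whose keys are the search strings
// and whose values are arrays of the 0-based page numbers containing them.
function groupPagesByString(doc, targets) {
    var pagesByString = {};
    for (var i = 0; i < targets.length; i++) {
        pagesByString[targets[i]] = []; // start every search string with an empty list
    }
    for (var p = 0; p < doc.numPages; p++) {
        // Collect all words on the page into one string
        var text = "";
        var numWords = doc.getPageNumWords(p);
        for (var n = 0; n < numWords; n++) {
            text += doc.getPageNthWord(p, n);
        }
        // Record this page under every search string it contains
        for (var t = 0; t < targets.length; t++) {
            if (text.indexOf(targets[t]) !== -1) {
                pagesByString[targets[t]].push(p);
            }
        }
    }
    return pagesByString;
}
```

After `var groups = groupPagesByString(this, stringsToSearchFor);`, each `groups[someRoutingNumber]` array can be fed to the extraction loop from the original post, which gives one new document per routing number.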
Yet another technique, which might be many times faster, is to use an Action script. Actions can work on many documents at the same time. You can see an example of this technique in many of the Actions you can download here (such as "Find and Highlight words"):
https://acrobatusers.com/actions-exchange/