Participant
July 3, 2024
Question

Sort large PDF containing letters into batches

Hi all, 

I am currently looking for a method to separate a bulk PDF containing multiple letters of varying lengths into groups of 8-page, 7-page, and 6-page letters, to simplify the letter insertion stage after printing. I am unsure how the data is sourced, but all letters will be placed into one PDF. 

 

There isn't any data that is used on all pages. There is one reference number that is used on every page of a letter except the first. I was thinking that I could use the page numbers, which are in the format "1 of 8" and so on. 

 

Any suggestions would be very much appreciated. Thank you!

2 replies

Thom Parker
Community Expert
July 3, 2024

Post some sample pages from the document so we can see if this can be automated, as suggested by Try67.

  

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
try67
Community Expert
July 3, 2024

If there's text that says "Page X of Y" on each page then it should be possible to identify each letter and then split it to a new file. This will require the development of a custom-made script, though.

I've created many similar tools for my clients in the past and would be happy to have a look at a sample file and let you know if I think it's doable, and if so, for how much. If you're interested, feel free to contact me privately by clicking my user-name and then on "Send a Message" to discuss it further.

Participant
July 3, 2024

Hi Try67, 

Thank you for your response!

So the pages do have "Page x of y". I have been looking at potential scripts, but it's proving a challenge to find something that works for me. I don't have much experience with scripts, but I did find something that partially works: I can search for one word and extract the pages containing it. What I was hoping to do is extract the pages that have "of 8" and so on into new PDFs, so that there's one file with all 8-page letters, one with 7-page letters, and one with 6-page letters, but that didn't work. 

 

This is what I tried:

// Iterates over all pages, finds a given string, and extracts all
// pages on which that string is found to a new file.

var pageArray = [];

var stringToSearchFor = "Dear";

for (var p = 0; p < this.numPages; p++) {
    // iterate over all words on the page
    for (var n = 0; n < this.getPageNumWords(p); n++) {
        if (this.getPageNthWord(p, n) == stringToSearchFor) {
            pageArray.push(p);
            break;
        }
    }
}

if (pageArray.length > 0) {
    // extract all pages that contain the string into a new document
    var d = app.newDoc();    // this will add a blank page - we need to remove that once we are done
    for (var n = 0; n < pageArray.length; n++) {
        d.insertPages({
            nPage: d.numPages - 1,
            cPath: this.path,
            nStart: pageArray[n],
            nEnd: pageArray[n]
        });
    }

    // remove the first page
    d.deletePages(0);
}
 
I used the word "Dear" as it appears on the first page of each letter. 
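
A likely reason a search for "of 8" finds nothing is that getPageNthWord returns one word at a time, so a two-word phrase never equals any single word. Below is a minimal sketch that compares consecutive words instead. It assumes the page number text is extracted as separate words ("Page", "3", "of", "8"), which should be checked on a sample file. The helper names pageTotalFromWords and getWords are made up for illustration and are not part of the Acrobat API.

```javascript
// Hypothetical helper: given a page's words in reading order, return Y from
// the first "<number> of <number>" sequence found, or null if there is none.
function pageTotalFromWords(words) {
    for (var i = 1; i + 1 < words.length; i++) {
        if (words[i] === "of" &&
            /^\d+$/.test(words[i - 1]) &&
            /^\d+$/.test(words[i + 1])) {
            return parseInt(words[i + 1], 10);
        }
    }
    return null;
}

// Hypothetical helper: collect the words of page p of an Acrobat document.
function getWords(doc, p) {
    var words = [];
    var count = doc.getPageNumWords(p);
    for (var n = 0; n < count; n++) {
        words.push(doc.getPageNthWord(p, n));
    }
    return words;
}

// In the Acrobat console, the pages of all 8-page letters could then be
// collected along these lines (untested against a real file):
// var pageArray = [];
// for (var p = 0; p < this.numPages; p++) {
//     if (pageTotalFromWords(getWords(this, p)) === 8) pageArray.push(p);
// }
```

Body text that happens to contain a phrase such as "2 of 3" would also match, so limiting the check to the words around "Page" may be safer if the letters contain that kind of wording.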
Participant
July 3, 2024

Just another quick update, 

My colleague has created a script that finds the heading text on each page and separates each letter into an individual PDF. The next step we need to figure out is how to merge the 6-page, 7-page, and 8-page letters into one file per page length.
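
One option, rather than merging the individual files afterwards, might be to build the batches straight from the combined PDF. The sketch below assumes the letters are complete and in page order, and that an array totals already holds the Y value from each page's "Page X of Y" text (however that is read). The names groupLettersByLength and buildBatches are hypothetical. The Acrobat part uses app.newDoc and insertPages in the same way as the script posted earlier, and has not been run against a real file.

```javascript
// Hypothetical helper: totals[p] is the letter length Y read from page p
// (or null if none was found). Returns an object mapping each length to a
// list of {start, end} page ranges, one range per letter.
function groupLettersByLength(totals) {
    var groups = {};
    var i = 0;
    while (i < totals.length) {
        var len = totals[i];
        if (!len) {          // no page number found: skip this page
            i++;
            continue;
        }
        var end = Math.min(i + len - 1, totals.length - 1);
        if (!groups[len]) {
            groups[len] = [];
        }
        groups[len].push({ start: i, end: end });
        i = end + 1;         // jump to the first page of the next letter
    }
    return groups;
}

// Hypothetical Acrobat-side step: create one new document per letter length
// and append every letter of that length to it.
function buildBatches(doc, totals) {
    var groups = groupLettersByLength(totals);
    for (var len in groups) {
        if (!groups.hasOwnProperty(len)) continue;
        var d = app.newDoc();            // starts with one blank page
        for (var k = 0; k < groups[len].length; k++) {
            d.insertPages({
                nPage: d.numPages - 1,   // append after the current last page
                cPath: doc.path,
                nStart: groups[len][k].start,
                nEnd: groups[len][k].end
            });
        }
        d.deletePages(0);                // remove the initial blank page
    }
}
```

From the console this would be called as buildBatches(this, totals), leaving one unsaved document open per page length. Grouping by whole page ranges keeps each letter's pages together and in their original order inside its batch.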