antons62716517
Participant
April 2, 2018
Question

Split document by Content and save with content filename


I'm a total novice using Adobe JavaScript, and I'm trying to split a large PDF of invoices into separate files, each named after its unique invoice number.

On each page the invoice number comes directly after the words "Invoice No." and is just a 6-digit number. However, the words "Invoice No." do not appear at the same word position on each page, so I am confused about using getPageNthWord, since the position differs from page to page. Can anyone help me with a script for this?



antons62716517
Participant
April 4, 2018

This is where I've got so far. It's fine if the invoice number is the 25th word, but that changes on each page. Any help appreciated. The words preceding the actual number are always "I N V O I C E No."

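try {
    for (var i = 0; i < this.numPages; i++) {
        // Works only when the number sits at word index 25 (0-based) on every page
        var j = 25;
        var invoice_no = this.getPageNthWord(i, j);

        // Save page i on its own, named after the invoice number
        this.extractPages({
            nStart: i,
            cPath: "/c/temp/" + invoice_no + ".pdf"
        });
    }
} catch (e) {
    console.println("Aborted: " + e);
}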

try67
Community Expert
April 4, 2018

You can't use a fixed word number since, as you wrote, it's not always the same.

Instead, you need to use another loop to iterate over all the words on each page, looking for the ones before the text you're interested in.
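A minimal sketch of that nested loop in Acrobat JavaScript, assuming the invoice number is the first six-digit word that directly follows a word "No" (getPageNthWord strips the dot by default). The function name `splitByInvoiceNumber` and its return value are illustrative, not part of Acrobat; it relies only on the Doc members `numPages`, `getPageNumWords`, `getPageNthWord` and `extractPages`.

```javascript
// Illustrative helper (not an Acrobat built-in). Assumptions: the invoice
// number is a six-digit word directly after a word "No", and outDir is a
// device-independent path such as "/c/temp/".
function splitByInvoiceNumber(doc, outDir) {
  var saved = [];
  var skipped = [];
  for (var p = 0; p < doc.numPages; p++) {
    var count = doc.getPageNumWords(p);
    var invoiceNo = null;
    // Walk every word on the page instead of using a fixed index.
    for (var w = 0; w < count - 1; w++) {
      // bStrip = true removes punctuation, so "No." comes back as "No".
      if (doc.getPageNthWord(p, w, true) === "No") {
        var next = doc.getPageNthWord(p, w + 1, true);
        if (/^\d{6}$/.test(next)) {
          invoiceNo = next;
          break;
        }
      }
    }
    if (invoiceNo === null) {
      skipped.push(p); // no match: leave this page for manual review
      continue;
    }
    var path = outDir + invoiceNo + ".pdf";
    // Extract just page p into its own file.
    doc.extractPages({ nStart: p, nEnd: p, cPath: path });
    saved.push(path);
  }
  return { saved: saved, skipped: skipped };
}
```

From the console you might run `var r = splitByInvoiceNumber(this, "/c/temp/"); console.println("Pages skipped: " + r.skipped);` and then check any skipped pages by hand. Passing the document in as a parameter, rather than using `this` inside the function, keeps the loop easy to try out on a few pages first.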

Legend
April 2, 2018

The obvious approach (not necessarily successful) is to use getPageNthWord for each word in turn, checking each one. Once you see “Invoice” followed by “No” (no dot), there is a chance the next word is the number. This is not hard to try if you are a programmer experienced in creating algorithms and turning them into code. Don’t expect to be able to do this by googling...
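The heuristic above can be sketched as a plain function over one page's words. `findAfterInvoiceNo` and `pageWords` are hypothetical helper names; the six-digit check is an added assumption taken from the question, used to reject the cases where the next word turns out not to be the number.

```javascript
// Hypothetical helper: collect a page's words (punctuation stripped)
// from an Acrobat Doc into a plain array.
function pageWords(doc, page) {
  var words = [];
  var n = doc.getPageNumWords(page);
  for (var i = 0; i < n; i++) {
    words.push(doc.getPageNthWord(page, i, true));
  }
  return words;
}

// Hypothetical helper: return the word after "Invoice" + "No" if it is
// six digits, otherwise keep looking; null if nothing matches.
// The comparison ignores case ("INVOICE NO" also matches).
function findAfterInvoiceNo(words) {
  for (var i = 0; i + 2 < words.length; i++) {
    var first = words[i].toLowerCase();
    var second = words[i + 1].toLowerCase();
    if (first === "invoice" && second === "no") {
      var candidate = words[i + 2];
      if (/^\d{6}$/.test(candidate)) {
        return candidate;
      }
    }
  }
  return null;
}
```

In Acrobat this would be called as `findAfterInvoiceNo(pageWords(this, i))` inside a loop over the pages. Keeping the matching separate from the Acrobat calls makes it easy to swap in a different rule if this one misfires.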

try67
Community Expert
April 2, 2018

What do you mean by "it differs all the time"? You provide it with a page number and an index number and it will return a word.
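To see what getPageNthWord actually returns for a given page (and why a fixed index breaks), a small diagnostic like the following could help. `describePageWords` is an illustrative name; it passes `bStrip = true` so the words appear the same way a search loop would compare them.

```javascript
// Illustrative diagnostic: list every word on a page with its 0-based
// index, one per line. doc is an Acrobat Doc (e.g. `this` in the console).
function describePageWords(doc, page) {
  var lines = [];
  var n = doc.getPageNumWords(page);
  for (var i = 0; i < n; i++) {
    lines.push(i + ": " + doc.getPageNthWord(page, i, true));
  }
  return lines.join("\n");
}
```

Running `console.println(describePageWords(this, 0));` and then the same for page 1 shows directly how far the number moves between pages.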

There's no guarantee as to the order of those words, though, which is what makes such scripts quite tricky.
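Because the order is not guaranteed, one possible fallback (a swapped-in technique, not something proposed in the thread) is to accept a six-digit word near the "No" marker rather than only the word directly after it. `findNearMarker` and its `radius` parameter are hypothetical; note that any other six-digit number close to the marker, such as a six-digit postal code in the address, would give a false match.

```javascript
// Hypothetical fallback: for each occurrence of `marker`, look for a
// six-digit word within `radius` positions. At each distance, the word
// after the marker is checked before the word in front of it.
function findNearMarker(words, marker, radius) {
  var isInvoiceNo = /^\d{6}$/;
  for (var m = 0; m < words.length; m++) {
    if (words[m] !== marker) continue;
    for (var d = 1; d <= radius; d++) {
      var after = m + d;
      var before = m - d;
      if (after < words.length && isInvoiceNo.test(words[after])) return words[after];
      if (before >= 0 && isInvoiceNo.test(words[before])) return words[before];
    }
  }
  return null;
}
```

A small radius (2 or 3) keeps the search close to the label; widening it trades fewer missed pages for more risk of picking up an unrelated number.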

I have a lot of experience in developing tools that do exactly what you described, so if you're interested I could create it for you (for a small fee). I'll need to see some sample pages, though. You can email them to me at try6767 at gmail.com.

antons62716517
Participant
April 4, 2018

What I mean is that there is a header above the invoice number with a name and address, and depending on the length of the address, the invoice number gets shifted along, so it never comes after the same number of words on each page. It does, however, appear on the same line on each page, if that helps.
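Given that layout, one option is to match the whole spaced label instead of counting words, so the length of the address no longer matters. This sketch assumes (unverified) that Acrobat reports "I N V O I C E No." as seven single-letter words followed by "No" once punctuation is stripped; `findAfterSpacedLabel` is a hypothetical name.

```javascript
// Hypothetical helper: find the spaced label "I N V O I C E No" in a page's
// words and return the following word if it is six digits, else null.
function findAfterSpacedLabel(words) {
  var label = ["I", "N", "V", "O", "I", "C", "E", "No"];
  // Stop early enough that a word after the label still exists.
  for (var i = 0; i + label.length < words.length; i++) {
    var matches = true;
    for (var k = 0; k < label.length; k++) {
      if (words[i + k] !== label[k]) {
        matches = false;
        break;
      }
    }
    var candidate = words[i + label.length];
    if (matches && /^\d{6}$/.test(candidate)) {
      return candidate;
    }
  }
  return null;
}
```

Printing a page's words with their indexes first is a quick way to confirm how Acrobat actually splits the label before relying on this.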