Split document by Content and save with content filename

Apr 02, 2018

I'm a total novice with Adobe JavaScript, and I'm trying to split a large PDF of invoices into separate invoice files, each named by its unique invoice number.

On each page the invoice number comes directly after the words "Invoice No." and is just a 6-digit number. However, the words "Invoice No." do not appear at the same word position on each page, so I'm confused about using getPageNthWord, since the position differs every time. Can anyone help me with a script for this?

TOPICS
Acrobat SDK and JavaScript, Windows

Apr 02, 2018

What do you mean by "it differs all the time"? You provide it with a page number and an index number and it will return a word.

There's no guarantee as to the order of those words, though, which is what makes such scripts quite tricky.
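A quick way to see what getPageNthWord actually returns, and in what order, is to list every word on a page together with its index. This is a minimal sketch, assuming it is run from the Acrobat JavaScript console; `dumpPageWords` is a hypothetical helper name, while `getPageNumWords` and `getPageNthWord` are the Acrobat `Doc` methods it relies on:

```javascript
// Hypothetical helper: returns "index: word" for every word on one page.
// `doc` is an Acrobat Doc object (pass `this` from the console); `page` is 0-based.
function dumpPageWords(doc, page) {
    var lines = [];
    var count = doc.getPageNumWords(page);
    for (var n = 0; n < count; n++) {
        // getPageNthWord strips punctuation by default, so "No." comes back as "No"
        lines.push(n + ": " + doc.getPageNthWord(page, n));
    }
    return lines;
}

// In Acrobat, print the words of the first page to the console:
// var words = dumpPageWords(this, 0);
// for (var k = 0; k < words.length; k++) { console.println(words[k]); }
```

Running it on a couple of sample pages shows whether the label comes back as one word or as several.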

I have a lot of experience developing tools that do exactly what you described, so if you're interested I could create it for you (for a small fee). I'd need to see some sample pages, though. You can email them to me at try6767 at gmail.com.

Apr 03, 2018

What I mean is that there is a header above the invoice number with a name and address, and depending on the length of the address, the invoice number gets shifted along, so it never comes after the same number of words on each page. It does, however, appear on the same line of each page, if that helps.

Apr 02, 2018

The obvious approach (not necessarily successful) is to use getPageNthWord for each word in turn, checking each one. Once you see “Invoice” followed by “No” (no dot), there is a chance the next word is the number. This is not hard to try if you are a programmer experienced in designing algorithms and turning them into code. Don’t expect to be able to do this by googling...
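That scan could look roughly like this. It is a sketch, not tested against real invoices: `findInvoiceNumber` is a hypothetical name, and it assumes Acrobat returns the label as the two words "Invoice" and "No" (punctuation is stripped by default), followed directly by the 6-digit number:

```javascript
// Hypothetical helper: word-by-word scan for "Invoice" + "No" + 6 digits.
// `doc` is an Acrobat Doc object (pass `this`); returns the number as a
// string, or null if the pattern is not found on `page` (0-based).
function findInvoiceNumber(doc, page) {
    // Read the page's words once instead of calling getPageNthWord repeatedly
    var words = [];
    var count = doc.getPageNumWords(page);
    for (var n = 0; n < count; n++) {
        words.push(doc.getPageNthWord(page, n));
    }
    // Look at each run of three consecutive words
    for (var k = 0; k + 2 < words.length; k++) {
        if (/^invoice$/i.test(words[k]) &&
            /^no$/i.test(words[k + 1]) &&
            /^\d{6}$/.test(words[k + 2])) {
            return words[k + 2];
        }
    }
    return null; // label missing, or not followed by exactly six digits
}
```

Called as `findInvoiceNumber(this, 0)` from the console, it would return the number on the first page, or `null`.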

Apr 04, 2018

This is how far I've got. It works if the invoice number is the 25th word, but that changes on each page. Any help appreciated. The words preceding the actual number are always "I N V O I C E No."

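try {
    for (var i = 0; i < this.numPages; i++) {
        // Fixed position: only works when the number is word index 25 (0-based)
        var j = 25;
        var invoice_no = this.getPageNthWord(i, j);

        // Save page i as its own file, named after the invoice number
        this.extractPages({
            nStart: i,
            cPath: "/c/temp/" + invoice_no + ".pdf"
        });
    }
}
catch (e) { console.println("Aborted: " + e); }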
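Because the label is spaced out as "I N V O I C E No.", Acrobat may return each letter as a separate word (an assumption worth checking on a sample page). One way around that is to join the page's words into a single string and search it with a regular expression that tolerates the spaces. `findInvoiceNumberByText` is a hypothetical helper, sketched under those assumptions:

```javascript
// Hypothetical helper: join all words on `page` and search the text with a
// regular expression, so the label can come back as "INVOICE", "Invoice",
// or "I N V O I C E". Returns the 6-digit number as a string, or null.
function findInvoiceNumberByText(doc, page) {
    var words = [];
    var count = doc.getPageNumWords(page);
    for (var n = 0; n < count; n++) {
        words.push(doc.getPageNthWord(page, n));
    }
    var text = words.join(" ");
    // Letters I-N-V-O-I-C-E with optional spaces, then "No" with an optional
    // dot, then exactly six digits (not followed by a seventh) in group 1
    var match = /I\s*N\s*V\s*O\s*I\s*C\s*E\s*No\.?\s*(\d{6})(?!\d)/i.exec(text);
    return match ? match[1] : null;
}
```

The returned string could replace `this.getPageNthWord(i, j)` in the loop above, with the page skipped when the result is `null`.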

try67
Apr 04, 2018

You can't use the word number, as that's not always the same, as you wrote.

Instead, you need to use another loop to iterate over all the words in each page, looking for the ones before the text you're interested in.
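Put together, the two nested loops might look like the sketch below. It assumes, hypothetically, that the number is the first 6-digit word directly after a word "No", that the /c/temp/ folder exists, and that the script runs in a privileged context such as the JavaScript console or an Action, since `extractPages` with `cPath` generally requires one. `splitByInvoiceNumber` is an illustrative name:

```javascript
// Hypothetical sketch: save each page as its own PDF named after its invoice number.
// `doc` is an Acrobat Doc object (pass `this`); `folder` is a device-independent
// path ending in "/". Returns the 0-based indices of pages where no number was found.
function splitByInvoiceNumber(doc, folder) {
    var skipped = [];
    for (var i = 0; i < doc.numPages; i++) {          // outer loop: pages
        var invoiceNo = null;
        var count = doc.getPageNumWords(i);
        for (var j = 0; j + 1 < count; j++) {          // inner loop: words
            if (doc.getPageNthWord(i, j) === "No") {   // "No." arrives as "No"
                var next = doc.getPageNthWord(i, j + 1);
                if (/^\d{6}$/.test(next)) {
                    invoiceNo = next;
                    break;                             // first match wins
                }
            }
        }
        if (invoiceNo === null) {
            skipped.push(i);                           // leave this page alone
            continue;
        }
        doc.extractPages({ nStart: i, nEnd: i, cPath: folder + invoiceNo + ".pdf" });
    }
    return skipped;
}

// In Acrobat (for example from the JavaScript console):
// var missed = splitByInvoiceNumber(this, "/c/temp/");
// console.println("Pages without an invoice number: " + missed);
```

If two pages carry the same invoice number, both extractions would target the same file name, so multi-page invoices would need extra handling (for example, a wider nStart/nEnd range).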
