I need to be able to grab the invoice number from PDFs and add it to the filename. The customer always sends their invoices in the same format. Is there a way to get the text from the PDF and add it to the filename while resaving the document?
I am using Acrobat DC Professional.
If you know exactly where the text is, you can crop the page down to just that portion and then iterate over all words in that area using Doc.getPageNthWord() (Acrobat DC SDK Documentation). That way you should be able to extract just the text you are interested in. If you search the archives for getPageNthWord, you should find a number of examples.
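A minimal sketch of the word-iteration step, leaving out the cropping: it lists every word on one page together with its 0-based index, which makes it easier to see where the value you want sits. It assumes the script runs inside Acrobat (console or action) and that `doc` is the current Doc object (for example `this`). `listWords` is a hypothetical helper name; `getPageNumWords` and `getPageNthWord` are standard Doc methods.

```javascript
// Hypothetical helper: list every word on page nPage with its 0-based index.
// "doc" is the Acrobat Doc object (pass `this` from the console or an action).
function listWords(doc, nPage) {
    var lines = [];
    var nWords = doc.getPageNumWords(nPage); // number of words on that page
    for (var i = 0; i < nWords; i++) {
        // bStrip = true removes punctuation and whitespace around the word
        lines.push(i + ": " + doc.getPageNthWord(nPage, i, true));
    }
    return lines;
}

// In the Acrobat JavaScript console, for the first page (page 0):
// console.println(listWords(this, 0).join("\n"));
```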
Assuming this is "real" text and not an image of text, then yes, it might be possible.
However, it requires a way of identifying the invoice number, for example based on its format, location on the page or context, or a combination of these methods. Each one will require a different kind of script, though, and of course it will only work if the files are fairly consistent with each other.
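If the invoice number can be identified by its format, testing each word against a regular expression may be enough. The sketch below assumes a six-digit number (`/^\d{6}$/`), which is only an illustration; adjust the pattern to the vendor's real format. `findWordByPattern` is a hypothetical helper, and `doc` is the Acrobat Doc object.

```javascript
// Hypothetical helper: return the first word on page nPage that matches "re",
// together with its 0-based index, or null if nothing matches.
// Use a regular expression without the "g" flag so re.test() has no state.
function findWordByPattern(doc, nPage, re) {
    var nWords = doc.getPageNumWords(nPage);
    for (var i = 0; i < nWords; i++) {
        var word = doc.getPageNthWord(nPage, i, true); // stripped word
        if (re.test(word)) {
            return { index: i, text: word };
        }
    }
    return null; // no matching word on this page
}

// In Acrobat: var hit = findWordByPattern(this, 0, /^\d{6}$/);
```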
I have the x, y position of the text on the page. It is real text that can be highlighted and the pdfs from this vendor are very consistent in their format. I would like to grab the text (actually a number) and add it to the beginning of the filename.
OK, in that case it should be possible, but it's a tricky task. You will need to create a loop that iterates over all the words in the page (or the entire file, if it's not always on a specific page), get their location on the page (using the getPageNthWordQuads method), and then compare it to the area where you expect the target text to be located. Definitely not a simple task if you don't have experience with Acrobat JS...
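The location-based loop described above might look roughly like this. It is a sketch under assumptions: `wordsInRect` and `boundingBox` are hypothetical helpers, and the rectangle is given as [left, bottom, right, top] in the same page coordinates that `getPageNthWordQuads` returns. Coordinates shown elsewhere in the Acrobat UI may use a different origin or units, so printing a few quads in the console first to find the real rectangle is a good idea.

```javascript
// Hypothetical helper: collect the words on page nPage whose bounding box
// lies entirely inside rect = [left, bottom, right, top].
function wordsInRect(doc, nPage, rect) {
    var found = [];
    var nWords = doc.getPageNumWords(nPage);
    for (var i = 0; i < nWords; i++) {
        // A word has one or more quads, each 8 numbers: x1,y1,x2,y2,x3,y3,x4,y4
        var quads = doc.getPageNthWordQuads(nPage, i);
        if (!quads || quads.length === 0) continue; // skip words without a location
        var box = boundingBox(quads);
        if (box.left >= rect[0] && box.bottom >= rect[1] &&
            box.right <= rect[2] && box.top <= rect[3]) {
            found.push(doc.getPageNthWord(nPage, i, true));
        }
    }
    return found;
}

// Smallest axis-aligned box containing every corner of every quad,
// so the corner order inside a quad does not matter.
function boundingBox(quads) {
    var box = { left: Infinity, bottom: Infinity, right: -Infinity, top: -Infinity };
    for (var q = 0; q < quads.length; q++) {
        for (var k = 0; k < 8; k += 2) {
            var x = quads[q][k], y = quads[q][k + 1];
            box.left = Math.min(box.left, x);
            box.right = Math.max(box.right, x);
            box.bottom = Math.min(box.bottom, y);
            box.top = Math.max(box.top, y);
        }
    }
    return box;
}

// In Acrobat: console.println(wordsInRect(this, 0, [390, 680, 460, 710]));
```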
I've developed many similar tools in the past so if you're interested in hiring someone to do it for you, for a small fee, feel free to contact me privately at try6767 at gmail.com.
So you can't just point to the x-y position of the text, even if its page and position do not change from document to document?
Actually, I just realized that most of these examples are over at the old AcrobatUsers.com site. Take a look here: Reverse Crop With Javascript (JavaScript)
So, I ran this script from an example - thanks.
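var PageText = "";
// Concatenate the first 30 words of page 1.
// Page and word indices are 0-based, so page 1 is the second page.
for (var j = 0; j < 30; j++) {
    var word = this.getPageNthWord(1, j, false); // false = keep whitespace and punctuation
    PageText += word;
}
app.alert(PageText);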
I found the text I need to be the 13th word on the page. I can now just use the getPageNthWord function, assign the result to a variable, and then insert that variable into the filename function to put the invoice number into the filename.
Thank you, I think I can muddle on now.
I don't see a need for cropping or iterating over the whole document. Am I wrong in this?
Are you sure the number will always be the 13th word on each page of each file? If so, then you can do it like that...
A small sampling shows these documents to be fairly consistent and software generated. Possibly a form that has been flattened or some other structured document.
I will go with this and move on to the problem of making it rename files in batches of 20 to 100 at a time. If the documents prove to be inconsistent, I will need to muddle through the more formal way; right now, down and dirty seems to be working and fits my schedule. I'm sorry if this proves to be an anathema to those wholly vested in the process. Thank you all for your help. I may be back with batch renaming issues.
If it works, that's all that matters...
arrgh
I've got it stamping and renaming files properly, even using the 13th word in the filename. But I get this error when it tries to execute this.saveAs: "exception in line 56 of function top level, script Batch:exec Raise error: the file may be read only blah, blah, blah". The path is good; I've tried many different approaches, even a local folder.
Here is what I am working with:
// Begin job
if (typeof global.counter == "undefined" || global.date_reply == null) {
    console.println("Begin Job Code");
    global.counter = 0;
    // Ask the user for the number of files and the date to be stamped
    var dialogNumber = "Number of Files";
    global.FileCnt = app.response("Number of Files to be Processed:", dialogNumber);
    var dialogTitle = "Date Received";
    var defaultAnswer = util.printd("mm-dd", new Date());
    global.date_reply = app.response("Date Received:", dialogTitle, defaultAnswer);
}

// Main code to process each of the selected files
try {
    global.counter++;
    console.println("Processing File #" + global.counter);
    // Stamp the received date at the bottom left
    this.addWatermarkFromText({
        cText: "GHC Received " + global.date_reply,
        nTextAlign: app.constants.align.left,
        nHorizAlign: app.constants.align.left,
        nVertAlign: app.constants.align.bottom,
        nHorizValue: 1, nVertValue: 1,
        nFontSize: 8
    });
    // Stamp "Finance Inbox" in grey at the bottom right
    this.addWatermarkFromText({
        cText: "Finance Inbox",
        nTextAlign: app.constants.align.right,
        nHorizAlign: app.constants.align.right,
        nVertAlign: app.constants.align.bottom,
        nHorizValue: -4, nVertValue: 1,
        nFontSize: 8,
        aColor: ["G", 0.5]
    });
} catch (e) {
    console.println("Batch aborted on run #" + global.counter);
    delete global.counter; // Try again, and avoid End Job code
    event.rc = false;      // Abort batch
}

// Page and word indices are 0-based: page 0 is the first page.
// bStrip = true removes punctuation and whitespace around the word.
var pronmbr = this.getPageNthWord(0, 13, true);
// Remove characters that are not allowed in file names
var date_replace = global.date_reply.replace(/[?:\\/|<>"*]/g, "");
// Spaces give names like "105063 ART INV 08-01.pdf"
var filename = pronmbr + " ART INV " + date_replace + ".pdf";
console.println(filename);
// Device-independent path (/O/ is drive O:); change it manually to the correct directory
this.saveAs("/O/1_invoice staging/" + filename);

// End job
if (global.counter == global.FileCnt) {
    console.println("End Job Code");
    // Insert endJob code here
    // Remove any global variables used in case user wants to run
    // another batch sequence using the same variables
    delete global.counter;
    delete global.date_reply;
    delete global.FileCnt;
}
What's the full file-name that you're trying to use?
pronmbr + " ART INV " + date_replace + ".pdf";
would produce something like "105063 ART INV 08-01.pdf",
with pronmbr being the 13th word, "ART INV" being inserted text, and date_replace being the date the user enters in the dialog box. The console shows the expected filename with each error message, one per file in the batch. It is always the same error, and the file is saved under its original filename.
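A small sketch of the filename step, producing names of the form described above ("105063 ART INV 08-01.pdf"). `buildInvoiceFileName` is a hypothetical helper: it trims the extracted word and drops the same characters the script removes from the date, and the spaces around "ART INV" are what the example name needs.

```javascript
// Hypothetical helper: build "<number> ART INV <date>.pdf" from the
// extracted word and the date the user typed in the dialog.
function buildInvoiceFileName(word, dateText) {
    // Characters that Windows does not allow in file names
    var invalid = /[?:\\\/|<>"*]/g;
    // Drop invalid characters, then trim leading/trailing whitespace
    var number = String(word).replace(invalid, "").replace(/^\s+|\s+$/g, "");
    var date = String(dateText).replace(invalid, "");
    return number + " ART INV " + date + ".pdf";
}

// In an action:
// var filename = buildInvoiceFileName(this.getPageNthWord(0, 13, true), global.date_reply);
```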
From what context are you running the code?
Does it work if you only execute the saveAs command from the console with the full path, hard-coded into the code?
It's part of an action in Acrobat X Pro. I took an action I already use that works and added the var pronmbr = getPageNthWord(0,13,false) command.
Actually, I get an undefined error when I try it in the console:
saveAs("/O/1_invoice staging/" test filename)
undefined
"Undefined" is not an error message. It just means the code executed without returning any values.
Do you see the file saved in the target folder?
Sorry, no, it is not saving to the target folder.
Can you post the exact code you're executing?
No, I mean when you test just the saveAs command from the console window, what code did you execute, exactly?
You can't have actually executed the code, because it should have failed (you didn't include the ".pdf" suffix).
To execute code in the console, you must first select it and then press Ctrl+Enter.
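To see the real error message when testing from the console, the call can be wrapped in try/catch. This is a sketch, assuming the folder is written as a device-independent path such as "/O/1_invoice staging" (drive letter as the first segment on Windows). `trySaveAs` is a hypothetical helper; appending ".pdf" when it is missing follows the note that the call fails without the suffix.

```javascript
// Hypothetical helper: call Doc.saveAs and return a message with the path
// that was used and, on failure, the exact exception text.
function trySaveAs(doc, folder, fileName) {
    // Append the extension if it is missing
    if (!/\.pdf$/i.test(fileName)) {
        fileName += ".pdf";
    }
    // Device-independent path, e.g. "/O/1_invoice staging/test.pdf"
    var path = folder.replace(/\/+$/, "") + "/" + fileName;
    try {
        doc.saveAs(path);
        return "Saved: " + path;
    } catch (e) {
        return "saveAs failed for " + path + ": " + e;
    }
}

// In the console (select the line, then press Ctrl+Enter):
// console.println(trySaveAs(this, "/O/1_invoice staging", "test"));
```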
The 13th-word rule might work for you, but it seems risky to me. Are you quite sure that every word there today will always be there? That no other word will ever be added? And that you won't get extra words (for example, from an extra space)?
The "canonical" way to solve this is to use getPageNthWord and getPageNthWordQuads. The quads give the location of a quadrilateral containing the word. You can't rely on the exact size or on the exact X,Y values, but you could use some fuzzy logic to check whether the word is in about the right part of the page.
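One way to express that fuzzy check, as a sketch: pick the word whose centre is closest to the expected point and accept it only within a tolerance (in page units), optionally also requiring a number-like pattern. `findNearestWord`, the tolerance, and the pattern are hypothetical; only the first quad of each word is used, which assumes the word does not wrap across lines.

```javascript
// Hypothetical helper: find the word on page nPage whose centre is nearest
// to (x, y), within "tolerance"; if "re" is given, the word must match it.
function findNearestWord(doc, nPage, x, y, tolerance, re) {
    var best = null;
    var nWords = doc.getPageNumWords(nPage);
    for (var i = 0; i < nWords; i++) {
        var text = doc.getPageNthWord(nPage, i, true);
        if (re && !re.test(text)) continue;        // wrong format, skip
        var quads = doc.getPageNthWordQuads(nPage, i);
        if (!quads || quads.length === 0) continue; // no location available
        var q = quads[0];                           // first quad of the word
        var cx = (q[0] + q[2] + q[4] + q[6]) / 4;   // centre of the 4 corners
        var cy = (q[1] + q[3] + q[5] + q[7]) / 4;
        var dist = Math.sqrt((cx - x) * (cx - x) + (cy - y) * (cy - y));
        if (dist <= tolerance && (best === null || dist < best.distance)) {
            best = { index: i, text: text, distance: dist };
        }
    }
    return best; // null if nothing is close enough
}

// In Acrobat: var hit = findNearestWord(this, 0, 420, 690, 20, /^\d{6}$/);
```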
You can, but it's not a trivial task. There's no command that says "give me the text in location x,y on page z"...