
Grabbing text data from a pdf to use in javascript

Community Beginner, Jul 31, 2017

I need to grab the invoice number from PDFs and add it to the filename. The customer always sends their invoices in the same format. Is there a way to get the text from the PDF and add it to the filename while re-saving the document?

I am using DC Professional.

TOPICS
Acrobat SDK and JavaScript , Windows

1 Correct answer

Community Expert, Aug 01, 2017

If you know exactly where the text is, you can crop the page down to just that portion and then iterate over all words in that area using Doc.getPageNthWord() (see the Acrobat DC SDK Documentation). That should let you extract just the text you are interested in. If you search the archives for getPageNthWord, you should find a number of examples.
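
As a starting point for finding the right word, here is a minimal sketch that lists every word on a page together with its index. It skips the cropping step and simply walks the whole page, so it is only meant for locating the index of the invoice number. The helper name listPageWords is hypothetical; getPageNumWords and getPageNthWord are the Doc methods from the Acrobat JavaScript API.

```javascript
// Sketch: list every word on a page with its index.
// listPageWords is a hypothetical helper name, not part of the Acrobat API.
// doc is an Acrobat Doc object; inside Acrobat you would pass `this`.
function listPageWords(doc, pageNum) {
  var lines = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    // Third argument (bStrip) set to true strips white space from the word.
    lines.push(i + ": " + doc.getPageNthWord(pageNum, i, true));
  }
  return lines;
}

// In the Acrobat JavaScript console (assumed usage):
// console.println(listPageWords(this, 0).join("\n"));
```

Once the index of the invoice number is known, a single getPageNthWord call with that index is enough.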

LEGEND, Aug 01, 2017

1. I don't like the look of trying to save as "test". Even if it succeeds, the file will just be called "test" and won't automatically open in Acrobat. Try "test.pdf".

2. Are you able to save to the folder "O:\1_invoice staging" manually?

Community Beginner, Aug 02, 2017

Okay,

Adding the ".pdf" extension to the code makes it work in the console window. So executing that line with a fixed test filename works. I can't use that line exactly as written in the script, because the real filename is built from one variable and a user-entered value (plus ".pdf").

The error is: "exception in line 56 of function top level, script Batch:exec  Raise error: the file may be read only ....". It seems to me that either the file is being treated as open, or the pronmbr variable is not changing as the script iterates through the selected files, so it thinks it is trying to save under the exact same name again. Maybe? I'm at a loss.

LEGEND, Aug 02, 2017

You can use app.alert to display the file name in a dialog (or console.println to write it to the console) and see what is going on as the script runs.

Community Beginner, Aug 02, 2017

Thanks, I just tried this. app.alert brings up each filename correctly; I click OK and then get the error message, and no file is saved.

Community Expert, Aug 02, 2017

Copy the actual file name that you see in the alert (or output it to the console, and then copy it from there) into your saveAs command and run it manually from the console. Does it work?

If a file with the same name exists, it will simply be overwritten. However, if that file is open, locked, or set as read-only, the save will fail and an error message will appear.
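
To see which path a failing save was actually given, one option is to wrap the call in try/catch and report the path together with the error. This is only a sketch: trySave is a hypothetical helper name, and it assumes the script already runs in a context where Doc.saveAs is allowed (the console or a batch action).

```javascript
// Sketch: attempt a save and report the outcome instead of stopping the script.
// trySave is a hypothetical helper name; doc is an Acrobat Doc object.
function trySave(doc, path) {
  try {
    doc.saveAs(path);
    return "saved: " + path;
  } catch (e) {
    // Include the path so a bad file name is visible next to the error text.
    return "failed: " + path + " (" + e + ")";
  }
}

// In the Acrobat JavaScript console (assumed usage):
// console.println(trySave(this, "/O/1_invoice staging/test.pdf"));
```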

Community Beginner, Aug 02, 2017

this.saveAs("/O/1_invoice staging/" + "105119 ART INV 8-02" + ".pdf")

does not work in the console.

The target folder does not contain the original files that are being batched, or any other files. Right now I cannot get the console to repeat the test --  this.saveAs("/O/1_invoice staging/" + "test" + ".pdf")  -- which worked yesterday. I am beginning to think console mode is unstable or I am not doing something right.

CORRECTION TO THE ABOVE: both script examples ran and saved as expected. My console was locked up; I closed and reopened Acrobat, and now these commands work fine.

Community Beginner, Aug 02, 2017

I think I am on to something:

When I run this in the console it saves just fine:

var pronmbr = 105882;
var date_replace = "8-01";
var filename = pronmbr + " ART INV " + date_replace + ".pdf";
console.println(filename);
app.alert(filename, 3);
this.saveAs("/O/1_invoice staging/" + filename);

When I run this in the console, app.alert shows the right filename but saveAs throws the error. (Does the getPageNthWord call hold onto the document in such a way that saveAs thinks it is protected or read-only?)

var pronmbr = getPageNthWord(0,13,false);
var date_replace = "8-01";
var filename = pronmbr + " ART INV " + date_replace + ".pdf";
console.println(filename);
app.alert(filename, 3);
this.saveAs("/O/1_invoice staging/" + filename);

Community Expert, Aug 02, 2017

Why are you specifying the third (last) parameter of getPageNthWord as false? That means it's not stripping any white-space characters from the word, which could mean you're including something like a line break in the file name, which is not allowed.

Try printing out the filename like this:

console.println(filename.toSource());

This will help you find any unwanted characters that might be hiding in it...
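
If toSource() is not available in the environment you are testing in, the same check can be sketched by hand: walk the string and print control characters as visible escape codes. The helper name showHiddenChars is hypothetical, and the sample string is the one from this thread.

```javascript
// Sketch: make control characters (line breaks, tabs, ...) visible in a string.
// showHiddenChars is a hypothetical helper name.
function showHiddenChars(text) {
  var out = "";
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code < 32 || code === 127) {
      // Replace the control character with a two-digit hex escape such as \x0A.
      var hex = code.toString(16).toUpperCase();
      out += "\\x" + (hex.length < 2 ? "0" + hex : hex);
    } else {
      out += text.charAt(i);
    }
  }
  return out;
}

// A line break hidden in the name shows up as \x0A:
var shown = showHiddenChars("105882 \n ART INV 8-01.pdf");
// shown is now the text: 105882 \x0A ART INV 8-01.pdf
```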

Community Beginner, Aug 02, 2017

yes!!!

(new String("105882 \n ART INV 8-01.pdf"))

There is a pesky \n in the filename. So if I change the parameter to true, this will work?

Community Expert, Aug 02, 2017

Either that or make sure to remove any such characters from the string before using it in the file-name.
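
For the second option, a minimal sketch of cleaning the extracted word before it goes into the file name might look like this. cleanForFilename is a hypothetical helper name, and the choice to drop control characters and trim surrounding white space is an assumption about what needs removing; it does not cover every character Windows rejects in file names.

```javascript
// Sketch: remove control characters and surrounding white space from a word.
// cleanForFilename is a hypothetical helper name.
function cleanForFilename(word) {
  return String(word)
    .replace(/[\x00-\x1F\x7F]/g, "") // drop line breaks, tabs and other control characters
    .replace(/^\s+|\s+$/g, "");      // trim leading and trailing white space
}

// Using the unstripped word seen in this thread:
var pronmbr = cleanForFilename("105882 \n");
var filename = pronmbr + " ART INV " + "8-01" + ".pdf";
// filename is now "105882 ART INV 8-01.pdf"
```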

Community Beginner, Aug 02, 2017

Just tested and retested this. It works perfectly now. Thank you very much for your help!!

The true/false logic in Excel's VLOOKUP and other functions is just the opposite.

Community Expert, Aug 02, 2017

Well, the name of that parameter is bStrip. So if you specify it as true the white-space characters are stripped. If you specify it as false, they are retained... This is all documented in the Acrobat JS API Reference. Anyway, glad to hear you were able to sort it out!
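
Putting the thread together, the working version can be sketched as a small function that reads the word with bStrip set to true and builds the full save path. buildInvoicePath is a hypothetical helper name; the page number, word index, date text, and folder are the values used earlier in this thread and would need adjusting for other documents.

```javascript
// Sketch: build the save path from a word on the page, with bStrip = true.
// buildInvoicePath is a hypothetical helper name; doc is an Acrobat Doc object.
function buildInvoicePath(doc, pageNum, wordIndex, dateText, folder) {
  // bStrip = true, so white space such as a trailing line break is removed.
  var invoiceNumber = doc.getPageNthWord(pageNum, wordIndex, true);
  return folder + invoiceNumber + " ART INV " + dateText + ".pdf";
}

// In the Acrobat JavaScript console or a batch action (assumed usage):
// this.saveAs(buildInvoicePath(this, 0, 13, "8-01", "/O/1_invoice staging/"));
```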
