
Want to extract document metadata and doc info via a script

New Here, Apr 25, 2016

I'm neither a JavaScript nor a Java programmer, so I might be missing one or more steps.

Looking at the JavaScript documentation I have, I see the following code:

var r = new Report();

r.writeText(this.metadata);

r.open("myMetadataReportFile");

save("/c/myreport.pdf"));

The code doesn't seem to work when run from the console. If I execute "this.metadata" on its own, I get the information I expect, which suggests that the problem lies in creating the report and/or saving the document.

I also haven't yet figured out how to get information out of the DocInfo dictionary. This is another need.

NOTE: In both cases (XMP and DocInfo) we're adding CUSTOM metadata.

Ideally I'd like to save both sets of information, XMP and DocInfo, as XML. This way we can run a comparison between the two.

Finally, whatever code I end up with needs to run in the Action Wizard over about 10,000 files. If the input file is "file.pdf", the output should be "file.xml".

Thanks.

Ira

TOPICS
Acrobat SDK and JavaScript, Windows

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Apr 25, 2016

Let's start from the end: You will not be able to run an Action on 10,000 files in a single go. If that's your goal then you should abandon it now and look for an alternative to Acrobat, as it simply can't handle that many files without hanging or crashing.

Processing 500 files should be taken as the maximum amount possible.
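
One way to stay under such a ceiling is to split the full file list into smaller groups and run the Action on one group at a time. A minimal sketch in plain JavaScript, assuming the file names are already available as an array; splitIntoBatches and the generated file names are hypothetical, and the default of 500 simply mirrors the figure above rather than any Acrobat setting:

```javascript
// Split a list of file paths into batches no larger than maxPerBatch.
// The default of 500 is an assumption taken from the advice above.
function splitIntoBatches(paths, maxPerBatch) {
    var size = maxPerBatch || 500;
    var batches = [];
    for (var i = 0; i < paths.length; i += size) {
        batches.push(paths.slice(i, i + size));
    }
    return batches;
}

// Example input: 10,000 made-up file names.
var allFiles = [];
for (var n = 1; n <= 10000; n++) {
    allFiles.push("file" + n + ".pdf");
}
var batches = splitIntoBatches(allFiles, 500);
```

How each group is then handed to Acrobat (for example, one folder per group) is left open here.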

New Here, Apr 25, 2016

Thanks, I didn't realize that Acrobat had such a limit. But even if we have to do this 200 files at a time, it is worthwhile doing.

How do we accomplish that?

Community Expert, Apr 25, 2016

OK, second issue: If you want the output to be an XML file, why are you using the Report object? The Report is a PDF file, you know...

New Here, Apr 25, 2016

No, I didn't know that. As I mentioned earlier, I took the code from an Acrobat JavaScript manual that I have.

I hope I'm clear about what we're trying to accomplish.

Community Expert, Apr 25, 2016

I think I understand, but it's not that simple. It might be possible with the Report object if you saved it as an XML file after opening it, but the result might look a bit strange. So let's go back to your code. First of all, you have a syntax error in the last line, as there are two closing parentheses but only one opening one. So you need to fix that.

Beyond that I see two other issues:

1. It doesn't make sense to use both the open command and the save command. If you want to just save the report then use only save. If you want to view it, use open.

2. You can't save files to the root folder of a drive because it's considered unsafe. Change the path to somewhere else, like C:\Temp\ or C:\Reports\ or something like that.

Fix those issues and try again.
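
Put together, the fixes above might look like the following sketch. The helper isDriveRootPath and the folder /c/Temp/ are assumptions made for illustration (the folder is assumed to exist already); the Acrobat-specific part is guarded so that the helper can also be tried outside Acrobat:

```javascript
// Returns true when a device-independent path such as "/c/myreport.pdf"
// points directly at the root folder of a drive.
function isDriveRootPath(diPath) {
    var parts = diPath.split("/");
    var count = 0;
    for (var i = 0; i < parts.length; i++) {
        if (parts[i] !== "") {
            count++; // count non-empty segments: drive letter, folders, file name
        }
    }
    // Drive letter + file name only means the file sits in the drive root.
    return count <= 2;
}

var target = "/c/Temp/myreport.pdf"; // hypothetical target outside the drive root

// Acrobat-only part: skipped when the Report object is not available.
if (typeof Report !== "undefined" && !isDriveRootPath(target)) {
    var r = new Report();
    r.writeText(this.metadata);
    r.save(target); // save only; call r.open("some title") instead to view the report
}
```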

New Here, Apr 26, 2016

I'm trying to run the script in the console.

When I type:

var Rep = new Report();

The console responds:

undefined

Not sure what's going on.

Community Expert, Apr 26, 2016

What did you expect to happen? This just means that the code completed running without errors or return values.

New Here, Apr 26, 2016

I would have expected not to get "undefined".

When I ran:

var r = new Report();

r.writeText(this.metadata);

r.open("myMetadataReportFile");

the console responded with:

GeneralError: Operation failed.

Report.open:1:Console undefined:Exec

undefined

It did NOT OPEN a file.

Community Expert, Apr 26, 2016

I have a feeling you did not select all of the code when you ran it, because the open command is not in the first line of your code... So you probably only executed the last line, which would have failed. "undefined" in this case means the script ended, and you have the error message before that, which caused it to stop running.

New Here, Apr 26, 2016

OK. I figured out what my problem was: I need to press Ctrl+Enter on each line.

So how do I take the code and make it run on a batch of files? Is it as simple as putting the code in the Action Wizard?

If yes, how do I access all the information in the DocInfo Dictionary (both normal and custom)?

Thanks

Community Expert, Apr 26, 2016 (marked as the correct answer)

To run multiple lines in the console you need to select them all with the mouse and then press Ctrl+Enter.

Later on you can place the code as a part of an Action and run it like that, yes.

The "metadata" property should return the full XMP file, including any custom properties.

New Here, Apr 26, 2016

Feel a bit silly that the solution was that simple. Thanks for your patience.

XMP data is half the battle.

I need to do the same kind of data extraction using the information in the DocInfo dictionary. I didn't really see anything in the JavaScript documentation to get to this info. Any recommendations or suggestions?

Community Expert, Apr 26, 2016

Do you mean the values under the info property of the Document object? If so, you can access them like this:

this.info.Title

this.info.Author

this.info.Subject

etc.

New Here, Apr 26, 2016

If I'm understanding this correctly, then to read, say, ISBN and DOI I would use:

this.info.ISBN

this.info.DOI

Is this right?

Thanks again.

Community Expert, Apr 26, 2016

If those properties were defined for that document, yes.
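
Since not every file in a batch will define every custom key, a script may want to handle missing keys explicitly. A small sketch, assuming a key that is not defined reads back as undefined, as it does on an ordinary JavaScript object; pickInfo and the sample values are made up for illustration:

```javascript
// Collect the requested keys from an info-like object. Keys that are not
// defined for the document come back as null instead of undefined.
function pickInfo(info, keys) {
    var result = {};
    for (var i = 0; i < keys.length; i++) {
        var value = info[keys[i]];
        result[keys[i]] = (value === undefined) ? null : value;
    }
    return result;
}

// Stand-in for this.info, with invented values, so the sketch runs outside Acrobat.
var sampleInfo = { Title: "Sample title", DOI: "10.1000/example" };
var picked = pickInfo(sampleInfo, ["Title", "ISBN", "DOI"]);
```

Inside Acrobat the call would presumably be pickInfo(this.info, ["Title", "ISBN", "DOI"]).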

New Here, Apr 27, 2016

Is there a way to get a list of ALL the info properties (i.e. both standard and custom)?

Community Expert, Apr 27, 2016

Sure:

for (var i in this.info)
     console.println(i + ": " + this.info[i]);
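
Building on that loop, the same key/value pairs could be turned into XML text, which is the format asked for at the start of the thread. This is only a sketch: the docinfo and entry element names are invented here, and keys are written as attribute values because custom key names are not guaranteed to be valid XML element names.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
}

// Turn an info-like object into a small XML document.
// The <docinfo>/<entry> layout is an assumption, not an Adobe format.
function infoToXml(info) {
    var lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<docinfo>"];
    for (var key in info) {
        lines.push('  <entry key="' + escapeXml(key) + '">' +
                   escapeXml(info[key]) + "</entry>");
    }
    lines.push("</docinfo>");
    return lines.join("\n");
}

// Stand-in for this.info, with invented values.
var xml = infoToXml({ Title: "A & B", DOI: "10.1000/example" });
```

Inside Acrobat, infoToXml(this.info) would presumably produce the text for one document; writing that text to disk is a separate step.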

New Here, Apr 27, 2016

Thanks.

This has been a really helpful discussion!!!

New Here, Apr 27, 2016

Almost got it.

The following code almost works, at least for XMP:

//Step 1

var name = (this.documentFileName);

name=name.replace (".pdf", "");

var r = new Report();

r.writeText (this.metadata);

r.save ("/c/Users/ipolans/Desktop/PDF Metadata/XMP-Data/" + name + "-XML.pdf");

//Step 2

app.openDoc("/c/Users/ipolans/Desktop/PDF Metadata/XMP-Data/" + name + "-XML.pdf");

console.println("the current document is "+ this.documentFileName);

var fil = (name + "-XML");

saveAs ("/c/Users/ipolans/Desktop/PDF Metadata/XMP-Data/" + fil + ".txt", "com.adobe.acrobat.plain-text");

The main problem with the code above is that I haven't been able to figure out how to get "saveAs" to use the document opened with "app.openDoc". Instead, it uses the document processed in Step 1, as the "console.println" statement confirms.

Even if this issue is fixed, according to the JavaScript documentation "app.openDoc" is not allowed in a batch file (which I assume includes the Action Wizard). That raises the question of how much of the code will need to change to work in the Action Wizard.

Ideally I'd like the JavaScript to avoid creating a temporary PDF file. Instead, I'd want to (1) query the PDF for the XMP metadata and then (2) write it directly to a text file.

Community Expert, Apr 27, 2016

Instead of using save and then open and then re-save, just use the open command of the Report object to generate a new Document object. Then use saveAs to convert it to a text file. Something like this:

//Step 1

var name = (this.documentFileName);

name=name.replace (".pdf", "");

var r = new Report();

r.writeText(this.metadata);

var newDoc = r.open("XMP Report");

//Step 2

var fil = (name + "-XML");

newDoc.saveAs("/c/Users/ipolans/Desktop/PDF Metadata/XMP-Data/" + fil + ".txt", "com.adobe.acrobat.plain-text");

newDoc.closeDoc(true);
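
The file-name handling in the snippet above could be made a little more robust. A plain string replace of ".pdf" only touches the first match and is case-sensitive, so a name such as "REPORT.PDF" would keep its extension. A possible helper, with outputName being a hypothetical name:

```javascript
// Strip a trailing ".pdf" (in any letter case) and append a new suffix.
function outputName(pdfName, newSuffix) {
    return pdfName.replace(/\.pdf$/i, "") + newSuffix;
}

var xmlName = outputName("file.pdf", ".xml");
var upperName = outputName("REPORT.PDF", ".xml");
var nestedName = outputName("my.pdf.notes.pdf", "-XML.txt");
```

In the snippet above this would replace the two lines that build name, for example var fil = outputName(this.documentFileName, "-XML");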

New Here, May 03, 2016

That works.

But I'm finding that "r.writeText" doesn't always produce a new line at the end.

Here's an example showing the problem:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c015 81.157285, 2014/12/12-00:43:15   ">  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:pdf="http://ns.adobe.com/pdf/1.3/" xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/" dc:format="application/pdf" pdf:Producer="Adobe PDF Library 4.0; modified using iText 2.1.7 by 1T3XT" pdf:keywords="" pdf:Keywords="" xmp:CreateDate="2004-05-12T07:31:12+05:30" xmp:ModifyDate="2016-03-22T08:29:47-04:00" xmp:CreatorTool="Acrobat Capture 3.0" pdfx:Article_Title="IEEE Standard for Shunt Power Capacitors" pdfx:DOI="10.1109/IEEESTD.1980.79668" pdfx:DOI_Link="https://dx.doi.org/10.1109/IEEESTD.1980.79668" pdfx:IEEE_Publication_No.="2459" pdfx:IEEE_Xplore_Article_No.="26642" pdfx:Page_Numbers="1 - 23" pdfx:Publication_Title="ANSI/IEEE Std 18-1980" pdfx:Style="Searchable Image (Exact)"> <dc:description> <rdf:Alt>

I've even tried r.writeText(" "); without any luck.

Community Expert, May 03, 2016

It should do, but maybe the line-breaks disappear when the file is converted to a text file.

What application are you using to view the text file in? If you're using something like Notepad++ check if there's a CR and an LF char at the end of each line. Maybe there's just a CR, which some applications (like the regular Notepad) do not pick up as a line-break, if I recall correctly.
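
If no editor that shows line endings is at hand, the same check can be done in code. A minimal sketch, assuming the text is already available as a string; countLineEndings is a hypothetical helper:

```javascript
// Count the three kinds of line endings in a string.
function countLineEndings(text) {
    var crlf = (text.match(/\r\n/g) || []).length;
    var cr = (text.match(/\r/g) || []).length - crlf; // CR not followed by LF
    var lf = (text.match(/\n/g) || []).length - crlf; // LF not preceded by CR
    return { crlf: crlf, cr: cr, lf: lf };
}

// One CR+LF pair, one lone CR and one lone LF.
var counts = countLineEndings("a\r\nb\rc\nd");
```

Inside Acrobat, countLineEndings(this.metadata) would show which endings the XMP string itself contains before it goes through writeText.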

New Here, May 03, 2016

I opened the file in Word. I don't have anything handy on my PC that shows the hex representation.

What I found is that some of the lines have spaces at the end and others have CR/LF pairs (at least as far as Word is concerned).

I'll do some more investigating tomorrow.

Community Expert, May 03, 2016

It might be because you're printing out the entire metadata, which includes line-breaks already, and that might not come through when you use writeText. In that case I would recommend splitting the metadata string to individual lines and then printing each one of those lines to the report on its own.
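
The splitting suggested above could look roughly like this. It is a sketch, not tested against every Acrobat version: splitIntoLines is a hypothetical helper, and the Acrobat-specific part is guarded so the helper can be tried elsewhere.

```javascript
// Split text on CR+LF, lone CR or lone LF, whichever occurs.
function splitIntoLines(text) {
    return text.split(/\r\n|\r|\n/);
}

var sampleLines = splitIntoLines("a\r\nb\rc\nd");

// Acrobat-only part: skipped when the Report object is not available.
if (typeof Report !== "undefined") {
    var r = new Report();
    var lines = splitIntoLines(this.metadata);
    for (var i = 0; i < lines.length; i++) {
        r.writeText(lines[i]); // one call per metadata line
    }
    var newDoc = r.open("XMP Report");
}
```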

And I highly recommend Notepad++ for both writing code and examining plain-text files.
