I'm neither a JavaScript nor a Java programmer, so I might be missing one or more steps.
Looking at the JavaScript info I have, I see the following code:
var r = new Report();
r.writeText(this.metadata);
r.open("myMetadataReportFile");
save("/c/myreport.pdf"));
The code doesn't seem to work when run from the console. If I execute "this.metadata" I get the information that I expect. This suggests that the problem is with creating the report and/or saving the document.
I haven't yet figured out how to get information out of the Doc Info dictionary. This is another need.
NOTE: In both cases (XMP and DocInfo) we're adding CUSTOM metadata.
Ideally I'd like to save both sets of information (XMP and DocInfo) as XML. This way we can run a comparison between the two.
Finally, whatever code I end up with needs to be able to run in the Action Wizard over about 10,000 files. If the input file is "file.pdf", the output should be "file.xml".
Thanks.
Ira
Let's start from the end: You will not be able to run an Action on 10,000 files in a single go. If that's your goal then you should abandon it now and look for an alternative to Acrobat, as it simply can't handle that many files without hanging or crashing.
Processing 500 files should be taken as the maximum amount possible.
Thanks, I didn't realize that Acrobat had such a limit. But even if we have to do this 200 files at a time, it is worth doing.
How do we accomplish that?
OK, second issue: If you want the output to be an XML file, why are you using the Report object? The Report is a PDF file, you know...
No, I didn't know that. As I mentioned earlier, I took the code from an Acrobat JavaScript manual that I have.
I hope I'm clear about what we're trying to accomplish.
I think I understand, but it's not that simple. It might be possible with the Report object if you saved it as an XML file after opening it, but the result might look a bit strange. So let's go back to your code. First of all, you have a syntax error in the last line, as there are two closing parentheses but only one opening one. So you need to fix that.
Beyond that I see two other issues:
1. It doesn't make sense to use both the open command and the save command. If you want to just save the report then use only save. If you want to view it, use open.
2. You can't save files to the root folder of a drive, because that's considered unsafe. Change the path to somewhere else, like C:\Temp\ or C:\Reports\.
Fix those issues and try again.
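A minimal sketch of the snippet with those fixes applied, assuming the bare save call was meant to be a method of the report (r.save) and that C:\Temp exists. The wrapper saveMetadataReport and its parameters are hypothetical names, used only so the logic can be tried outside Acrobat; in the console you would pass this and the built-in Report constructor.

```javascript
// Hypothetical helper: writes a document's XMP metadata into a Report
// and saves it. ReportType is injected so the function can be exercised
// outside Acrobat; inside Acrobat pass the built-in Report constructor.
function saveMetadataReport(doc, ReportType, path) {
    var r = new ReportType();
    r.writeText(doc.metadata); // the XMP packet as one string
    r.save(path);              // save only; no open() call
    return path;
}

// In the Acrobat console (select all lines, then press Ctrl+Enter):
// saveMetadataReport(this, Report, "/c/Temp/myreport.pdf");
```

Note that the result is still a PDF report, not an XML file.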
I'm trying to run the script in the console.
When I type:
var Rep = new Report();
The console responds:
undefined
Not sure what's going on.
What did you expect to happen? This just means that the code completed running without errors or return values.
I would have expected not to get "undefined".
When I ran:
var r = new Report();
r.writeText(this.metadata);
r.open("myMetadataReportFile");
And the console responded with:
GeneralError: Operation failed.
Report.open:1:Console undefined:Exec
undefined
It did NOT OPEN a file.
I have a feeling you did not select all of the code when you ran it, because the open command is not in the first line of your code. You probably executed only the last line, which would have failed. "undefined" in this case means the script ended; the error message before it is what caused it to stop running.
OK. I figured out what my problem was: I need to press Ctrl+Enter on each line.
So how do I take the code and make it run on a batch of files? Is it as simple as putting the code in the Action Wizard?
If yes, how do I access all the information in the DocInfo Dictionary (both normal and custom)?
Thanks
To run multiple lines in the console you need to select them all with the
mouse and then press Ctrl+Enter.
Later on you can place the code as a part of an Action and run it like
that, yes.
The "metadata" property should return the full XMP file, including any
custom properties.
Feel a bit silly that the solution was that simple. Thanks for your patience.
XMP data is half the battle.
I need to do the same kind of data extraction using the information in the DocInfo dictionary. I didn't see anything in the JavaScript documentation about getting to this info. Any recommendations or suggestions?
Do you mean the values under the info property of the Document object? If so, you can access them like this:
this.info.Title
this.info.Author
this.info.Subject
etc.
If I'm understanding this correctly, to add, say, ISBN and DOI I would use
this.info.ISBN
this.info.DOI
Is this right?
Thanks again.
If those properties were defined for that document, yes.
Is there a way to get a list of ALL the info properties (i.e. both standard and custom)?
Sure:
for (var i in this.info)
console.println(i + ": " + this.info[i]);
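Since the goal stated earlier is to get the DocInfo values out as XML, the same loop could feed a small serializer. This is only a sketch: infoToXml and escapeXml are hypothetical helpers, and the docinfo and entry element names are my own choice, not anything Acrobat defines. Keys go into an attribute because custom names are not guaranteed to be valid XML element names.

```javascript
// Escape the five characters that are special in XML text and attributes.
function escapeXml(s) {
    return String(s)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
}

// Hypothetical helper: serialize every enumerable property of an
// info-like object as <entry key="...">value</entry>.
function infoToXml(info) {
    var lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<docinfo>"];
    for (var key in info) {
        if (typeof info[key] === "function") continue; // skip methods
        lines.push('  <entry key="' + escapeXml(key) + '">' +
                   escapeXml(info[key]) + "</entry>");
    }
    lines.push("</docinfo>");
    return lines.join("\n");
}

// In Acrobat: console.println(infoToXml(this.info));
```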
Thanks.
This has been a really helpful discussion!!!
Almost got it
The following code almost works at least for XMP:
//Step 1
var name = (this.documentFileName);
name=name.replace (".pdf", "");
var r = new Report();
r.writeText (this.metadata);
r.save ("/c/Users/ipolans/Desktop/PDF Metadata/XMP-Data/" + name + "-XML.pdf");
//Step 2
app.openDoc("/c/Users/ipolans/Desktop/PDF Metadata/XMP-Data/" + name + "-XML.pdf");
console.println("the current document is "+ this.documentFileName);
var fil = (name + "-XML");
saveAs ("/c/Users/ipolans/Desktop/PDF Metadata/XMP-Data/" + fil + ".txt", "com.adobe.acrobat.plain-text");
The main problem with the code above is that I haven't been able to figure out how to get "saveAs" to use the document opened with "app.openDoc". Instead it uses the document processed in "Step 1", as the "console.println" statement confirms.
Even if this issue is fixed, according to the JavaScript documentation "app.openDoc" is not allowed in a batch file (which I assume includes the Action Wizard). That raises the question of how much of the code will need to change to work in the Action Wizard.
Ideally I'd like the JavaScript to avoid having to create a temporary PDF file. Rather I'd want to (1) query the PDF for the XMP metadata and then (2) write directly to a "text" file.
Instead of using save and then open and then re-save, just use the open command of the Report object to generate a new Document object. Then use saveAs to convert it to a text file. Something like this:
//Step 1
var name = (this.documentFileName);
name=name.replace (".pdf", "");
var r = new Report();
r.writeText(this.metadata);
var newDoc = r.open("XMP Report");
//Step 2
var fil = (name + "-XML");
newDoc.saveAs("/c/Users/ipolans/Desktop/PDF Metadata/XMP-Data/" + fil + ".txt", "com.adobe.acrobat.plain-text");
newDoc.closeDoc(true);
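One caveat about the name handling: name.replace(".pdf", "") removes only the first ".pdf" it finds and is case-sensitive, so "FILE.PDF" would pass through unchanged. A hedged sketch of a stricter version follows; outputNameFor is a hypothetical helper name.

```javascript
// Hypothetical helper: swap a trailing ".pdf" (any letter case) for a
// new suffix. Only the end of the name is touched.
function outputNameFor(fileName, newSuffix) {
    return fileName.replace(/\.pdf$/i, "") + newSuffix;
}

// In Acrobat:
// var fil = outputNameFor(this.documentFileName, "-XML.txt");
```

With ".xml" as the suffix this maps "file.pdf" to "file.xml", which is the naming asked for at the start of the thread.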
That works.
But I'm finding that "r.writeText" doesn't always produce a new line at the end.
Here's an example showing the problem:
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c015 81.157285, 2014/12/12-00:43:15 "> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:pdf="http://ns.adobe.com/pdf/1.3/" xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmlns:pdfx="http://ns.adobe.com/pdfx/1.3/" dc:format="application/pdf" pdf:Producer="Adobe PDF Library 4.0; modified using iText 2.1.7 by 1T3XT" pdf:keywords="" pdf:Keywords="" xmp:CreateDate="2004-05-12T07:31:12+05:30" xmp:ModifyDate="2016-03-22T08:29:47-04:00" xmp:CreatorTool="Acrobat Capture 3.0" pdfx:Article_Title="IEEE Standard for Shunt Power Capacitors" pdfx:DOI="10.1109/IEEESTD.1980.79668" pdfx:DOI_Link="https://dx.doi.org/10.1109/IEEESTD.1980.79668" pdfx:IEEE_Publication_No.="2459" pdfx:IEEE_Xplore_Article_No.="26642" pdfx:Page_Numbers="1 - 23" pdfx:Publication_Title="ANSI/IEEE Std 18-1980" pdfx:Style="Searchable Image (Exact)"> <dc:description> <rdf:Alt>
I've even tried r.writeText(" "); without any luck.
It should do, but maybe the line-breaks disappear when the file is converted to a text file.
What application are you using to view the text file in? If you're using something like Notepad++ check if there's a CR and an LF char at the end of each line. Maybe there's just a CR, which some applications (like the regular Notepad) do not pick up as a line-break, if I recall correctly.
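If no editor that shows line endings is at hand, the check can be approximated in script. This is a rough sketch; countLineEndings is a hypothetical helper that tallies CR+LF pairs, lone CRs and lone LFs in whatever string you hand it.

```javascript
// Hypothetical helper: count each kind of line ending in a string.
function countLineEndings(text) {
    var crlf = (text.match(/\r\n/g) || []).length;
    var cr = (text.match(/\r/g) || []).length;
    var lf = (text.match(/\n/g) || []).length;
    // Every CRLF pair contributes one CR and one LF to the raw totals.
    return { crlf: crlf, loneCr: cr - crlf, loneLf: lf - crlf };
}

// In the Acrobat console, for example:
// var c = countLineEndings(this.metadata);
// console.println(c.crlf + " CRLF, " + c.loneCr + " CR, " + c.loneLf + " LF");
```

Running it on this.metadata shows what the source string contains before writeText and the text conversion touch it.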
I opened the file in Word. I don't have anything handy on my PC that shows the hex representation.
What I found is that some of the lines have spaces at the end and others have cr/lf pairs (at least as far as Word is concerned).
I'll do some more investigating tomorrow.
It might be because you're printing out the entire metadata, which includes line-breaks already, and that might not come through when you use writeText. In that case I would recommend splitting the metadata string to individual lines and then printing each one of those lines to the report on its own.
And I highly recommend Notepad++ for both writing code and examining plain-text files.
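The suggestion to split the metadata into lines could be sketched like this. writeLines is a hypothetical helper; it takes the writing function as a parameter so it can be tried outside Acrobat, and it splits on CR+LF, lone CR, or lone LF, since it isn't clear which the metadata string uses.

```javascript
// Hypothetical helper: split text into lines and hand each one to writeFn.
function writeLines(text, writeFn) {
    var lines = String(text).split(/\r\n|\r|\n/);
    for (var i = 0; i < lines.length; i++) {
        writeFn(lines[i]);
    }
    return lines.length; // number of lines written
}

// In Acrobat:
// var r = new Report();
// writeLines(this.metadata, function (line) { r.writeText(line); });
```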