Programatic access to PDF Meta Data, including XMP Metadata, using Excel VBA.
Copy link to clipboard
Copied
Trying to find a way to access the XMP Metadata using Excel VBA with a reference to Adobe Acrobat 10.0 Type Library.
Found some comparable documentation within:
Adobe Acrobat SDK - JS for Acrobat API Reference:
Metadata topic (page 248 of 2021-02-01 ed).
for JavaScript.
Anyway to do this via excel VBA?
GetInfo and SetInfo work very well for the basic Metadata (Creator, Producer, Author, Title, etc.). Is there a way to access the XMP Metadata -- specifically the copyright?
Thanks!
Copy link to clipboard
Copied
HI,
Not sure how you would do it in Excel, but the metadata in most PDFs is stored as plain text, so if you can read the file as standard text and get the information you would like :
1 0 obj
<</Length 3546/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 7.0-c000 79.1357c9e, 2021/07/14-00:39:56 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/">
<xmp:ModifyDate>2021-11-25T11:43:29Z</xmp:ModifyDate>
<xmp:CreateDate>2021-09-07T20:52:16+01:00</xmp:CreateDate>
<xmp:MetadataDate>2021-11-25T11:43:29Z</xmp:MetadataDate>
<xmp:CreatorTool>Adobe Acrobat 21.5</xmp:CreatorTool>
<dc:format>application/pdf</dc:format>
<dc:rights>
<rdf:Alt>
<rdf:li xml:lang="x-default">Some text here</rdf:li>
</rdf:Alt>
</dc:rights>
<xmpMM:DocumentID>uuid:0f59de51-a56d-8543-960c-877e9d88a112</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:e5b4a1e7-e85d-2f45-aa24-5303d10abc5d</xmpMM:InstanceID>
<pdf:Producer>Adobe Acrobat 21.5 Image Conversion Plug-in</pdf:Producer>
<pdf:Keywords/>
<xmpRights:Marked>True</xmpRights:Marked>
<xmpRights:WebStatement>someinfohere</xmpRights:WebStatement>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
The above is taken straight from a PDF file, the tags you are looking for are
<dc:right>
<xmpRights:*>
Note: some PDF's might have more than one xmp metadata entries.
Copy link to clipboard
Copied
PNW, did you ever find a way to do this? I know its been done, seen extractions in other apps, but no one seems to know how its done. It has to be a dictionary or object somewhere.
Copy link to clipboard
Copied
I tried a few things, and ended up with a mash up that got us through. It's not pretty, so YMMV.
- We got an older version of Adobe Acrobat Pro and used a reference to the "Adobe Acrobat 10.0 Type Library" for part of it.
- Based on the very helpful post by @BarlaeDC , I used the LineInput statement to scan the PDF.
- This does a nice job of reading them in.
- I then tried using MSXML2 as it is a bit more refined uses DOM -- Critical code components below.
Dim aApp As Acrobat.AcroApp
Dim AV_Doc As Acrobat.AcroAVDoc
Dim PD_Doc As Acrobat.AcroPDDoc
Dim aFlags As Acrobat.PDDocFlags
Dim a As Acrobat.AcroPDAnnot
Dim oXML As MSXML2.DOMDocument60
Dim XML_NList As MSXML2.IXMLDOMNodeList
'-- Core Code
On Error Resume Next
For Each y In XML_NList
Set XML_N(0) = y
sParents = ""
Do While Not XML_N(0).ParentNode Is Nothing
sParents = XML_N(0).ParentNode.nodeName & ";" & sParents
Set XML_N(0) = XML_N(0).ParentNode
Loop
With y
oShtOut.Cells(l, 1) = sParents
oShtOut.Cells(l, 2) = .Name
oShtOut.Cells(l, 3) = .nodeName
oShtOut.Cells(l, 4) = .NodeType
oShtOut.Cells(l, 5) = .nodeTypeString
oShtOut.Cells(l, 6) = .NodeValue
oShtOut.Cells(l, 7) = .tagName
oShtOut.Cells(l, 8) = .Text
oShtOut.Cells(l, 9) = .Value
oShtOut.Cells(l, 10) = .XML
oShtOut.Cells(l, 11) = .ChildNodes.Length
End With
oShtOut.Rows(l).EntireRow.WordWrap = False
l = l + 1
Next
On Error GoTo 0
- For bulk update of the PDF files, I ended up using a VBA command line call to exiftool with the following structure:
exiftool -xmp-dc:Rights="DC Rights" -xmp-xmprights:Marked="True" -xmp-xmprights:Owner="xmprights:Owner" -xmp-xmprights:UsageTerms="xmprights:UsageTerms" -xmp-xmprights:WebStatement="xmprights:WebStatement" 1.pdf
Copy link to clipboard
Copied
VBA can use the VB:JavaScript connector, and directly call JavaScript functions.
Copy link to clipboard
Copied
I had trouble finding good documentation on the VBA:JS implimentation.
Here is some of what I found:
- https://opensource.adobe.com/dc-acrobat-sdk-docs/acrobatsdk/
- https://www.mrexcel.com/board/threads/javascript-vba-bridge-js-object-functions-listing.887860/
- http://khkonsulting.com/2010/09/reading-pdf-form-fields-with-vba/
In general, good VBA: JS Object reference documentation is thin at best. Am I missing it somewhere?
Any suggestions on the best VBA: JS Object documentation? -- Thanks!!
Copy link to clipboard
Copied
The key thing is that the interface is to an object called JSObject. Using OLE — Acrobat-PDFL SDK: Interapplication Communication (adobe.com)

