Copy link to clipboard
Copied
Trying to find a way to access the XMP Metadata using Excel VBA with a reference to Adobe Acrobat 10.0 Type Library.
Found some comparable documentation within:
Adobe Acrobat SDK - JS for Acrobat API Reference:
Metadata topic (page 248 of 2021-02-01 ed).
for JavaScript.
Anyway to do this via excel VBA?
GetInfo and SetInfo work very well for the basic Metadata (Creator, Producer, Author, Title, etc.). Is there a way to access the XMP Metadata -- specifically the copyright?
Thanks!
Copy link to clipboard
Copied
HI,
Not sure how you would do it in Excel, but the metadata in most PDFs is stored as plain text, so if you can read the file as standard text and get the information you would like :
1 0 obj
<</Length 3546/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 7.0-c000 79.1357c9e, 2021/07/14-00:39:56 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/">
<xmp:ModifyDate>2021-11-25T11:43:29Z</xmp:ModifyDate>
<xmp:CreateDate>2021-09-07T20:52:16+01:00</xmp:CreateDate>
<xmp:MetadataDate>2021-11-25T11:43:29Z</xmp:MetadataDate>
<xmp:CreatorTool>Adobe Acrobat 21.5</xmp:CreatorTool>
<dc:format>application/pdf</dc:format>
<dc:rights>
<rdf:Alt>
<rdf:li xml:lang="x-default">Some text here</rdf:li>
</rdf:Alt>
</dc:rights>
<xmpMM:DocumentID>uuid:0f59de51-a56d-8543-960c-877e9d88a112</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:e5b4a1e7-e85d-2f45-aa24-5303d10abc5d</xmpMM:InstanceID>
<pdf:Producer>Adobe Acrobat 21.5 Image Conversion Plug-in</pdf:Producer>
<pdf:Keywords/>
<xmpRights:Marked>True</xmpRights:Marked>
<xmpRights:WebStatement>someinfohere</xmpRights:WebStatement>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
The above is taken straight from a PDF file, the tags you are looking for are
<dc:right>
<xmpRights:*>
Note: some PDF's might have more than one xmp metadata entries.
Copy link to clipboard
Copied
PNW, did you ever find a way to do this? I know its been done, seen extractions in other apps, but no one seems to know how its done. It has to be a dictionary or object somewhere.
Copy link to clipboard
Copied
I tried a few things, and ended up with a mash up that got us through. It's not pretty, so YMMV.
Dim aApp As Acrobat.AcroApp
Dim AV_Doc As Acrobat.AcroAVDoc
Dim PD_Doc As Acrobat.AcroPDDoc
Dim aFlags As Acrobat.PDDocFlags
Dim a As Acrobat.AcroPDAnnot
Dim oXML As MSXML2.DOMDocument60
Dim XML_NList As MSXML2.IXMLDOMNodeList
'-- Core Code
On Error Resume Next
For Each y In XML_NList
Set XML_N(0) = y
sParents = ""
Do While Not XML_N(0).ParentNode Is Nothing
sParents = XML_N(0).ParentNode.nodeName & ";" & sParents
Set XML_N(0) = XML_N(0).ParentNode
Loop
With y
oShtOut.Cells(l, 1) = sParents
oShtOut.Cells(l, 2) = .Name
oShtOut.Cells(l, 3) = .nodeName
oShtOut.Cells(l, 4) = .NodeType
oShtOut.Cells(l, 5) = .nodeTypeString
oShtOut.Cells(l, 6) = .NodeValue
oShtOut.Cells(l, 7) = .tagName
oShtOut.Cells(l, 8) = .Text
oShtOut.Cells(l, 9) = .Value
oShtOut.Cells(l, 10) = .XML
oShtOut.Cells(l, 11) = .ChildNodes.Length
End With
oShtOut.Rows(l).EntireRow.WordWrap = False
l = l + 1
Next
On Error GoTo 0
exiftool -xmp-dc:Rights="DC Rights" -xmp-xmprights:Marked="True" -xmp-xmprights:Owner="xmprights:Owner" -xmp-xmprights:UsageTerms="xmprights:UsageTerms" -xmp-xmprights:WebStatement="xmprights:WebStatement" 1.pdf
Copy link to clipboard
Copied
VBA can use the VB:JavaScript connector, and directly call JavaScript functions.
Copy link to clipboard
Copied
I had trouble finding good documentation on the VBA:JS implimentation.
Here is some of what I found:
In general, good VBA: JS Object reference documentation is thin at best. Am I missing it somewhere?
Any suggestions on the best VBA: JS Object documentation? -- Thanks!!
Copy link to clipboard
Copied
The key thing is that the interface is to an object called JSObject. Using OLE — Acrobat-PDFL SDK: Interapplication Communication (adobe.com)
Find more inspiration, events, and resources on the new Adobe Community
Explore Now