Exit
  • Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
  • 한국 커뮤니티
0

Programatic access to PDF Meta Data, including XMP Metadata, using Excel VBA.

New Here ,
Nov 19, 2021 Nov 19, 2021

Trying to find a way to access the XMP Metadata using Excel VBA with a reference to Adobe Acrobat 10.0 Type Library.

 

Found some comparable documentation within:

Adobe Acrobat SDK - JS for Acrobat API Reference:

Metadata topic (page 248 of 2021-02-01 ed).

for JavaScript.

 

Anyway to do this via excel VBA?

GetInfo and SetInfo work very well for the basic Metadata (Creator, Producer, Author, Title, etc.). Is there a way to access the XMP Metadata -- specifically the copyright?

 

Thanks!

 

2.7K
Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Community Expert ,
Nov 25, 2021 Nov 25, 2021

HI,

 

Not sure how you would do it in Excel, but the metadata in most PDFs is stored as plain text, so if you can read the file as standard text and get the information you would like :

 

1 0 obj
<</Length 3546/Subtype/XML/Type/Metadata>>stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 7.0-c000 79.1357c9e, 2021/07/14-00:39:56        ">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
            xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
            xmlns:xmpRights="http://ns.adobe.com/xap/1.0/rights/">
         <xmp:ModifyDate>2021-11-25T11:43:29Z</xmp:ModifyDate>
         <xmp:CreateDate>2021-09-07T20:52:16+01:00</xmp:CreateDate>
         <xmp:MetadataDate>2021-11-25T11:43:29Z</xmp:MetadataDate>
         <xmp:CreatorTool>Adobe Acrobat 21.5</xmp:CreatorTool>
         <dc:format>application/pdf</dc:format>
         <dc:rights>
            <rdf:Alt>
               <rdf:li xml:lang="x-default">Some text here</rdf:li>
            </rdf:Alt>
         </dc:rights>
         <xmpMM:DocumentID>uuid:0f59de51-a56d-8543-960c-877e9d88a112</xmpMM:DocumentID>
         <xmpMM:InstanceID>uuid:e5b4a1e7-e85d-2f45-aa24-5303d10abc5d</xmpMM:InstanceID>
         <pdf:Producer>Adobe Acrobat 21.5 Image Conversion Plug-in</pdf:Producer>
         <pdf:Keywords/>
         <xmpRights:Marked>True</xmpRights:Marked>
         <xmpRights:WebStatement>someinfohere</xmpRights:WebStatement>
      </rdf:Description>
   </rdf:RDF>
</x:xmpmeta>

The above is taken straight from a PDF file, the tags you are looking for are

<dc:right>

<xmpRights:*>

 

Note: some PDF's might have more than one xmp metadata entries.

 

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Sep 21, 2022 Sep 21, 2022

PNW, did you ever find a way to do this? I know its been done, seen extractions in other apps, but no one seems to know how its done. It has to be a dictionary or object somewhere.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Sep 21, 2022 Sep 21, 2022

I tried a few things, and ended up with a mash up that got us through. It's not pretty, so YMMV.

 

  • We got an older version of Adobe Acrobat Pro and used a reference to the "Adobe Acrobat 10.0 Type Library" for part of it.
  • Based on the very helpful post by @BarlaeDC , I used the LineInput statement to scan the PDF.
    • This does a nice job of reading them in.
    • I then tried using MSXML2 as it is a bit more refined uses DOM -- Critical code components below.
Dim aApp                As Acrobat.AcroApp
Dim AV_Doc              As Acrobat.AcroAVDoc
Dim PD_Doc              As Acrobat.AcroPDDoc
Dim aFlags              As Acrobat.PDDocFlags
Dim a                   As Acrobat.AcroPDAnnot

Dim oXML                As MSXML2.DOMDocument60
Dim XML_NList           As MSXML2.IXMLDOMNodeList

'-- Core Code
    On Error Resume Next
    For Each y In XML_NList
        Set XML_N(0) = y
        sParents = ""
        Do While Not XML_N(0).ParentNode Is Nothing
            sParents = XML_N(0).ParentNode.nodeName & ";" & sParents
            Set XML_N(0) = XML_N(0).ParentNode
        Loop
        With y
            oShtOut.Cells(l, 1) = sParents
            oShtOut.Cells(l, 2) = .Name
            oShtOut.Cells(l, 3) = .nodeName
            oShtOut.Cells(l, 4) = .NodeType
            oShtOut.Cells(l, 5) = .nodeTypeString
            oShtOut.Cells(l, 6) = .NodeValue
            oShtOut.Cells(l, 7) = .tagName
            oShtOut.Cells(l, 8) = .Text
            oShtOut.Cells(l, 9) = .Value
            oShtOut.Cells(l, 10) = .XML
            oShtOut.Cells(l, 11) = .ChildNodes.Length
        End With
        oShtOut.Rows(l).EntireRow.WordWrap = False
        l = l + 1
    Next
    On Error GoTo 0

 

  • For bulk update of the PDF files, I ended up using a VBA command line call to exiftool with the following structure:
exiftool -xmp-dc:Rights="DC Rights" -xmp-xmprights:Marked="True" -xmp-xmprights:Owner="xmprights:Owner" -xmp-xmprights:UsageTerms="xmprights:UsageTerms" -xmp-xmprights:WebStatement="xmprights:WebStatement" 1.pdf

 

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 21, 2022 Sep 21, 2022

VBA can use the VB:JavaScript connector, and directly call JavaScript functions.

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here ,
Sep 21, 2022 Sep 21, 2022

@Test Screen Name 

I had trouble finding good documentation on the VBA:JS implimentation.

 

Here is some of what I found:

 

In general, good VBA: JS Object reference documentation is thin at best. Am I missing it somewhere?

 

Any suggestions on the best VBA: JS Object documentation?  -- Thanks!!

 

 

 

 

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND ,
Sep 22, 2022 Sep 22, 2022
LATEST

The key thing is that the interface is to an object called JSObject. Using OLE — Acrobat-PDFL SDK: Interapplication Communication (adobe.com)

Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines