I need to parse the metadata of a given PDF file to count the different types of objects it contains and to extract those objects, for example objects of type "/JavaScript" or "/ObjStm".
I am trying to do this with the PDF Library SDK for C++.
Any leads would be really helpful.
Thanks in advance!
These types of objects are not part of a file's metadata; they are the actual data.
Yeah, true; they should be called the structural data of the PDF.
What I am trying to do is extract all the structural objects and categorize them by type, basically maintaining a counter of objects in each category.
But I couldn't find the right set of APIs, and I'm not even sure the SDK provides this kind of functionality.
The Cos API gives access to all objects, but not to object streams (/ObjStm).
Yeah, that's one of the cases. I also need to maintain counters for all kinds of objects, even "/JS" and every "/AA", so I need some sort of parser or enumerator.
The Cos layer is what you get; it can enumerate all actual objects. If that isn't enough for you, Adobe has nothing else, but there are many other PDF libraries out there.
Thanks!
I wanted to check whether there is anything other than the Cos layer that could help (I may not be aware of it).
Or whether the Acrobat SDK has any additional functionality for this compared to the PDFL SDK.
I tried some open-source libraries; they are good, but they throw internal logic or number errors on a few malicious PDFs. So I thought this would be the most reliable one to go with.