Need to parse a PDF to get all objects from its metadata

Community Beginner, Mar 01, 2021

I need to parse the metadata of a given PDF file to count the different types of objects it contains and to extract those objects, for example objects of type "/JavaScript" or "/ObjStm".

TOPICS
Windows

Community Beginner, Mar 01, 2021

I am trying to do this with the PDF Library SDK for C++.

Any leads would be really helpful.

Thanks in advance!

Community Expert, Mar 01, 2021

These types of objects are not part of a file's metadata; they are part of the actual data...

Community Beginner, Mar 01, 2021

Yes, true, they should be called the structural data of the PDF.
What I am trying to do is extract all the structural objects and categorize them by type, basically maintaining a counter of objects in each category.

But I couldn't find the right set of APIs, and I'm not even sure whether the SDK offers this kind of functionality.
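
To make the goal concrete, here is a minimal sketch of the per-category counter in plain C++. It does not use the PDF Library SDK at all: it scans the raw file bytes for `N G obj ... endobj` spans and reads the first `/Type` entry in each. The function name `countObjectsByType` and the `(untyped)` label are my own placeholders. It is only an approximation: it cannot see objects stored inside compressed object streams, it can be fooled by stream data that happens to look like an object header or `endobj`, and a nested dictionary's `/Type` may be picked up instead of the object's own.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Tallies indirect objects by the first /Type name found in each object body.
// Works on the raw, unparsed bytes of a PDF file (a heuristic, not a parser).
std::map<std::string, int> countObjectsByType(const std::string& bytes) {
    // "12 0 obj" style header: object number, generation number, keyword.
    static const std::regex header(R"((\d+)\s+(\d+)\s+obj\b)");
    // "/Type /Catalog" or "/Type/Catalog"; captures the name without the slash.
    static const std::regex typeEntry(R"(/Type\s*/([A-Za-z0-9]+))");

    std::map<std::string, int> counts;
    for (std::sregex_iterator it(bytes.begin(), bytes.end(), header), end;
         it != end; ++it) {
        // The body runs from the end of the header to the next "endobj".
        const std::size_t bodyStart =
            static_cast<std::size_t>(it->position() + it->length());
        std::size_t bodyEnd = bytes.find("endobj", bodyStart);
        if (bodyEnd == std::string::npos) bodyEnd = bytes.size();
        const std::string body = bytes.substr(bodyStart, bodyEnd - bodyStart);

        std::smatch m;
        if (std::regex_search(body, m, typeEntry)) {
            ++counts["/" + m[1].str()];
        } else {
            ++counts["(untyped)"];  // object has no /Type entry in its body
        }
    }
    return counts;
}
```

Calling `countObjectsByType(fileBytes)` on the contents of a file read in binary mode returns a map such as `/Catalog` to 1, `/Page` to the number of page objects, and so on.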

LEGEND, Mar 01, 2021

The Cos API gives access to all objects, but not to ObjStm (object streams).

Community Beginner, Mar 01, 2021

Yeah, that's one of the cases. I also need to maintain a counter of all kinds of objects, even "/JS" and all "/AA" entries, so I need some sort of parser or enumerator.
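
For illustration, the kind of enumerator I have in mind could look like the sketch below. It is plain C++ with no SDK calls: it walks the raw bytes of the file, collects every name token (anything starting with `/`), decodes `#xx` hex escapes so that an obfuscated `/J#53` is counted as `/JS`, and keeps a counter per name. The function and helper names are hypothetical. As a raw scan it also counts names that appear inside strings or uncompressed streams, and it misses anything inside compressed streams, so the numbers are indicative only.

```cpp
#include <cstddef>
#include <map>
#include <string>

// True for bytes that terminate a PDF name token: whitespace or a delimiter.
static bool endsName(unsigned char c) {
    switch (c) {
        case '\0': case '\t': case '\n': case '\f': case '\r': case ' ':
        case '(': case ')': case '<': case '>': case '[': case ']':
        case '{': case '}': case '/': case '%':
            return true;
        default:
            return false;
    }
}

// Value of a hex digit, or -1 if the byte is not a hex digit.
static int hexValue(unsigned char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Counts every name token (for example "/JS" or "/AA") in raw PDF bytes.
// "#xx" escapes inside a name are decoded before counting.
std::map<std::string, int> countNameTokens(const std::string& bytes) {
    std::map<std::string, int> counts;
    std::size_t i = 0;
    while (i < bytes.size()) {
        if (bytes[i] != '/') { ++i; continue; }
        std::string name = "/";
        ++i;  // skip the leading slash
        while (i < bytes.size() &&
               !endsName(static_cast<unsigned char>(bytes[i]))) {
            int hi = -1, lo = -1;
            if (bytes[i] == '#' && i + 2 < bytes.size()) {
                hi = hexValue(static_cast<unsigned char>(bytes[i + 1]));
                lo = hexValue(static_cast<unsigned char>(bytes[i + 2]));
            }
            if (hi >= 0 && lo >= 0) {
                name += static_cast<char>(hi * 16 + lo);  // decoded escape
                i += 3;
            } else {
                name += bytes[i];
                ++i;
            }
        }
        if (name.size() > 1) ++counts[name];  // ignore a bare "/"
    }
    return counts;
}
```

With the file contents loaded into a `std::string`, `countNameTokens(fileBytes)["/JS"]` gives the number of `/JS` names seen, and the same lookup works for `/AA`, `/JavaScript`, or `/ObjStm`.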

LEGEND, Mar 02, 2021

The Cos layer is what you get. It can enumerate all actual objects. If this isn't enough for you, Adobe doesn't have anything else, but there are many PDF libraries out there.

Community Beginner, Mar 02, 2021

Thanks!
I wanted to check whether there is anything other than the Cos layer that can help (something I may not be aware of),
or whether the Acrobat SDK has any added functionality for this compared to the PDFL SDK.

I tried open-source libraries; they are good, but they fail with internal logic or number errors on a few malicious PDFs. So I thought this would be the most reliable option to go with.
