I am trying to extract data from the layers within a PDF drawing into a usable format such as JSON or XML. I am working with CAD drawings exported as PDF. I can inspect the individual layers in PDF XChange Editor; each one corresponds to a reference in the CAD model (Building Outline, Fence, Hedge, etc.). I want to extract all of the metadata for each layer (Fill Color, Opacity, Stroke Color, Stroke Opacity, Border Width, etc.) and output it as JSON or XML. Could you let me know whether this is possible with the Adobe PDF Extract API or any other API or service? Thanks.
Have you tried testing it here: https://documentservices.adobe.com/dc-visualizer-app/index.html ?
Thanks for your response. However, this tool does not provide the required granularity.
The current version of Extract doesn't provide any information about layers at all, and we only provide the properties of vectors when they are used to represent table cells.
You'll need a PDF library to do this, and even then you'll need to know a lot about the PDF drawing instructions to get the information you need. It's a non-trivial task.
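To give a rough idea of what "PDF drawing instructions" means here: inside a page's content stream, layer (optional content) membership is typically marked with `/OC /Name BDC ... EMC`, and styling comes from operators such as `w` (line width), `RG` / `rg` (stroke / fill RGB colour) and `gs` (an ExtGState entry, which is where the `CA` / `ca` opacities live). Below is a minimal sketch in plain Python that walks a hand-written, uncompressed sample stream and records the style in effect at each paint operator, grouped by layer. The sample stream, the `PROPERTIES` and `EXT_GSTATE` tables and the function name are all made up for illustration. Real streams are usually compressed and also contain strings, arrays, inline images, other colour operators and form XObjects, so a whitespace split like this is not a real parser.

```python
import json

# Hand-written, simplified content stream (an assumption, not from a real file).
SAMPLE_STREAM = """
/OC /MC0 BDC
q
0.5 w
1 0 0 RG
0 0 100 50 re
S
Q
EMC
/OC /MC1 BDC
q
/GS1 gs
0 0.6 0 rg
10 10 m
40 10 l
40 30 l
h
f
Q
EMC
"""

# Hypothetical stand-ins for the page's /Resources /Properties and /ExtGState.
PROPERTIES = {"MC0": "Fence", "MC1": "Hedge"}
EXT_GSTATE = {"GS1": {"CA": 1.0, "ca": 0.4}}  # CA = stroke opacity, ca = fill opacity

PAINT_OPS = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*"}


def styles_by_layer(stream, properties, ext_gstate):
    """Return {layer name: [style dict per paint operator]} for a simple stream."""
    state = {
        "stroke_color": [0.0, 0.0, 0.0],
        "fill_color": [0.0, 0.0, 0.0],
        "width": 1.0,
        "stroke_opacity": 1.0,
        "fill_opacity": 1.0,
    }
    saved = []     # graphics state stack for q / Q
    marked = []    # marked-content stack; holds a layer name or None
    operands = []  # operands collected since the last operator
    result = {}
    for tok in stream.split():
        if tok.startswith("/"):          # name operand
            operands.append(tok[1:])
            continue
        try:                             # numeric operand
            operands.append(float(tok))
            continue
        except ValueError:
            pass
        op = tok                         # anything else is treated as an operator
        if op == "BDC":
            tag, name = operands[-2], operands[-1]
            marked.append(properties.get(name) if tag == "OC" else None)
        elif op == "BMC":
            marked.append(None)
        elif op == "EMC":
            marked.pop()
        elif op == "q":
            saved.append(dict(state))
        elif op == "Q":
            state = saved.pop()
        elif op == "w":
            state["width"] = operands[-1]
        elif op == "RG":
            state["stroke_color"] = operands[-3:]
        elif op == "rg":
            state["fill_color"] = operands[-3:]
        elif op == "gs":
            gs = ext_gstate.get(operands[-1], {})
            state["stroke_opacity"] = gs.get("CA", state["stroke_opacity"])
            state["fill_opacity"] = gs.get("ca", state["fill_opacity"])
        elif op in PAINT_OPS:
            layer = next((m for m in reversed(marked) if m), None)
            result.setdefault(layer or "(no layer)", []).append({"paint": op, **state})
        operands = []                    # operands are consumed by each operator
    return result


if __name__ == "__main__":
    print(json.dumps(styles_by_layer(SAMPLE_STREAM, PROPERTIES, EXT_GSTATE), indent=2))
```

Even this toy version has to track the graphics state stack and the marked-content stack separately, which is a large part of why doing it properly against real CAD exports is non-trivial.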
Thanks for your response.
For anyone grappling with the same issue, there is a very informative post on Stack Overflow linked below. I would also recommend looking into the Python library PyMuPDF.
python - Extract Geometry Elements from PDF by OCG (by Layer) - Stack Overflow
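In case it helps as a starting point, here is a minimal sketch of the PyMuPDF route. It assumes a reasonably recent PyMuPDF release, where each dictionary returned by `page.get_drawings()` carries a `layer` key (the optional content group name, or `None`) alongside `color` (stroke), `fill`, `width`, `stroke_opacity` and `fill_opacity`. The helper names `group_by_layer` and `layers_to_json` and the file name `drawing.pdf` are my own placeholders, not part of the library, and I have not tested this against every kind of CAD export.

```python
import json
from collections import defaultdict

# Style keys as documented for the dictionaries from page.get_drawings().
STYLE_KEYS = ("type", "color", "fill", "width", "stroke_opacity", "fill_opacity")


def group_by_layer(drawings):
    """Group drawing dictionaries by their 'layer' (OCG name) entry."""
    layers = defaultdict(list)
    for d in drawings:
        name = d.get("layer") or "(no layer)"
        entry = {key: d.get(key) for key in STYLE_KEYS}
        rect = d.get("rect")
        # Convert the bounding box to a plain list so it is JSON-serialisable.
        entry["rect"] = [round(v, 2) for v in rect] if rect is not None else None
        layers[name].append(entry)
    return dict(layers)


def layers_to_json(pdf_path):
    """Return a JSON string of per-page, per-layer vector styles."""
    # Imported here so group_by_layer can be tried without PyMuPDF installed;
    # newer releases also allow `import pymupdf`.
    import fitz

    result = {}
    with fitz.open(pdf_path) as doc:
        for page in doc:
            key = f"page_{page.number + 1}"
            result[key] = group_by_layer(page.get_drawings())
    return json.dumps(result, indent=2)


if __name__ == "__main__":
    # "drawing.pdf" is a placeholder path.
    print(layers_to_json("drawing.pdf"))
```

Colours come back as RGB tuples of floats in the 0 to 1 range (or `None` when a path has no stroke or fill), so they may need converting if you want hex values in the output.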