Participant
May 11, 2021
Question

How does Adobe display the content tree of a tagged, flattened PDF?

I have a tagged PDF form in which I filled in the form fields and then flattened the PDF. After flattening, the content tree shows annotations, containers, and form XObjects at the end. I need to know how Adobe is able to read this information from a PDF that has already been flattened. I am asking about the internal structure details (which dictionary it looks in to get this information).

For my project, in order to fix an accessibility issue in the browser, I am trying to fix the content tree through code: I want to move each form XObject next to its form field instead of leaving them all at the end.

I would appreciate any help. Thank you!

Karl Heinz Kremer
Community Expert
May 11, 2021

All of this information is in the PDF specification. However, when you flatten a document, there should no longer be any annotation data in the document. It may still be in the file if you did not do a "Save As" to remove all incremental updates, but even if it is still listed, it will not be used to render the file. Do a "Save As" and see if you can still find the annotations; they should be gone.
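
One rough way to check this from code is to scan the raw bytes of the saved file. The Python sketch below is only a heuristic and says nothing about how Acrobat itself works: it counts `%%EOF` markers (each incremental update appends another one, although a linearized file can also contain more than one) and uncompressed mentions of `/Subtype /Widget`, the subtype used by form field widget annotations. The function name `scan_pdf_bytes` and the sample bytes are made up for illustration, and anything stored inside compressed object streams will not be seen.

```python
import re


def scan_pdf_bytes(data: bytes) -> dict:
    """Count end-of-file markers and uncompressed widget annotation mentions.

    Heuristic only: objects inside compressed object streams are invisible
    to a raw byte scan, and stale objects left behind by incremental
    updates are counted even though a viewer no longer uses them.
    """
    return {
        # More than one %%EOF hints at incremental updates (or linearization).
        "eof_markers": len(re.findall(rb"%%EOF", data)),
        # Form field widgets are annotations with /Subtype /Widget.
        "widget_mentions": len(re.findall(rb"/Subtype\s*/Widget\b", data)),
    }


# Hypothetical, hand-written byte string standing in for a real file.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Annot /Subtype /Widget >>\nendobj\n"
    b"%%EOF\n"
    b"2 0 obj\n<< >>\nendobj\n"
    b"%%EOF\n"
)
print(scan_pdf_bytes(sample))
```

For a real file, the bytes would come from something like `open(path, "rb").read()`. If the counts drop after a "Save As", the leftover annotation data was most likely sitting in an earlier revision of the file.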

simbu14 (Author)
Participant
May 11, 2021

Thank you for the response. I appreciate your quick turnaround.

I have attached screenshots of the original file, the expected file, and the actual file.

After I flatten the original file, I get the output shown in "flatten_actual_output.png", but I need the output shown in "flatten_expected_output.png". To show what I am expecting, I dragged each form XObject and dropped it into its respective spot. How can I achieve this automatically?

Please let me know if you need more explanation of what I expect. Sorry if I am not describing the problem correctly! Thank you.

Karl Heinz Kremer
Community Expert
May 11, 2021

You are trying to do something that Acrobat will not do. When you flatten a document, all bets are off regarding how the information is represented within the PDF document. As long as the visual representation is the same and all other features work as advertised (e.g. the tagged structure of a document), Acrobat can do whatever it wants with regard to how the file is saved. You could potentially draw all capital "A" glyphs in the document first, followed by all "B" glyphs, and so on, continuing with the lowercase letters, the numbers, and other characters. This would still be a valid PDF file. Extracting information from it would be much harder, but it would still be valid. This is a case of "it is what it is", and there is nothing you can do to change Acrobat's flattening behavior.

To give you some more information about what is going on: when you compare the contents of the "containers" you see in your "before" and "actual" documents, you will very likely see that they are identical. There is actually no form field in these containers; the form fields are stored in the "Annotations" hierarchy. This means that, based on the content, there is no relationship between the static text and the form field. When you flatten this document, Acrobat adds the flattened form fields after all the other elements on that page, and that is what you see in your "actual" screenshot.
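
Because the content itself carries no link between a label and its field, any automatic reordering has to rely on a heuristic such as page geometry. The Python sketch below works on a simplified, made-up model of the content tree: the `Item` class, its fields, and `reorder_by_position` are all hypothetical and not part of any PDF library. It moves each flattened field directly behind the nearest label to its left on the same row, and leaves unmatched fields at the end.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str   # hypothetical label for a content-tree entry
    kind: str   # "container" (static text) or "xobject" (flattened field)
    x: float    # left edge in page units
    y: float    # vertical position; PDF user space has y increasing upward


def reorder_by_position(items, row_tolerance=5.0):
    """Place each flattened field after the nearest label on its row.

    Assumption: a field belongs to the closest static-text container that
    starts to its left and sits within row_tolerance of the same y value.
    """
    containers = [it for it in items if it.kind == "container"]
    fields = [it for it in items if it.kind == "xobject"]
    followers = {id(c): [] for c in containers}
    unmatched = []

    for field in fields:
        same_row = [
            c for c in containers
            if abs(c.y - field.y) <= row_tolerance and c.x <= field.x
        ]
        if same_row:
            nearest = max(same_row, key=lambda c: c.x)
            followers[id(nearest)].append(field)
        else:
            unmatched.append(field)  # no plausible label: keep at the end

    result = []
    for c in containers:
        result.append(c)
        result.extend(sorted(followers[id(c)], key=lambda f: f.x))
    result.extend(unmatched)
    return result


# Order as described above: static text first, flattened fields last.
flattened = [
    Item("Name:", "container", 50, 700),
    Item("Date:", "container", 50, 650),
    Item("name_field", "xobject", 120, 701),
    Item("date_field", "xobject", 120, 649),
]
print([it.name for it in reorder_by_position(flattened)])
# ['Name:', 'name_field', 'Date:', 'date_field']
```

Applying such an order to a real file would still require a PDF library to read the bounding boxes and rewrite the tag tree, and the geometric pairing can guess wrong on multi-column or densely packed forms.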