Participant
January 7, 2020
Question

List of Tags created by Acrobat for XML converted file


I want to extract data from PDF files based on its role in the document, e.g. heading levels, tables, titles, etc. I'm converting the PDF files into XML to get the tags and their associated properties. The tag names in the converted XML files are different from the general ones (e.g. HL1, HL2, ... or <p>, etc.). I want the complete list of tags that Acrobat produces and their meanings/properties, because that will help me map these tags to the standard ones.
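While waiting for an official list, one practical step is to inventory the element names that actually occur in the exported files. The sketch below uses Python's standard `xml.etree.ElementTree` to count every element name in one XML document. The sample XML and its root name `root` are hypothetical placeholders, not real Acrobat output; a real export may use different names and namespaces.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip a '{namespace-uri}' prefix, if any, from an element name."""
    return tag.split("}")[-1]


def tag_inventory(xml_text: str) -> Counter:
    """Count every element name that appears in one XML document."""
    root = ET.fromstring(xml_text)
    # root.iter() yields the root element and all of its descendants.
    return Counter(local_name(elem.tag) for elem in root.iter())


# Hypothetical sample; element names are illustrative only.
sample = """<root>
  <H1>Title</H1>
  <P>First paragraph.</P>
  <P>Second paragraph.</P>
  <Table><TR><TD>cell</TD></TR></Table>
</root>"""

counts = tag_inventory(sample)
for name, n in sorted(counts.items()):
    print(f"{name}: {n}")
```

Running the same function over all exported files and merging the counters gives the set of tag names you actually need to map.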

I look forward to your help and support.

Thanks & Regards

Naveen



Bernd Alheit
Community Expert
January 7, 2020

What do you mean by "standard" tags?

Bevi Chagnon - PubCom.com
Legend
January 7, 2020

XML doesn't have a set of standard tags. That's why it's called "eXtensible Markup Language." You create the tag names and the rules for how they're used, and those rules are defined in a DTD, a schema, or a standard/specification.

Are you thinking of the set of tags used to create accessible PDFs? If so, that set is defined in the PDF and PDF/UA standards (ISO 32000 chapter 14, and ISO 14289). You can view a quick list of those tags at https://helpx.adobe.com/acrobat/using/editing-document-structure-content-tags.html#standard_pdf_tags. They are similar to HTML tags, but differ in very significant ways. FYI, these tags in a PDF are not XML, but nothing prevents you from creating an XML tag set that matches them.
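As a rough illustration of such a mapping, the sketch below pairs a subset of the standard PDF structure tags from ISO 32000 chapter 14 with approximate HTML element names, and extracts a heading level from `H1` to `H6`. The table `PDF_TO_HTML` and the helpers `to_html_name` and `heading_level` are hypothetical and hand-picked for this example; they are not a normative mapping, and several standard tags (for example `Lbl` and `LBody`) have no direct HTML equivalent, so they are deliberately left out.

```python
from typing import Optional

# Hypothetical, approximate correspondences between some standard PDF
# structure tags and HTML elements. Not an official or complete mapping.
PDF_TO_HTML = {
    "Document": "body",
    "Sect": "section",
    "Art": "article",
    "Div": "div",
    "BlockQuote": "blockquote",
    "P": "p",
    "H1": "h1", "H2": "h2", "H3": "h3",
    "H4": "h4", "H5": "h5", "H6": "h6",
    "L": "ul",
    "LI": "li",
    "Table": "table",
    "THead": "thead",
    "TBody": "tbody",
    "TFoot": "tfoot",
    "TR": "tr",
    "TH": "th",
    "TD": "td",
    "Figure": "figure",
    "Link": "a",
    "Span": "span",
    "Quote": "q",
    "Code": "code",
}


def to_html_name(pdf_tag: str) -> Optional[str]:
    """Return an approximate HTML element name, or None if unmapped."""
    return PDF_TO_HTML.get(pdf_tag)


def heading_level(pdf_tag: str) -> Optional[int]:
    """Return 1-6 for H1..H6; None for anything else, including plain H."""
    if len(pdf_tag) == 2 and pdf_tag[0] == "H" and pdf_tag[1] in "123456":
        return int(pdf_tag[1])
    return None
```

Any tag name that `to_html_name` returns `None` for is a candidate for a custom entry in your own XML tag set.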

| Bevi Chagnon | Designer, Trainer, & Technologist for Accessible Documents | PubCom | Classes & Books for Accessible InDesign, PDFs & MS Office |