Participant
March 13, 2021
Question

Can I programmatically add tags to a normal PDF and convert it to an accessible PDF?


Hi,


I am working on PDF remediation. I have normal (untagged) PDFs. I am thinking of writing a script that reads a normal PDF, identifies the various kinds of content (headers, sub headers, lists, forms, tables, images), adds tags to the PDF content accordingly, and generates a tagged PDF that will pass the Adobe accessibility check. My goal is to reduce manual tagging effort (in Adobe Acrobat DC Pro) by at least 60 to 70%.
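As a rough sketch of the "identify headers, sub headers, lists" step, assume the text has already been extracted line by line, with font sizes, by some PDF text-extraction library. The `TextLine` type and `guess_tags` function below are hypothetical, not part of any SDK. The heuristic treats the most common font size as body text, maps each larger size to a heading level, and flags lines that start with a bullet or a number as list items. Real documents break these rules often, so the output is only a starting point for manual review.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextLine:
    # Hypothetical input: one line of text already extracted from a page,
    # with the font size reported by whatever extraction library you use.
    text: str
    font_size: float


# A bullet character, dash or asterisk, or "1." / "1)" at the start of a line.
LIST_MARKER = re.compile(r"^\s*(?:[\u2022\-*]|\d+[.)])\s+")


def guess_tags(lines: list[TextLine]) -> list[str]:
    """Guess a standard structure type (H1-H6, LI, P) for each line."""
    if not lines:
        return []
    sizes = Counter(round(line.font_size, 1) for line in lines)
    # Assumption: the most frequent font size is body text.
    body_size = sizes.most_common(1)[0][0]
    # Larger sizes become headings: largest -> H1, next -> H2, capped at H6.
    larger = sorted((s for s in sizes if s > body_size), reverse=True)
    level = {s: min(i + 1, 6) for i, s in enumerate(larger)}

    tags = []
    for line in lines:
        size = round(line.font_size, 1)
        if size in level:
            tags.append(f"H{level[size]}")
        elif LIST_MARKER.match(line.text):
            tags.append("LI")
        else:
            tags.append("P")
    return tags


if __name__ == "__main__":
    sample = [
        TextLine("Annual Report", 24),
        TextLine("Overview", 16),
        TextLine("Revenue grew this year.", 11),
        TextLine("\u2022 New markets", 11),
        TextLine("2) Lower costs", 11),
    ]
    for line, tag in zip(sample, guess_tags(sample)):
        print(f"{tag:3} {line.text}")
```

With this sample, the 24 pt and 16 pt lines come out as H1 and H2, the plain 11 pt line as P, and the two marked 11 pt lines as LI. Keying on font size alone is the simplest choice; bold weight, indentation and spacing are other signals you would likely need in practice.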

Are there SDKs that support adding tags programmatically to a normal PDF?


Thanks in advance


JR Boulay
Community Expert
March 13, 2021

This function already exists in Acrobat Pro (Autotag Document), so there is no need to reinvent the wheel.

But as explained above, automated tools can do a lot; a human still has to polish the job.

PDF acrobat, InDesigner and Photoshopographer
Participant
July 6, 2022

We make PDFs dynamically, so this is definitely something I would love to be able to do. It doesn't sound like a reasonable option.

Participant
April 6, 2023

Did you find any solutions?


Legend
March 13, 2021

It's not impossible. However, it requires both C++ programming skills and a very deep knowledge of PDF internals: the graphics model, the text model and the tagging model, which all interact. If you have that (or the time to study), you can use the PDSEdit layer of the Acrobat SDK in a custom Acrobat plug-in.

Bear in mind that identifying "headers, sub headers, lists, forms, tables" is all guesswork. Before tagging, these things are not marked in any distinguishable way. A table is a mixture of lines and text which the human eye quickly recognises as having patterns that make it a table. If you are working with highly standardised documents, this is much easier.
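To make the "patterns" point concrete, here is a sketch of one common table heuristic, under stated assumptions: text has already been extracted as spans with (x, y) positions, one span per table cell. The `Span` type and `find_table_rows` function are hypothetical. Spans are grouped into rows by baseline, and a run of at least three consecutive rows whose cells start at the same x positions (within a tolerance) is reported as a table candidate. Multi-column page layouts and forms will also trigger it, which is exactly why the result is a guess that a human has to check.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical input: one text span (assumed to be a whole table cell)
    # with its left edge x and baseline y in PDF points (y grows upward).
    text: str
    x: float
    y: float


def group_rows(spans: list[Span], y_tol: float = 2.0) -> list[list[Span]]:
    """Group spans sharing a baseline (within y_tol) into rows, top to bottom."""
    rows: list[list[Span]] = []
    for span in sorted(spans, key=lambda s: (-s.y, s.x)):
        if rows and abs(rows[-1][0].y - span.y) <= y_tol:
            rows[-1].append(span)
        else:
            rows.append([span])
    # Order the cells in each row left to right.
    return [sorted(row, key=lambda s: s.x) for row in rows]


def same_columns(a: list[float], b: list[float], x_tol: float) -> bool:
    return len(a) == len(b) and all(abs(p - q) <= x_tol for p, q in zip(a, b))


def find_table_rows(spans: list[Span], min_rows: int = 3, min_cols: int = 2,
                    y_tol: float = 2.0, x_tol: float = 3.0) -> list[list[Span]]:
    """Return the longest run of consecutive rows whose cells start at the
    same x positions, or [] if no run has min_rows rows and min_cols columns."""
    rows = group_rows(spans, y_tol)
    starts = [[s.x for s in row] for row in rows]
    best: list[list[Span]] = []
    run_start = 0
    for i in range(1, len(rows) + 1):
        # A run ends at the last row or when the column layout changes.
        if i == len(rows) or not same_columns(starts[i], starts[run_start], x_tol):
            run = rows[run_start:i]
            if (len(run) >= min_rows and len(starts[run_start]) >= min_cols
                    and len(run) > len(best)):
                best = run
            run_start = i
    return best


if __name__ == "__main__":
    spans = [Span("Quarterly results", 72, 700),
             Span("Region", 72, 680), Span("Q1", 200, 680), Span("Q2", 300, 680),
             Span("North", 72, 665), Span("10", 200, 665), Span("12", 300, 665),
             Span("South", 72, 650), Span("8", 201, 650), Span("9", 299, 650),
             Span("Totals are unaudited.", 72, 630)]
    for row in find_table_rows(spans):
        print(" | ".join(s.text for s in row))
```

Here the heading line and the footnote are single-column rows, so only the three aligned rows (Region, North, South) are reported. Cells that wrap onto several lines, spanning cells and tables without ruling lines are all cases this simple alignment test misses.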


By the way, Adobe's accessibility checker is not considered the industry standard for good accessibility; if you go to this trouble you should probably aim higher.

Participant
June 2, 2023

Why C++ specifically as the programming language to interact with or build a PDF document?

I interpret this question as asking how, technically, the document markup model supported by the PDF format is represented in that format. Has anyone actually published guidance on this? We live in a world where PDF is an afterthought to more robust data-modeling approaches. How can we port that logic into a PDF-friendly namespace programmatically?
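On the "how is it represented" question: the markup model is defined in ISO 32000-1 (section 14.7, Logical Structure, and section 14.8, Tagged PDF). Page content is wrapped in marked-content sequences (`BDC` ... `EMC`) that carry a marked-content identifier (`MCID`); structure elements such as `/H1` or `/P` in the structure tree point back to those MCIDs, and a parent tree maps them the other way. Because these are ordinary PDF objects, a generator that writes PDF from scratch can emit them in any language. The sketch below writes a one-page tagged PDF using only Python's standard library. `build_tagged_pdf` is a hypothetical helper, the text is assumed to be ASCII, and the output uses an unembedded standard font and omits much of what PDF/UA requires, so treat it as an illustration of the object layout, not as a conformant accessibility generator.

```python
def escape_pdf_text(text: str) -> str:
    # PDF literal strings must escape backslashes and parentheses.
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_tagged_pdf(blocks: list[tuple[str, int, str]]) -> bytes:
    """blocks: (structure type, font size, ASCII text), drawn top to bottom."""
    CATALOG, PAGES, PAGE, FONT, CONTENT, ROOT, DOC, PARENT_TREE = range(1, 9)
    elem_ids = [9 + i for i in range(len(blocks))]  # one StructElem per block
    kids = " ".join(f"{oid} 0 R" for oid in elem_ids)

    # Content stream: each block is one marked-content sequence with its own MCID.
    ops, y = [], 720
    for mcid, (tag, font_size, text) in enumerate(blocks):
        ops.append(f"/{tag} <</MCID {mcid}>> BDC")
        ops.append(f"BT /F1 {font_size} Tf 72 {y} Td ({escape_pdf_text(text)}) Tj ET")
        ops.append("EMC")
        y -= font_size + 12
    stream = "\n".join(ops)

    objs = {
        CATALOG: f"<< /Type /Catalog /Pages {PAGES} 0 R /Lang (en-US) "
                 f"/MarkInfo << /Marked true >> /StructTreeRoot {ROOT} 0 R >>",
        PAGES: f"<< /Type /Pages /Kids [{PAGE} 0 R] /Count 1 >>",
        PAGE: f"<< /Type /Page /Parent {PAGES} 0 R /MediaBox [0 0 612 792] "
              f"/Resources << /Font << /F1 {FONT} 0 R >> >> "
              f"/Contents {CONTENT} 0 R /StructParents 0 >>",
        FONT: "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        CONTENT: f"<< /Length {len(stream.encode('latin-1'))} >>\n"
                 f"stream\n{stream}\nendstream",
        ROOT: f"<< /Type /StructTreeRoot /K {DOC} 0 R "
              f"/ParentTree {PARENT_TREE} 0 R /ParentTreeNextKey 1 >>",
        DOC: f"<< /Type /StructElem /S /Document /P {ROOT} 0 R /K [{kids}] >>",
        # Parent tree: page key 0 (its /StructParents) -> parents indexed by MCID.
        PARENT_TREE: f"<< /Nums [0 [{kids}]] >>",
    }
    for mcid, ((tag, _, _), oid) in enumerate(zip(blocks, elem_ids)):
        # Each structure element points at its page (/Pg) and its MCID (/K).
        objs[oid] = (f"<< /Type /StructElem /S /{tag} /P {DOC} 0 R "
                     f"/Pg {PAGE} 0 R /K {mcid} >>")

    # Serialize objects, recording byte offsets for the cross-reference table.
    out = bytearray(b"%PDF-1.7\n")
    offsets = {}
    for oid in sorted(objs):
        offsets[oid] = len(out)
        out += f"{oid} 0 obj\n{objs[oid]}\nendobj\n".encode("latin-1")
    xref_pos, n_objs = len(out), max(objs) + 1
    out += f"xref\n0 {n_objs}\n0000000000 65535 f \n".encode("latin-1")
    for oid in range(1, n_objs):
        out += f"{offsets[oid]:010d} 00000 n \n".encode("latin-1")
    out += (f"trailer\n<< /Size {n_objs} /Root {CATALOG} 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("latin-1")
    return bytes(out)


if __name__ == "__main__":
    pdf = build_tagged_pdf([("H1", 24, "Annual Report"),
                            ("P", 12, "Body text (draft).")])
    with open("tagged_sketch.pdf", "wb") as f:
        f.write(pdf)
```

The point of the sketch is the two-way link: the `BDC` operator in the page content names the tag and MCID, the `/StructElem` names the page and MCID, and the parent tree lets a reader go from content back to structure. Remediating an existing PDF means inserting that same linkage into content streams you did not write, which is where the graphics and text models mentioned above come in.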


This is just a personal opinion, but it is shocking that even Adobe hasn't provided more transparent, open-source ways for programmers to make their content-generation tools output PDF that retains the structure and semantics of the content (not just its visual layout).


Of course this matters for accessibility, but as developers, data engineers, and even enthusiasts, we should be nagging the heck out of these technical gatekeepers.