Participant

Question

Can I programmatically add tags to a normal PDF and convert it to an accessible PDF?

March 13, 2021

Hi,

I am working on PDF remediation. I have normal PDFs. I am thinking of writing a script to read a normal PDF, identify its various contents such as headers, subheaders, lists, forms, tables, and images, add tags to the PDF content accordingly, and generate a tagged PDF that will pass the Adobe accessibility check. My idea is to reduce manual tagging effort (in Adobe Acrobat Pro DC) by at least 60 to 70%.

Are there SDKs that support adding tags programmatically to a normal PDF?

Thanks in advance

This topic has been closed for replies.

JR Boulay

Community Expert

This function already exists in Acrobat Pro; there is no need to reinvent the wheel.

As explained above, though, automation can do a lot, but a human still has to polish the result.

PDF acrobat, InDesigner and Photoshopographer

Blu Sanders

Participant

We make PDFs dynamically, so it would definitely be something I would love to be able to do. It doesn't sound like a reasonable option, though.

pijore29277439khfe

Participant

Did you find any solutions?

Test Screen Name

Legend

It's not impossible. However, it requires both C++ programming skills and a very deep knowledge of PDF internals: the graphics model, the text model and the tagging model, which all interact. If you have that (or the time to study) you can use the PDSEdit layer in a custom plug-in.

Bear in mind that identifying "headers, sub headers, lists, forms, tables" is all guesswork. These things are not marked in a different way, pre-tagging. A table is a mixture of lines and text which the human eye quickly recognises as having patterns that make it a table. If you are working with highly standardised documents this is much easier.
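To illustrate why this is guesswork, here is a minimal Python sketch of the kind of heuristic such a script ends up using. It assumes the text has already been extracted into spans carrying a font size; `Span` and `guess_roles` are hypothetical names for this example, not part of any SDK. It treats the most common font size as body text and anything larger as a heading, which works on highly standardised documents and fails easily elsewhere.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    """A run of extracted text with its font size (hypothetical structure)."""
    text: str
    font_size: float


def guess_roles(spans):
    """Guess a tag (H1..H6 or P) for each span from font size alone."""
    if not spans:
        return []
    # Weight each font size by how much text uses it; the winner is body text.
    weights = Counter()
    for span in spans:
        weights[span.font_size] += len(span.text)
    body_size = weights.most_common(1)[0][0]
    # Sizes larger than body text, largest first, become heading levels.
    heading_sizes = sorted(
        {s.font_size for s in spans if s.font_size > body_size}, reverse=True
    )
    roles = []
    for span in spans:
        if span.font_size in heading_sizes:
            level = min(heading_sizes.index(span.font_size) + 1, 6)
            roles.append((f"H{level}", span.text))
        else:
            roles.append(("P", span.text))
    return roles
```

A pull quote set in large type, or a heading set in bold at body size, would be misclassified by this rule, which is exactly where the human review comes in.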

By the way, Adobe's accessibility checker is not considered the industry standard for good accessibility; if you go to this trouble you should probably aim higher.

defaultbphyydhhkgks

Participant

Why C++ specifically as the programming language to interact with or build a PDF document?

I interpret this question as asking: how, technically, is the document markup model that PDF supports represented in the format? Has anyone actually published guidance here? We live in a world where PDF is the afterthought format to more robust data modeling logics. How can we enable those logics to port into a PDF-friendly namespace programmatically?

This is just a personal opinion, but it is shocking that even Adobe hasn't made more transparent, open-source ways for programmers to enable their content generation tools to output PDF in a way that retains the structure and semantics of content (not just visual layout).

Of course this matters for reasons of access, but as developers, data engineers, even enthusiasts, we should be nagging the heck out of these technical gatekeepers.
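For what it's worth, the PDF specification (ISO 32000) does describe the representation: tagged content is wrapped in `BDC` ... `EMC` marked-content operators carrying an `MCID`, and a structure element dictionary in the structure tree points back at that `MCID`. A rough Python sketch of the two pieces follows; the helper names and the object references (`5 0 R`, `3 0 R`) are made up for illustration, and a real file also needs a `/StructTreeRoot` with a parent tree, `/MarkInfo` in the catalog, and more.

```python
def marked_content(tag, mcid, text, font="F1", size=12, x=72, y=720):
    """Build a content-stream fragment that wraps one text run in a tag."""
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return (
        f"/{tag} <</MCID {mcid}>> BDC\n"
        f"BT /{font} {size} Tf {x} {y} Td ({escaped}) Tj ET\n"
        "EMC"
    )


def struct_elem(tag, mcid, parent_ref, page_ref):
    """Build the structure element dictionary that refers to that MCID."""
    # parent_ref and page_ref are indirect references such as "5 0 R";
    # the values used in this example are hypothetical.
    return (
        f"<< /Type /StructElem /S /{tag} /P {parent_ref} "
        f"/Pg {page_ref} /K {mcid} >>"
    )


if __name__ == "__main__":
    print(marked_content("H1", 0, "Annual Report", size=24))
    print(struct_elem("H1", 0, "5 0 R", "3 0 R"))
```

The point is that structure lives in two places that must stay in sync, the page content and the structure tree, which is part of why retrofitting tags onto an existing PDF is harder than emitting them at generation time.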
