I am working on PDF remediation. I have normal (untagged) PDFs. I am thinking of writing a script that reads a normal PDF, identifies content such as headers, subheaders, lists, forms, tables, and images, adds the corresponding tags, and generates a tagged PDF that passes the Adobe accessibility check. My goal is to reduce manual tagging effort (in Adobe Acrobat Pro DC) by at least 60 to 70%.
Are there SDKs that support adding tags programmatically to a normal PDF?
Thanks in advance
It's not impossible. However, it requires both C++ programming skills and a very deep knowledge of PDF internals: the graphics model, the text model and the tagging model, which all interact. If you have that (or the time to study) you can use the PDSEdit layer in a custom plug-in.
Bear in mind that identifying "headers, sub headers, lists, forms, tables" is all guesswork. These things are not marked in a different way, pre-tagging. A table is a mixture of lines and text which the human eye quickly recognises as having patterns that make it a table. If you are working with highly standardised documents this is much easier.
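To make the "guesswork" concrete, here is a minimal Python sketch of the kind of heuristic such a script would rely on. It is not part of any Adobe SDK: the `Span` type, the `guess_tags` function, and the rules themselves are hypothetical, and it assumes some extractor has already produced text runs with their font sizes. It treats the font size covering the most characters as body text, any larger size as a heading level, and lines starting with a bullet or number as list items.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Span:
    """One run of text with its font size (hypothetical extractor output)."""
    text: str
    font_size: float  # in points


# A leading bullet, hyphen, asterisk, or "1." / "1)" followed by whitespace.
LIST_MARKER = re.compile(r"^\s*(?:[\u2022\-*]|\d+[.)])\s+")


def guess_tags(spans):
    """Return a guessed tag name (H1..H6, LI, or P) for each span."""
    if not spans:
        return []

    # Assumption: body text is the font size that covers the most characters.
    weight = Counter()
    for s in spans:
        weight[s.font_size] += len(s.text)
    body_size = weight.most_common(1)[0][0]

    # Assumption: every size larger than body text is a heading,
    # with the largest size being the top level.
    heading_sizes = sorted(
        {s.font_size for s in spans if s.font_size > body_size},
        reverse=True,
    )
    level = {size: i + 1 for i, size in enumerate(heading_sizes)}

    tags = []
    for s in spans:
        if s.font_size in level:
            tags.append(f"H{min(level[s.font_size], 6)}")
        elif LIST_MARKER.match(s.text):
            tags.append("LI")
        else:
            tags.append("P")
    return tags
```

A rule set like this breaks easily: a pull quote set in large type would be guessed as a heading, and a paragraph that happens to start with "3." would be guessed as a list item. That is why it only works well on highly standardised documents, and it says nothing about tables or reading order.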
By the way, Adobe's accessibility checker is not considered the industry standard for good accessibility; if you go to this trouble you should probably aim higher.