Seems like this may be the next step. I'd love to see it soon!
Are you suggesting we tag based on the output from our Extract API?
Not really. I'm suggesting a new service that adds a tag structure to an existing PDF so that screen reader software can use it more easily.
Autotagging in this context would be worthless and deceptive. If tagging could be done automatically, it wouldn't have been necessary to invent tagging at all.
Autotagging arguably has some role as the first step in a labour-intensive manual process, followed by careful review and checking; it surely has no proper role in a direct delivery workflow. It would also expose service providers to legal claims that they had asserted compliance with the ADA and other comparable legislation while delivering content that is tagged but noncompliant. (I am not a lawyer.)
It's not worthless. Like you said, it's the first step. Adobe already does auto-tagging in Acrobat. Responsibility falls on the shoulders of the entity delivering the document to the reader.
We have actual data, based on the experiences of people who use assistive technology, showing that auto-tagging dramatically improves their ability to consume the PDF. Some of their responses to an auto-tagged PDF moved me to tears; they were so happy. You're not going to get a WCAG-compliant PDF from an auto-tagger, but you do get a far friendlier PDF.
Add to this the fact that many PDF creation tools that produce tagged PDF create terrible tags. I know of one (which shall remain nameless) that tags every cell in a table as a paragraph. Not a paragraph in a table... just a paragraph. No columns, no rows, just paragraph after paragraph.
Stripping these tags and then auto-tagging creates a far better experience.
While I agree that in SOME limited cases auto-tagging works well, the VAST majority of auto-tagged documents are a complete mess. The one thing to remember is that someone using a screen reader has little or no indication that anything is wrong with the document when it comes to reading order, table structure, headings vs. paragraphs, etc. If you cannot verify that what you see on screen is actually tagged appropriately, then anything you get will seem like an improvement. I am not trying to rain on anyone's parade. I just want to bring a sense of reality about the 80% complete failure rate of auto-tagging documents as things stand now. I can show you any number of test documents (I have almost 10 years of experience in accessibility) that auto-tagging simply chokes on. HOWEVER... I will say that some accessibility is better than no accessibility, and I am sure the people whose user experience has improved are eternally grateful for your hard work. There is just a LOT of room for improvement. Chad Chelius and I have been trying (unsuccessfully) for more than 5 years to have any meaningful conversation with the Adobe dev teams in charge of PDF accessibility. I am happy that people are benefitting, but the reality is that the tool works well only in very limited situations.
I also know one tool (InDesign) that cannot set scope for column headers or even assign a row header for a table. Nor can it export multi-line hyperlinks correctly or nest captions correctly. I could go on, but I'll leave it there. Accessibility is a journey. Tagging documents starts with good source creation, combined with automated and manual review. It will always be that way. #MeaningfulHeadings #MeaningfulAlt-text