I encourage others to share their experiences in remediating PDFs for ISO 14289 (PDF/UA) compliance. Let’s learn from each other’s experiences.
I am working with documents that previously passed the Acrobat checker and PAC 1.3 along with manual checks for WCAG 2.0 compliance – so, the documents were as accessible as I knew how to make them. Below are the errors that I am seeing quite a bit from the new PAC2 PDF/UA-compliance checker. The “fix” is not necessarily the best, just what I have found that seems to work. I am using Acrobat Pro XI.
error: Font not embedded
fix: Tools > Print Production > Preflight > PDF fixups > Embed fonts
comment: This does not always work, as some font licenses do not allow embedding. If you encounter non-embeddable fonts, hopefully you have the source document and can use a different font.
error: Tagged content present inside an Artifact
fix: Open the Content pane. Open Artifact containers to find any content containers hiding inside. Drag the content containers to their proper place outside the Artifact container.
error: Alternative description missing for an annotation
fix: Add alt text to link tags.
comment: This seems an odd error. Some links benefit from alt text but others are perfectly clear without it. Seems like this should be a judgment call, but the Matterhorn Protocol insists on links having alt text.
error: Figure element on a single page with no bounding box.
fix: This error goes away if I tag the figure as an artifact, which makes sense. But if I then retag it as a figure and add back in the alt text, the error stays away.
comment: Seems odd. Even the Matterhorn Protocol PDF (http://www.pdfa.org/wp-content/uploads/2013/08/MatterhornProtocol_1-0.pdf) fails PAC2 on this point! This could be a rough spot in the PAC2 beta, not a real error, but is easy enough to “fix”.
error: DisplayDocTitle key is not set to true
fix: File > Properties > Initial View tab > In the Show drop down box, select “Document Title”
error: PDF/UA identifier missing
fix: Create an xmp file that includes the required snippet of metadata (example: http://bygosh.com/files/pdfuaid.xmp). Then: File > Properties > Description tab > Additional Metadata... > Advanced > Append
comment: This should be the final remediation step, after the document is otherwise PDF/UA-compliant. To apply the PDF/UA ID to a document that is not compliant would be fibbing.
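For anyone curious what the "required snippet" amounts to, here is a minimal Python sketch that builds an XMP packet carrying the PDF/UA identifier. It assumes the pdfuaid namespace URI and a part value of 1 as used for PDF/UA-1; the function names are made up for illustration, and this is not a copy of the linked file. It also leaves out the PDF/A extension schema description that a file which is also PDF/A would need.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the RDF one is standard, the pdfuaid one is
# the URI commonly used for the PDF/UA-1 identifier.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"


def build_pdfua_xmp(part: int = 1) -> str:
    """Return a minimal XMP packet that declares pdfuaid:part."""
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        f' <rdf:RDF xmlns:rdf="{RDF_NS}">\n'
        f'  <rdf:Description rdf:about="" xmlns:pdfuaid="{PDFUAID_NS}">\n'
        f'   <pdfuaid:part>{part}</pdfuaid:part>\n'
        '  </rdf:Description>\n'
        ' </rdf:RDF>\n'
        '</x:xmpmeta>\n'
        '<?xpacket end="w"?>\n'
    )


def pdfua_part_of(xmp_text: str):
    """Parse an XMP packet and return the pdfuaid:part text, or None."""
    root = ET.fromstring(xmp_text)
    element = root.find(f".//{{{PDFUAID_NS}}}part")
    return None if element is None else element.text
```

Writing the result of build_pdfua_xmp() to a UTF-8 text file with an .xmp extension should give something that can be appended through the Additional Metadata dialog described above.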
Which parts of that xmp file need to be customized? Also has anyone run into and fixed the error "role mapping for standard structure types"?
The example xmp file is intended to be used as-is. I am not an xmp expert though so if you notice any needed customizations please post. As-is it has worked many times for me.
My best guess re. the role mapping error is to look for something wrong in the Document Role Map. In the Tags Pane, click the down arrow next to the Options icon and select Edit Role Map...
Hope this helps.
a 'C' student
Thanks for sharing. I use Acrobat, PAC 2.0, and Cabaret 5.1 for validating. It is sad that there is no single tool and that there are so many different ways of interpreting the standards. For checking reflow I use Acrobat and pdfGoHTML. The sample you supplied is incomplete and will cause errors if the document is also PDF/A. You must properly define the custom variable, since it is not part of XMP 1.5. Use this instead: http://www.robinschwab.ch/pdfUA.xmp
Thank you, Al2O3, for correcting the deficiencies in my XMP file. This is very helpful and much appreciated. Thanks also for the tip on Cabaret; it is now on my list to evaluate.
Thanks for the great advice.
I am also creating PDF/UA-compliant PDFs and validating with PAC 2.0. I am not sure what one of the warnings means, though. Everything passes except for a warning in the Structure section, which states "P structure element used as root element".
What does this mean, and if it's only a warning, is my PDF still considered PDF/UA?
My approach is that PDF/UA is an extension of PDF/A-1a. <P> means paragraph. If your whole document is just one paragraph, your structure is probably bullshit. The root element should be <Document>, maybe containing <Art> (article) elements, and those containing <P>. Usually there is no automatism; you have to define your articles and paragraphs by hand. It's very similar to (X)HTML. That's why a valid PDF/(U)A will perform reasonably well as a website when converted with pdfGoHTML.
Management summary: Your PDF is probably neither PDF/A-1a nor PDF/UA.
Is there any way I can send you the PDF that I am having this problem with so that you can review it and let me know what I am doing wrong?
The first tag in your tag tree should be <Document>. All other tags should be nested under <Document>. A simple example:
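A rough Python sketch of that nesting, for illustration only: the tag tree is modeled as nested (name, children) tuples, and the helper names and sample tags are made up here rather than taken from any file in this thread. The check simply mirrors the PAC 2 message about the root element.

```python
def check_root(tag_tree):
    """Return "ok" if the root tag is Document, else a PAC-style message."""
    name, _children = tag_tree
    if name != "Document":
        return f"{name} structure element used as root element"
    return "ok"


def outline(tag_tree, depth=0):
    """Render the tag tree as indented lines, one tag per line."""
    name, children = tag_tree
    lines = ["  " * depth + f"<{name}>"]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical tag tree: everything is nested under a single Document tag.
good_tree = ("Document", [
    ("H1", []),
    ("P", []),
    ("P", []),
])

# A tree whose top-level tag is a paragraph, which triggers the warning.
bad_tree = ("P", [])

print("\n".join(outline(good_tree)))
print(check_root(good_tree))
print(check_root(bad_tree))
```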
Thank you a_C_student. This fixed my "P structure element used as root element" errors.
This has been bugging me for several months – I could not get links to pass both PAC 1.3 (WCAG 2.0) and PAC 2 (PDF/UA). PAC 2 insists that links must have alt text, PAC 1.3 insists they must not. This month the PDF Association, PDF/UA Competence Center published the excellent paper “Climbing the Matterhorn: An introduction to the definitive algorithm for PDF/UA conformance” (http://www.pdfa.org/wp-content/uploads/2014/01/ImplementingPDFUA-ClimbingtheMatterhorn.pdf). Experimenting with the file, I noticed it has links that pass both PACs. By carefully examining the link tag properties, I was able to spot the trick. Rather than using the Alternate Text or Actual Text fields, alt text was applied directly to the Contents key of the tag. Trying it out on other files, I leave the link tag Alternate Text and Actual Text fields blank and click the “Edit Tag …” button, then drill down through “Tag Element”, “/K [Array]”, “ <<Dictionary>>”, and “/Obj <<Dictionary>>”, then click “New Item” and add:
Key: Contents
Value: link text goes here
Value Type: String
This makes both PACs happy. It takes a bit of work so I am not sure how often it will be worth the effort in practice, but I like knowing that it can be done. The file linked above has example link tags, and the paper is well worth reading on its many other merits.
a 'C' student
I have a tagged PDF created with InDesign CS6, then further tweaked in Acrobat Pro X. Having ironed out all conformance issues highlighted by PAC2, I saved an XMP file by clicking the arrow in the top right-hand corner of the Properties Additional Metadata window. Then I appended this to my PDF as explained in Adobe's guide (Apple-click to select the XMP file) and saved the PDF. The file size shows as being 7 KB larger, so something has definitely been appended.
But PAC2 says the PDF/UA identifier is missing. Has anyone else had this experience? If so, is there a fix?
You will need to append an XMP file that includes the PDF/UA identifier. In an earlier post Al2O3 provided this example, which works great in my experience: http://www.robinschwab.ch/pdfUA.xmp
I am far from an XMP expert, but it sounds like you may have cluttered your file's metadata with duplicate information if you saved the metadata to XMP and then appended it to the same file. I would revert to the pre-appended version of your PDF if you have it, then append Al2O3's XMP.
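If you want a quick look at whether the identifier actually made it into a file before re-running PAC2, a crude Python sketch like the one below can help. It only scans the raw bytes, so it assumes the XMP metadata stream is stored uncompressed and uses the conventional pdfuaid prefix; the function name and the example file name are hypothetical, and a "not found" result is not proof that the identifier is absent.

```python
import re

# Matches the identifier in element form or in RDF attribute form.
PART_RE = re.compile(
    rb"<pdfuaid:part>\s*(\d+)\s*</pdfuaid:part>"
    rb"|pdfuaid:part\s*=\s*[\"'](\d+)[\"']"
)


def find_pdfua_part(pdf_bytes: bytes):
    """Return the PDF/UA part number found in the raw bytes, or None."""
    match = PART_RE.search(pdf_bytes)
    if match is None:
        return None
    # Exactly one of the two groups is set, depending on which form matched.
    return int(match.group(1) or match.group(2))
```

Usage would be along the lines of find_pdfua_part(open("report.pdf", "rb").read()), where report.pdf stands in for your own file.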
Hope this helps.
a 'C' student
Many thanks for this response. I know almost nothing about XMP … and you may well be right about cluttering the metadata.
In the example that Al2O3 provided there doesn't look to be any specific reference or data for a particular pdf. So if I create and fix up a new tagged pdf can this file be appended directly?
Thanks – I just deleted the PDF's previous metadata, then appended Al2O3's XMP; the PDF has now been accepted as accessible by PAC2.
Very grateful for your help.
Because Word for Mac has no tagged pdf export I've been trying to create UA compliant pdfs from documents created in LibreOffice 4 (MacOS). Using LibreOffice's 'tagged pdf' export option produces pdfs that need further fixing in Acrobat Pro, specifically:
One issue highlighted by PAC2 which I've been unable to fix is:
• 'missing ID in Note structure elements': Acrobat correctly identifies the Note elements, but the ID field is empty. I can't find where the format of the ID is specified; it's not '1', not '#1', etc.
I would greatly appreciate any insights others can offer into the format for specifying the tag ID.
A couple of members of the PDF/UA development team, Duff Johnson and Ferass Elrayes, acknowledged on the LinkedIn forum "About PDF/UA" that the ID requirement for Note tags was a mistake. As the latter put it:
"Duff is right. This is not used for anything now. While the standard was being developed, we wanted to add an attribute on the Reference tag called "Target" and that attribute would hold the unique ID of the Note tag in order to enable "Structural Navigation" (a feature that is missing from ISO32000-1 but introduced in ISO32000-2 in a different way). We did not end up adding the Target attribute and therefore, the ID for the Note tag is useless."
This requirement will likely be removed in the next version of ISO 14289 (PDF/UA). In the meantime, a workaround to get PAC2 to "pass" is to manually add a unique id to the "ID" entry in each Note tag's properties (right click the Note tag, select Properties, you will find the ID field on the Tag tab). It really does not matter what you enter - as long as it is unique.
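Since the only real constraint is uniqueness, any simple numbering scheme will do. As a small Python sketch (the "note-" prefix and the function name are arbitrary choices for illustration, not anything PAC2 or the standard requires), here is one way to produce a list of IDs to paste into the Note tags:

```python
def make_note_ids(count, prefix="note-"):
    """Return `count` unique, zero-padded IDs such as note-01 .. note-12."""
    width = len(str(count))  # pad so the IDs sort in document order
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]


# Example: IDs for a document with twelve Note tags.
for note_id in make_note_ids(12):
    print(note_id)
```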
Hope this helps.
a 'C' student
Thanks, that's very helpful. I'll come up with a set of IDs.
PAC2 has highlighted an error in tagged PDFs created from LibreOffice 4: the requirement for alt text for annotations associated with the Table of Contents.
The interactive TOC is generated by LibreOffice4 and cannot be edited within that programme. The tag editor in Acrobat Pro X shows a logical structure hierarchy for each entry:
<Contents 1>(the style name for level 1 TOC entry)
Content name (eg Introduction)
page no (eg 1)
Acrobat's tag editor allows alt text to be entered for any of these elements, but will only save alt text for the style name (<Contents 1> in the structure above). It will not save alt text for the TOC, TOCI or link tags.
Looking at various forums, there seems to be dispute/ambiguity about whether alt text for annotations is specified in the pdf 1.7 standard. But PAC2 records the annotation error on every entry in the Table of Contents even when the style name has alt text.
Does anyone know how this error can be eliminated?
PAC2 is enforcing Matterhorn Protocol criterion 28-012, which requires every link annotation to include an alternate description in the Contents key. The solution is to create a Contents key containing alt text for each link tag (see post #9 above; the Matterhorn Protocol PDF has good examples, that is, TOC entries with alt text in each link's Contents key).
Or you may choose to live with this "error". This is an example of a compliance criterion that is well ahead of available tools and AT. No current AT makes use of the Contents Key. No current accessibility remediation tool makes it easy to create and configure the Contents key - it can be done with Acrobat, but is tedious for a long TOC. If you are like me - that is, you want to make sure your PDFs are accessible for the AT of tomorrow as well as today, and a bit obsessed with making PAC2 say "Pass" - go for it. But with the understanding that it is a bit of a chore.
a 'C' student
Belatedly, just to say thanks for your super helpful advice on pdfUA issues, and the TOC link alt text conundrum in particular.
Like many others I strive to achieve the ideal of PAC2 conformance. But as I'm creating pdfs for a living, mainly for resource starved NGOs, I need to strike a balance between time put in and tangible results.
Your expertise and perspective have helped me to stay sane and deliver documents that I'm confident achieve high standards of accessibility … so thank you again.
Thanks for starting this discussion “a ‘C’ student” and thanks to all the contributors. I’m happy that I have been able to solve a few mysteries presented to me by PAC 2.0!
I’m trying to work through the following error: “Alternative description missing for an annotation”. This error is flagged on each item in my table of contents. Note: the authoring program is InDesign CS6.
Can you provide some more detail regarding fixing link annotations that have been generated through a table of contents?
I have referred to post #9, but I have no experience editing tags and my attempts haven’t worked so far. Another reason I’m asking is that when I try to investigate the “ClimbingtheMatterhorn” PDF, it doesn’t show a table of contents, only regular hyperlinks. I also tried to investigate another document (pertaining to the Matterhorn Protocol) that did have a table of contents, but I’m unable to fully understand what I’m seeing when I drill down into the tag element.
Perhaps someone has solved this particular problem or can point me in the right direction in terms of learning about editing tags?
Also, in Post #9, what exactly should be entered for the “Value” (“Value: link text goes here”)? Would it be different for a bookmark link?
The requirement that all links include alternate text in the Contents key is frustrating ...
So, you have to make a choice. You can ...
If you choose the last bullet, the PDF/UA Reference Suite includes example TOCs. As to what should be entered for "Value", for external links I typically use the title of the target page. For TOC entries, following the examples in the PDF/UA Reference Suite, the Contents key value mirrors the text of the link, for example "Chapter 1: Introduction".
Hope this helps.
a 'C' student
Callas PDF Pilot creates a Contents entry for Link annotations, and does so globally throughout the document. This allows you to ignore the Table of Contents and other hyperlinks that do not really need alternative text to be clearly understood, and to focus your attention on those links that do need alternative text. This passes PAC 2.0's automated checker. It's a standalone program, designed mostly for prepress, but PDF Pilot also has a number of helpful accessibility features; hopefully they will add even more going forward.
I deal with a lot of Word-to-PDF files and lately have been getting an embedded font error on the spaces in labels in bulleted lists. PDF Pilot is the only software I've tried that fixes this consistently, by replacing unembeddable fonts with similar fonts. It will also convert all untagged items to artifacts, which has been helpful with table documents from Word.
Thanks for the tip raeben3! After exploring the Callas website, it looks like both PDFaPilot and PDFToolbox have the link content entry function. I will definitely make use of the free 7-day trial of one or both.