The text validation process failed due to character encoding issues. Specifically, when a character is represented by 3 to 4 code units, it should be interpreted as a single character. The web PDF processing produced no errors or font issues, but during the accessibility processing the Myanmar (Burmese) characters were not being interpreted correctly, which led to an error message (please refer to the screenshot). We are using the Padauk and Myanmar fonts and trying to display Burmese characters, but it's not working. We have been attempting this for over three months, but the characters are still not rendering clearly. Please provide some suggestions to help us resolve this issue.
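For context, one possible reading of "represented by 3 to 4 characters" (my interpretation, not confirmed by the error message): Burmese code points fall in the range that UTF-8 encodes as three bytes, so a tool that counts bytes rather than code points will report three "characters" where there is logically one. A quick Python check illustrates the distinction:

```python
# Each Burmese letter is a single Unicode code point, but UTF-8
# stores it as 3 bytes; a checker that treats bytes as characters
# would see "3 to 4 characters" where there is really one.
ka = "\u1000"  # MYANMAR LETTER KA

print(len(ka))                    # 1 code point
print(len(ka.encode("utf-8")))    # 3 bytes in UTF-8

# Code points outside the Basic Multilingual Plane take 4 bytes:
supplementary = "\U0001F600"
print(len(supplementary.encode("utf-8")))  # 4
```

If the validator is reporting byte counts, the fix lies in how it decodes the text stream, not in the fonts themselves.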
Sorry for the trouble. Would you mind checking a similar discussion: https://adobe.ly/4bIsmak and trying the suggestion shared in this post?
Let us know if that helps.
Thanks,
Harshika
Hi @HARSHIKA_VERMA,
Thanks for your reply.
You provided a link that discusses font and spelling issues, but my question is not related to fonts or spelling. The issue I am encountering is a character encoding error that occurs when printing the web PDF during processing. Please provide me with some guidance, as we have been grappling with the same issue for the past four months.
Thanks,
Transforma
Hi Transforma,
Sorry for the delay in response. Is it possible for you to share the packaged InDesign file with me via private message so we can test it on our end? Also, if possible, please share a short recording of the workflow.
We will try our best to investigate.
Thanks,
Harshika
Have you been able to solve the character encoding issue? I've been running into the same or similar issues with other languages such as Nepali and Khmer. I'm wondering if the issues are due to a lack of language support within the program?
I don't know much about how the Accessibility Checker works in Acrobat, to be honest. I've been working with PDF accessibility of non-Latin scripts for a while; I can't say "I'm dabbling" anymore, but I don't know enough to be certain that Acrobat's Accessibility Checker can't be trusted with such judgment calls. The thing being flagged in the OP is the character encoding of the Burmese phrase embedded in the English text in the PDF exported from InDesign. Is that what you're asking about, or are you asking about something else? Have you maybe tried a different accessibility checker, like PAC 2024? Do you encounter the same kind of encoding problems being flagged by PAC?
There are plenty of reasons why something perfectly kosher might fail an accessibility check. My understanding (mostly gleaned from Bevi Chagnon's posts here) is that PDF/UA requires correctly encoded Unicode text, and if some complex script text embedded in the middle of an English paragraph fails a character encoding check, then I'd start looking at the font and encoding.
Now, the initial post seems to be asking both a) why is the Burmese text failing an accessibility check, and b) why is the Burmese text rendering incorrectly in Acrobat? These are two very different questions, I think. As I don't know the PDF/UA spec well enough to make a judgment call, I'd start with "Hey, why is this text rendering incorrectly in Acrobat?" If I were to solve that issue, then maybe the whole "does-it-pass-Acrobat's-arbitrary-accessibility-checker-that-is-frequently-flat-out-wrong" question might be resolved without any additional heavy lifting.
So, if you're having problems with Nepali and Khmer (both of which I handle in ID on a not-quite-daily basis without issue, and have done so for years), what kind of problems are you having? Are you having problems with text that looks fine in InDesign but fails to render correctly in Acrobat? Or does it look fine in the PDF but fail an accessibility check? Can you supply more details regarding the issues you're encountering? Harshika has already asked the original poster for those details (what's your workflow? can you share a package?) and we forum readers can't know if anything came of that request. But if you can post that kind of background, we should be able to nail down exactly where the complex-script support in InDesign is failing, whether the Acrobat accessibility checker was giving meaningful results, or whether there was something wrong with the way InDesign exports complex-script text in PDFs.
Thanks for your feedback. I'll try to provide a bit more explanation of what is happening. The files are created in InDesign and then exported to PDF. The PDF is then remediated for PDF/UA and WCAG. Both Nepali and Khmer appear fine in InDesign, and visually on the PDF page itself the text is correct when exported. Where it is not displaying correctly is within the tags panel. My understanding is that for a screen reader to interpret the text correctly, the tags (translations) will also need to display correctly in order to be read properly.
When using Acrobat's accessibility check, it throws a "Character Encoding" error. I did a test using PAC, and the tags also do not display correctly there. Under the PDF/UA checkpoints, it's showing a "Natural Language" error, and there is also an error that says "Document language metadata primary subtag is unknown".
The link did not come through in your comment about Bevi Chagnon's posts, but it might have been one I've come across; I've tried a lot of the suggestions without any luck.
I'm trying to figure out how to get the text to display correctly in the tags panel, or whether that's just not possible. I'm wondering if there is a language limitation, where it's simply not supported within Acrobat. If that is the case, I'm not sure how files can be made compliant if the tags cannot be read properly.
I'm also not sure if I'm using the correct ISO language tag when exporting from InDesign, or if there is something else I'm missing in the export settings.
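For what it's worth, the "primary subtag is unknown" error usually means the PDF's language entry doesn't start with a registered subtag; the primary subtags for these languages are the two-letter ISO 639-1 codes. A tiny sketch of the lookup (the helper function is hypothetical, written just for illustration; the subtag values are the real ISO 639-1 codes):

```python
# Real ISO 639-1 codes, usable as BCP 47 primary subtags in a
# PDF's language metadata.
PRIMARY_SUBTAGS = {
    "Nepali": "ne",
    "Khmer": "km",
    "Burmese": "my",
}

def lang_attribute(language: str) -> str:
    """Hypothetical helper: return the primary subtag for a language name."""
    try:
        return PRIMARY_SUBTAGS[language]
    except KeyError:
        raise ValueError(f"No subtag on file for {language!r}")

print(lang_attribute("Nepali"))  # ne
print(lang_attribute("Khmer"))   # km
```

If InDesign's export dialog or a remediation tool is writing something other than one of these short codes as the document language, that alone could trigger the PAC "Natural Language" failures.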
Attached is a sample .indd file (it would not allow me to attach the package), an example PDF, and then a screenshot showing the text not being displayed.
Thanks for any feedback you might have.
I don't think that the absence of language tagging is causing your issues. I grabbed some Khmer text I have lying around for a current project and made some PDFs from InDesign by exporting, and from Word (by saving PDFs, not by printing to the Adobe PDF printer). I did the same with Nepali. Saving PDFs from Word led to PDFs passing the Character Encoding section of the Accessibility Check. Exporting from InDesign led to failure, with the exact same text and the exact same font. The MS Word PDFs' character encoding passed whether the Khmer text was marked as Khmer or Nepali or English. The InDesign PDFs always failed.
This doesn't bug me, from a usability standpoint. But you're not trying to get this to work for a blind Nepali screenreader user, right? You want it to pass Acrobat's accessibility checker - that's my guess, at any rate. Now, when I mentioned Bevi's posts, I didn't put in a link to a single post. I didn't link anything at all, to be honest. I just meant "I go to Bevi's posting history and read her posts, for my own edification." And one thing that's come up many times in her posting is that there's no hard link between "passing Acrobat's a11y check" and "insulating my employer/client from ADA lawsuits." Passing that check doesn't make it "ADA compliant," right? Here are two points of view. The first is taken from an Adobe page discussing PDF/UA:
How do I know if my PDF is ADA compliant?
With Adobe’s Acrobat Pro it’s easy to check if your PDF is ADA compliant. Select Tools, before heading to Accessibility. From there you should be able to run the option Full Check. Once the check has completed, the report should tell you everything you need to know.
That's Adobe. Here's Bevi on the same topic:
RE: PDF checkers... All software checkers (and their online services, as well) use AI to run through a file and determine whether it passes compliance or not. But real accessibility compliance varies from file to file, depending upon the actual content.
Therefore, you must use more than these programs and services to determine compliance: you must have trained human checkers to determine many items:
- The logical reading orders shown in the Tag Tree and Architectural Reading Order (what's called "the order panel")
- Alt Text and Actual Text
- Whether Summaries and captions are required
- Footnotes
- Whether or not a hyperlink needs Alt Text
I think that maybe not every PDF accessibility checker uses "AI," but it's clear to me that Bevi's commentary implies that you don't guarantee conformance with Section 508 by running the Accessibility Checker in Acrobat and saying "Yup, that's good! No whammies!" So: is your job "make the PDF pass the Acrobat accessibility check"? Is your job "prevent ADA lawsuits"? Is your job "make this document available to blind Nepali folks using screenreaders"?
My experimentation with exporting PDFs from various tools with various fonts leads me to believe that there isn't going to be any way for us as end users to get these character encoding errors to go away in Acrobat's accessibility checker. So, if my supposition is correct regarding your job, here ("make the whammies go away in the Acrobat a11y check for complex-script content in PDFs produced from InDesign") then your next step is going to be twofold:
1) write up a really good bug report at indesign.uservoice.com
2) get lots of people to upvote it, because it's vote quantity that surfaces the issue for the developers
I suspect that the way that text streams are produced by InDesign is the problem, here, but I really have no clear idea. The last time I did a deep dive into how text is encoded in PDFs for complex-script languages left me a little punch-drunk, to be honest. But I see that both of your Nepali fonts are encoded differently in the PDF, but when I dig around in the PDF using the fabulously useful PDF browsing tool at brendandahl.github.io/pdf.js.utils/browser/, I see properly encoded text. However, if there's some way to make these errors go away by taking any actions in InDesign or Acrobat, I don't know it, and I would very much like to chat with someone who does! There are some discussions around here that indicate that using different fonts can sometimes make these errors go away. That's also part of Adobe's seemingly official stance on the matter.
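To make the "how text is encoded in PDFs" point a bit more concrete: the piece that tools like the pdf.js browser (and, presumably, the checkers) rely on is the font's ToUnicode CMap, which maps glyph IDs back to Unicode strings. Here's a minimal hand-rolled sketch of reading bfchar entries; the glyph IDs and mappings are invented for illustration, and a real CMap also contains bfrange entries, header material, and possibly surrogate pairs that this sketch doesn't handle:

```python
import re

# A fragment shaped like a PDF ToUnicode CMap stream (entries invented).
cmap = """
beginbfchar
<0003> <0915>
<0045> <093F>
<0101> <0915094D0937>
endbfchar
"""

def parse_bfchar(text):
    """Map glyph IDs to Unicode strings from bfchar entries (BMP only)."""
    mapping = {}
    for gid, codes in re.findall(r"<([0-9A-Fa-f]+)>\s+<([0-9A-Fa-f]+)>", text):
        # The destination is a run of UTF-16BE code units, 4 hex digits each.
        chars = "".join(
            chr(int(codes[i:i + 4], 16)) for i in range(0, len(codes), 4)
        )
        mapping[int(gid, 16)] = chars
    return mapping

m = parse_bfchar(cmap)
print(m[0x0003])  # a single glyph mapped to one code point
print(m[0x0101])  # a single glyph mapped to THREE code points (a conjunct)
```

The third entry is the interesting case for complex scripts: one precomposed glyph legitimately maps to several code points, and if a font or exporter leaves such a glyph without a usable mapping, a checker has nothing to decode and flags a character encoding error.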
My own hunch is that the a11y checker is choking on how the methods the type designers used to render these complex glyphs end up encoded in the PDF. Some of the older transitional-Unicode fonts I have, especially for Burmese and Tibetan, contain many thousands of precomposed glyphs, often referred to in English as "stacks" when Tibetan typography is under discussion. This is because the type designers couldn't reliably expect stacks comprised of multiple Unicode codepoints to render correctly in all environments. Neither Kokila nor Adobe Devanagari uses this method. Sure, there are some precomposed stacks in there, but broadly speaking, there are plenty of Other Methods used in both fonts that work great for print or web PDF distribution, but that maybe cause the a11y checker to choke. Here, take a look at this little GIF I made of one of the ways in which I think your Nepali might be failing:
Clearly, something in this font is picking one of those GIDs to represent "DEVANAGARI VOWEL SIGN I" that best matches the rest of the values in that stack. That's not being encoded in a way that is Accessible, I think. I've tried it with many different fonts that support Nepali, and that vowel sign gets an Acrobat a11y whammy every time. So: maybe one of the forums regulars with more accessibility background than myself can help us out here? Or are we perhaps going to start working up a bug report?
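One concrete detail behind that vowel-sign behavior (stated as background, not as a diagnosis of this particular font): in Unicode's logical order, DEVANAGARI VOWEL SIGN I is stored after its consonant even though it renders to the left of it, and it's a spacing combining mark. So whatever left-side glyph the font substitutes still has to map back to that single code point for the text to extract correctly. A quick check with Python's stdlib:

```python
import unicodedata

# "ki": stored consonant-first in logical order, though the vowel
# sign is drawn to the LEFT of the consonant.
ki = "\u0915\u093F"

for ch in ki:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0915 DEVANAGARI LETTER KA
# U+093F DEVANAGARI VOWEL SIGN I

# The vowel sign is a spacing combining mark (general category Mc),
# so the reordered glyph must still round-trip to U+093F.
print(unicodedata.category("\u093F"))  # Mc
```

If the font picks a contextual glyph variant for that vowel sign and the exporter fails to record a mapping from that glyph ID back to U+093F, a checker would flag exactly the kind of character encoding error described in this thread.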