I don't think the absence of language tagging is causing your issues. I grabbed some Khmer text I have lying around for a current project and made PDFs both by exporting from InDesign and by saving from Word (saving as PDF, not printing to the Adobe PDF printer). I did the same with Nepali. The PDFs saved from Word passed the Character Encoding section of the Accessibility Check; the PDFs exported from InDesign failed, with the exact same text and the exact same font. The Word PDFs' character encoding passed whether the Khmer text was marked as Khmer or Nepali or English. The InDesign PDFs always failed.
This doesn't bug me from a usability standpoint. But you're not trying to get this to work for a blind Nepali screenreader user, right? You want it to pass Acrobat's accessibility checker - that's my guess, at any rate. Now, when I mentioned Bevi's posts, I didn't link to a single post. I didn't link anything at all, to be honest. I just meant "I go to Bevi's posting history and read her posts, for my own edification." And one thing that's come up many times in her posting is that there's no hard link between "passing Acrobat's a11y check" and "insulating my employer/client from ADA lawsuits." Passing that check doesn't make a PDF "ADA compliant," right? Here are two points of view. The first is taken from an Adobe page discussing PDF/UA:
How do I know if my PDF is ADA compliant?
With Adobe’s Acrobat Pro it’s easy to check if your PDF is ADA compliant. Select Tools, before heading to Accessibility. From there you should be able to run the option Full Check. Once the check has completed, the report should tell you everything you need to know.
That's Adobe. Here's Bevi on the same topic:
RE: PDF checkers... All software checkers (and their online services, as well) use AI to run through a file and determine whether it passes compliance or not. But real accessibility compliance varies from file to file, depending upon the actual content.
Therefore, you must use more than these programs and services to determine compliance: you must have trained human checkers to determine many items:
The logical reading orders shown in the Tag Tree and Architectural Reading Order (what's called "the order panel")
Alt Text and Actual Text
Whether Summaries and captions are required
Footnotes
Whether or not a hyperlink needs Alt Text
I think that maybe not every PDF accessibility checker uses "AI," but it's clear to me that Bevi's commentary implies that you don't guarantee conformance with Section 508 just by running the Accessibility Checker in Acrobat and saying "Yup, that's good! No whammies!" So: is your job "make the PDF pass the Acrobat accessibility check?" Is your job "prevent ADA lawsuits?" Is your job "make this document available to blind Nepali folks using screenreaders?"
My experimentation with exporting PDFs from various tools with various fonts leads me to believe that there isn't any way for us as end users to make these character-encoding errors go away in Acrobat's accessibility checker. So, if my supposition about your job here is correct ("make the whammies go away in the Acrobat a11y check for complex-script content in PDFs produced from InDesign"), then your next step is going to be twofold:
1) write up a really good bug report at indesign.uservoice.com
2) get lots of people to upvote it, because it's vote quantity that surfaces the issue for the developers
I suspect that the way InDesign produces text streams is the problem here, but I really have no clear idea. My last deep dive into how text is encoded in PDFs for complex-script languages left me a little punch-drunk, to be honest. I see that your two Nepali fonts are encoded differently in the PDF, but when I dig around in the file using the fabulously useful PDF browsing tool at brendandahl.github.io/pdf.js.utils/browser/, I see properly encoded text. However, if there's some way to make these errors go away by taking any action in InDesign or Acrobat, I don't know it, and I would very much like to chat with someone who does! There are some discussions around here indicating that switching to a different font can sometimes make these errors go away. That's also part of Adobe's seemingly official stance on the matter.
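If you want to do this kind of digging without the browser tool, here's a rough sketch of the triage I do by hand: Acrobat's character-encoding check is, as far as I can tell, largely about whether each embedded font carries a /ToUnicode CMap that maps glyphs back to Unicode. This is a crude byte-level scan, not a real PDF parser (it will miss fonts inside compressed object streams, and the regex is my own guess at a "good enough" pattern), but it's a quick way to see which fonts have no /ToUnicode entry at all:

```python
import re

def fonts_missing_tounicode(pdf_bytes: bytes) -> list:
    """Return /BaseFont names of font dictionaries that have no
    /ToUnicode key. Crude byte-level scan for quick triage only --
    it won't see fonts stored inside compressed object streams."""
    missing = []
    # Match each << ... >> dictionary that declares /Type /Font,
    # without crossing into a following dictionary.
    font_dict = re.compile(
        rb"<<(?:(?!>>).)*?/Type\s*/Font(?:(?!>>).)*?>>", re.DOTALL)
    for m in font_dict.finditer(pdf_bytes):
        d = m.group(0)
        if b"/ToUnicode" not in d:
            name = re.search(rb"/BaseFont\s*/([#\w+-]+)", d)
            missing.append(name.group(1) if name else b"(unnamed)")
    return missing

# Tiny synthetic example -- object numbers and font names are made up.
sample = (
    b"1 0 obj << /Type /Font /Subtype /Type0 /BaseFont /Kokila "
    b"/ToUnicode 5 0 R >> endobj "
    b"2 0 obj << /Type /Font /Subtype /TrueType "
    b"/BaseFont /AdobeDevanagari >> endobj")
print(fonts_missing_tounicode(sample))
```

In my test files the fonts all *had* ToUnicode CMaps, which is why I think the problem is in what the CMaps say, not whether they exist.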
My own hunch is that the a11y checker is choking on how the methods the type designers used to render these complex glyphs end up encoded in the PDF. Some of the older transitional-Unicode fonts I have, especially for Burmese and Tibetan, contain many thousands of precomposed glyphs, often referred to in English as "stacks" when Tibetan typography is under discussion. That's because the type designers couldn't reliably expect stacks built from multiple Unicode codepoints to render correctly in all environments. Neither Kokila nor Adobe Devanagari uses this method. Sure, there are some precomposed stacks in there, but broadly speaking, both fonts rely on other methods that work great for print or web PDF distribution, but that may cause the a11y checker to choke. Here, take a look at this little GIF I made of one of the ways in which I think your Nepali might be failing:
Clearly, something in this font is picking whichever of those GIDs for "DEVANAGARI VOWEL SIGN I" best matches the rest of the values in that stack. That choice isn't being encoded in a way that's accessible, I think. I've tried it with many different fonts that support Nepali, and that vowel sign gets an Acrobat a11y whammy every time. So: maybe one of the forum regulars with more accessibility background than I have can help us out here? Or are we perhaps going to start working up a bug report?
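For what it's worth, here's a toy model of what I think is going wrong with that vowel sign. All the glyph IDs below are hypothetical (real values depend entirely on the font); the point is that every contextual variant the shaper might pick needs its own entry mapping back to U+093F, and a variant the exporter leaves unmapped simply extracts as nothing - which is roughly the failure the checker reports:

```python
# Hypothetical ToUnicode-style mapping: several contextual-variant
# glyph IDs for the i-matra all map back to the same codepoint.
TO_UNICODE = {
    0x015A: "\u093F",  # default DEVANAGARI VOWEL SIGN I
    0x021C: "\u093F",  # narrow contextual variant (made-up GID)
    0x021D: "\u093F",  # wide contextual variant (made-up GID)
}

def extract_text(glyph_ids):
    """Mimic a text extractor: look each GID up in the mapping.
    A GID with no entry yields no text at all."""
    return "".join(TO_UNICODE.get(g, "") for g in glyph_ids)

print(extract_text([0x021C]))  # mapped variant extracts fine
print(extract_text([0x9999]))  # unmapped variant: text is lost
```

If the exporter only maps the default glyph and the shaper picks a variant, you get exactly this silent loss, and I'd expect the checker to flag it.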