Participating Frequently
December 10, 2023
Question

How To Add Language Tags For Amazon KDP eBook?


Hi, I'm writing an ebook in Marathi, using the Nirmala UI font.

While uploading my ebook to Amazon, I learned that I need to set the language tag of my book's content to Marathi (that is, select Marathi as the language in the Adobe InDesign properties window). But after doing that, a few letters in my text get disconnected, for example "ल्ल" becomes "ल् ल". I've noticed that when I set the language tag to Hindi, the letters do not disconnect. However, the tag needs to be Marathi.

How do I overcome this issue? 


1 reply

James Gifford—NitroPress
Legend
December 10, 2023

I assume you're exporting to EPUB — fixed-layout or reflowable? The latter is greatly preferable.

 

Language gets a little complicated in ID/EPUB/KDP, and I don't know what your overall familiarity level is, so this will be long.

 

The only places InDesign addresses language are the overall version/UI selection, which may set a few defaults in actual document settings (not sure), and the language tag assigned to each Paragraph and Character Style.

 

The first thing to do, if you haven't already, is review every Paragraph and Character Style to make sure the correct language is selected. (This is assuming you have a meticulously-organized ID document, with a style assigned to every element. Spot formatting, always bad practice, compounds problems in EPUB export.) You may find one or more styles have a wrong language setting, and that will be your problem.

 

You can verify that tags are being used by opening the exported (reflowable) EPUB and looking at the main XHTML content file. These settings cause a language tag to be inserted into each content file of an EPUB export. Adobe seems to have changed this up a little, so there are two possibilities depending on which version of ID you're using:

  • Standard practice, and what was done a few versions back and is now done again, is to insert one language tag for the entire document, in the heading information:

  • If there is only one language setting in the doc, no further ones will be used. If there are individual settings, such as a French paragraph in an otherwise English doc, there will be one general setting and then a lang tag in each different element (paragraph or span).
  • For a time in the last year, EPUB export was including a lang statement in every text element, which some validators and services did not like. If you're running a slightly older version, you might see that instead.

 

To look at this content, open the EPUB export using a ZIP archive tool. Under the OEBPS folder, you'll find one or more XHTML files, each relatively large in proportion to its content. Extract one and open it in any browser, then switch to the source view. You should see a lang statement for "mr-IN" in the heading lines. Otherwise, you'll find a lang statement in every text paragraph.
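If you'd rather not dig through the archive by hand, the same check can be scripted. This is only a minimal Python sketch, not anything InDesign or KDP provides: the function name report_lang_tags is made up for illustration, and it assumes the content files end in .xhtml or .html, are UTF-8, and use quoted lang / xml:lang attributes. For each content file it reports the language on the root html element and how many other elements carry their own lang setting.

```python
import re
import zipfile

# An opening tag that carries a lang or xml:lang attribute.
# Group 1 is the tag name, group 2 is the language value.
TAG_WITH_LANG = re.compile(
    r'<([A-Za-z][\w:.-]*)\s[^>]*?\blang\s*=\s*["\']([^"\']*)["\']'
)


def report_lang_tags(epub_path):
    """Map each content file to its root language and per-element lang count."""
    report = {}
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            # Assumption: content documents use these extensions.
            if not name.lower().endswith((".xhtml", ".html")):
                continue
            markup = epub.read(name).decode("utf-8", errors="replace")
            root_lang = None
            element_count = 0
            for tag, lang in TAG_WITH_LANG.findall(markup):
                if tag.lower() == "html":
                    root_lang = lang      # document-level setting
                else:
                    element_count += 1    # paragraph/span-level setting
            report[name] = {"root": root_lang, "elements": element_count}
    return report
```

For a Marathi book you would hope to see "mr-IN" as the root value in every file. A large element count suggests the export wrote a lang statement into every text element instead.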

 

If everything seems correct from your end — you are selecting the right language and it's specified in every style and you've done a quick check to make sure it's being exported — then the issue is with KDP. I haven't done any non-English projects with them, but if you have a language setting and it's correctly set to Marathi, and you still get these word/alphabet issues... it's on them. You'll have to work it through with their (really lousy, all-but-nonresponsive) tech support.

 

I don't find myself surprised that non-English, or at least non-Western languages like the Indic ones have bugs in the conversion details. But good luck, and with better luck it's just a simple mistake on your part, which is at least easy to correct. 🙂

 

Check back if none of this fixes your problem or there are other details, or if KDP's response doesn't seem correct.

om.ranade (Author)
Participating Frequently
December 11, 2023

I'm sorry if my query was not clear, but here's a little clarification.

I'm facing the problem when I set the language to Marathi (in the Character/Paragraph Style): a few letters do not render properly. The language tags are correct as "mr-IN", and KDP is not giving me an issue. It's just that when I select Marathi as the language, a few letters do not render as they do when the language is English-US.
Side note: I do not face this problem when I select Hindi as the language. I tried to play with the OpenType settings, but I don't know exactly how they work. (PS: Can this issue even be solved?)

 

When language Is Set To English-US (This is the correct form)

 

When Language Is Set To Marathi (This is where the letters are separated)

James Gifford—NitroPress
Legend
December 11, 2023

I'd also wonder how they will be handled in EPUB export. Joiners/nonjoiners, in my experience, are 'trustworthy' only for print output. Any specific experience with exporting them to HTML or EPUB?

 

Been a while since I posted the phrase "Wow, just wow." I understand that lots of stuff doesn't make it into EPUB export, but pulling Unicode control characters out of text? That... simply didn't occur to me. I mean, I'm going to test it tonight, for sure, but that would make it completely impossible to export EPUB in, er, at least Malayalam, Sinhala, and Tamil, just off the top of my head. Actually, when I think about it, non-joiners are necessary in Persian, so if non-joiners were being stripped out of EPUB export from InDesign, then one of the three Farsi proofers who went over my work with a fine-toothed comb would certainly have raised a flag at one point or another. Still, it's worth testing out.
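One way to run that test without eyeballing the output is to count the joiners that survive in the exported file. This is a minimal Python sketch under a few assumptions: the function name count_joiners is hypothetical, the content files are assumed to end in .xhtml or .html and to be UTF-8, and character references such as &zwnj; are decoded before counting so that they are not missed.

```python
import html
import zipfile

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def count_joiners(epub_path):
    """Count ZWNJ and ZWJ characters across the EPUB's content files."""
    totals = {"ZWNJ": 0, "ZWJ": 0}
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            # Assumption: content documents use these extensions.
            if not name.lower().endswith((".xhtml", ".html")):
                continue
            raw = epub.read(name).decode("utf-8", errors="replace")
            # Decode references like &zwnj; or &#x200D; into real characters.
            text = html.unescape(raw)
            totals["ZWNJ"] += text.count(ZWNJ)
            totals["ZWJ"] += text.count(ZWJ)
    return totals
```

If the source document is known to contain joiners or non-joiners and the export reports zero of both, that would point to them being stripped on export.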

 

I would also never suggest that a long complex-script document rely on a visual proof to ensure proper glyph composition; that's the kind of thing that GREP can solve quite easily. There are only so many characters in Devanagari that can possibly follow a halant, so I imagine that a complete test would take maybe four to six minutes to put together. 
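To give a rough idea of what such a test could look like, here is a sketch in Python's re module rather than InDesign's GREP dialect. Everything in it is an assumption for illustration: the names conjunct_inventory and broken_halants are made up, and the consonant class (U+0915 to U+0939 plus U+0958 to U+095F) is my simplification of the Devanagari block, not a complete list. It collects every distinct consonant + halant + consonant sequence in a text so each one can be proofed once, and it flags halants followed by whitespace or a non-joiner.

```python
import re
from collections import Counter

# Assumption: a simplified set of Devanagari consonants.
CONSONANT = "[\u0915-\u0939\u0958-\u095f]"
NUKTA = "\u093c"
HALANT = "\u094d"  # also called virama

# Lookahead so overlapping clusters (k+t and t+y in "kty") are all captured.
CONJUNCT = re.compile("(?=(" + CONSONANT + NUKTA + "?" + HALANT + CONSONANT + "))")

# A halant followed by whitespace or a zero-width non-joiner.
BROKEN = re.compile(HALANT + r"[\s\u200c]")


def conjunct_inventory(text):
    """Count each consonant + halant + consonant sequence found in text."""
    return Counter(match.group(1) for match in CONJUNCT.finditer(text))


def broken_halants(text):
    """Return the index of every halant followed by a space or ZWNJ."""
    return [match.start() for match in BROKEN.finditer(text)]
```

Running conjunct_inventory over the whole manuscript gives a short list of clusters to check visually under the Marathi setting, which is much quicker than proofing every page.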


I suspect the various export processes manage these alphabetic issues, over and above how browsers and readers do (at least, any such tools that aren't completely Western- if not English-centric). I don't really know one way or the other, but I seem to recall at least some publication flows having problems with such embedded characters. (After all, ligatures are almost universally a phantom object created by the display device, not actual glyphs in the text strings or dependent on joiners.)

 

And knowing next to zip about the Indic languages at a technical level, maybe you're right that there's a limited number of combinations that could be GREPped into compliance.

 

But I can't help but think the simple, obvious, bulletproof solution here is to use a fully compatible font. 🙂