I am primarily a developer with limited knowledge of InDesign but I have run across a problem that I have attempted to solve in multiple ways but have yet to figure out. We have documents in four languages. The English and Spanish documents accurately export to HTML and PDF. But when it comes to Chinese text, I have some issues:
1) If I export a document in Simplified Chinese to PDF, the text appears correctly until I copy and paste it; it then reverts to Traditional Chinese when pasted into a new document.
2) If I export the document to HTML, it only displays as Traditional Chinese.
This seems like a font issue. The Traditional Chinese documents use a font called MHeiHK and the Simplified Chinese documents use a font called MHeiHKS. Those fonts are referenced in the InDesign document and appear in the CSS of the respective documents: MHeiHK in Traditional Chinese and MHeiHKS in Simplified Chinese. I have used both the legacy HTML format and the newer HTML5 format. The same issue appears in both export types. In fact, when I open the exported document in an editor like Visual Studio Code, all the symbols appear to be in Traditional Chinese. Changing the CSS doesn't change the font (I wouldn't expect it to...)
So my question is: Is there a way to make sure the Simplified Chinese fonts used in InDesign export properly to both PDF and HTML? Our translation team mentioned something about the "base" font being Traditional Chinese but the Simplified fonts somehow "overlay" or "modify" the TC base font to display as Simplified. I don't know enough about fonts to know exactly what they meant by that.
We have hundreds of documents with this issue and we'd like to move more of them to HTML format but this has been a show-stopper for a few years.
Thanks in advance for any help or direction you can give me.
Well, when you export PDFs, you should be embedding the fonts you use into the PDF at export. So Acrobat (or a third-party PDF viewer) will display that text using the font that was used in InDesign and embedded at export. If you're copying text from the PDF and pasting it somewhere else, you're most likely getting raw Unicode text on your clipboard, and the place where you are pasting it is where your default Trad Chinese settings are going to be found.
Now, I don't export HTML from InDesign, ever, but I'd expect that whatever you export will essentially be text with tags, right? When you open that HTML you've exported from InDesign, it's simply displaying in whatever default HTML viewer you have installed on the machine in question. If you render well-formatted HTML in a browser, it should render correctly... assuming you've declared the language in the header, right? Once again, I don't have any idea what InDesign's current HTML export capabilities might be; I don't know if you just need to tweak the export settings somehow, or if you have to go in post-export and fix the language declarations. The only thing I ever do that might be vaguely relevant is export XML for ingestion into a client's document management system, and in some of those cases I do apply XSL post-export to ensure good language tagging.
I wouldn't trust font name declarations in CSS to correctly identify the language code. You'd want to specify something like
lang="zh-CN"
to force Simplified Chinese.
This post here barely scrapes the surface, but I feel confident that you can figure out how to handle your multilingual document management if you ask the right questions, or if we poke you into asking the right questions.
Do you have the InDesign version for Asian languages installed?
No, I don't. Is that a requirement to be able to export the content correctly? The TC export works fine. It's just a mystery as to why the Simplified Chinese does not.
Well, if you're using the HTML5 export, I promise you it's not working fine. I'd never used HTML5 export before this morning, but when I went and looked at the output, the language declaration in the header was identical to the language setting in InDesign.
That text is showing up as "Arabic" because that's my default language in new documents. Because it's set to Arabic, if I go and look at the header in the exported HTML, it's declared as "ar-sa":
To get the language ID correct, you have, I think, only two options:
1) Use a version of InDesign that has your intended language in the dropdown on the Character panel
2) Edit the HTML post-export to have the correct lang declaration
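For option 2, a minimal Python sketch of the post-export fix might look like the following. The function name `set_html_lang` is made up for illustration, and the sketch assumes the exported file has a single opening `<html>` tag. It only rewrites the language declaration; it does not touch the characters in the body text.

```python
import re

# Matches lang="..." or xml:lang="..." (single or double quotes).
LANG_ATTR = re.compile(r"""(\b(?:xml:)?lang\s*=\s*)(["'])[^"']*\2""", re.IGNORECASE)

def set_html_lang(html: str, lang: str = "zh-CN") -> str:
    """Return html with the lang attribute on the opening <html> tag set to lang."""
    def fix_tag(match):
        tag = match.group(0)
        # Overwrite any existing lang / xml:lang values on the tag.
        tag, count = LANG_ATTR.subn(lambda m: f'{m.group(1)}"{lang}"', tag)
        if count == 0:
            # No lang attribute at all: add one before the closing ">".
            tag = tag[:-1].rstrip() + f' lang="{lang}">'
        return tag

    # Only the first <html ...> tag is rewritten.
    return re.sub(r"<html\b[^>]*>", fix_tag, html, count=1, flags=re.IGNORECASE)


if __name__ == "__main__":
    exported = '<html lang="zh-TW"><head></head><body></body></html>'
    print(set_html_lang(exported))
```

Reading the exported file, passing its contents through this function, and writing it back would be one way to batch-fix many documents.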
Thanks Joel,
Still sorting through this. Here is a snippet from the InDesign document which displays the simplified chinese correctly:
And, here is a snippet from the HTML exported:
Nothing modified. Straight export. You can see that the exported document is in the Traditional character set, rather than Simplified. So, following your lead, I went to the HTML source. I see this:
And, yes, the lang attribute is set to zh-TW rather than zh-CN. And, based on the characters I see in the source, I cannot just change the lang attribute and get simplified.
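One way to confirm what is actually stored in the exported source, independent of any font or lang setting, is to look at the code points themselves. The sketch below is only illustrative: `SAMPLE_TRAD_TO_SIMP` is a tiny hand-picked table of five character pairs, not a complete mapping, and the function name is hypothetical. A real check would need a full conversion table from whatever tool the translators use.

```python
# Tiny illustrative sample of Traditional -> Simplified pairs.
# This is NOT a complete table; it only demonstrates the idea.
SAMPLE_TRAD_TO_SIMP = {
    "漢": "汉",
    "語": "语",
    "國": "国",
    "體": "体",
    "簡": "简",
}

def find_traditional_forms(text: str) -> list:
    """Return (character, code point, Simplified counterpart) for each
    sample Traditional character found in text."""
    hits = []
    for ch in text:
        if ch in SAMPLE_TRAD_TO_SIMP:
            hits.append((ch, f"U+{ord(ch):04X}", SAMPLE_TRAD_TO_SIMP[ch]))
    return hits


if __name__ == "__main__":
    # Each Traditional form is a different code point from its Simplified
    # counterpart, so no font or lang attribute can turn one into the other.
    for ch, code_point, simplified in find_traditional_forms("漢語"):
        print(ch, code_point, "->", simplified, f"U+{ord(simplified):04X}")
```

If text that looks Simplified in InDesign produces hits here, then the stored characters are Traditional and only the font is drawing them differently.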
It's hard to see in the first snippet, but the font is set to MHeiHKS, which is the Simplified Chinese font. The CSS references, however, carry only the Traditional font (MHeiHK), so I guess I am surprised that WYSIWYG doesn't follow through the export.
I will try changing the actual language, if I can, and give the export another try. Thanks for the pointers.
Firstly: if you look at my previous post, you'll see that the language in my test export is marked as Arabic in the Character panel. That's the language that I'd expect to see in the HTML export. Can you tell us what language you have applied to this Chinese segment in your export? As Willi points out, you need an East Asian install of InDesign to have any Chinese in that dropdown (unless you've modified your install of InDesign). More importantly for quality export, you'd need to have your text marked as Simplified Chinese in the Paragraph Style or Character Style. What are your language settings there?
Second: I didn't really know much at all about HTML export in InDesign until I started answering your question, but I've learned a fair bit quite quickly. I currently have InDesign 2024 and 2025 both installed. 2024 only offers boring HTML export, while 2025 has both "Legacy" HTML export and HTML5 export. Which are you using? They behave quite differently. If I do HTML export from 2024 with individual paragraphs marked as English, then that English setting is honored in the CSS, but the lang declaration in the header identifies my document as Arabic (ar-SA). HTML5 export from 2025 marked my whole document as en-US. So if your HTML export is Legacy and is coming through marked as zh-TW, then perhaps you're already using an East Asian install of InDesign. What can you tell us about the version of InDesign you're using?
Third: I can see from your exported HTML that you're not using paragraph styles; this is clear from the bit where your p class="Basic-Styles_Body-Text_ParaOverrides-1". Getting language ident to stick might require defining your desired language in a Paragraph Style or Character Style. Sticking to those styles without using local overrides will prevent all kinds of issues when you're trying to export HTML, as the contents of your styles in InDesign ought to carry over into the exported CSS. Do you know how to use paragraph and character styles in InDesign?
Thanks Joel. I appreciate all the pointers and I'll need to go back to our content designers since I really know next to nothing about the use of the tools within InDesign. As I said, I am just a programmer charged with figuring out why a simplified chinese language document outputs as simplified chinese when exported as a PDF document but ends up as traditional chinese when exported as HTML. You have given me good information to track down. I would guess that the translators are using a version of InDesign in their native language, but I don't know.
What I do know is if I use a service, like ChatGPT, to convert the Traditional Chinese text to Simplified Chinese and copy and paste that converted text into an InDesign document and then export it to HTML, the Simplified Chinese text is retained correctly in the HTML. So, there is something subtle in the way the documents are being originally translated that results in the issue. The translators use Traditional Chinese for the initial translation and then that text is "converted" (I don't know how) into Simplified Chinese text. Also, a test that had the translator use Simplified Chinese text to start with, without first using Traditional Chinese, exported correctly as HTML. So I just need to connect those dots. I'll use your pointers and questions to guide me. I have some research to do in order to answer your questions.
Thanks again for your replies.
Likewise, I'd guess that your translators are using a Traditional Chinese version of InDesign. I suspect that this is the reason why your HTML export has a default language attribute of zh-TW. I'm pretty sure that I can answer all of your questions - I'm a localization engineer, this is exactly what I do for a living - but there are a large number of moving parts that you'd need to come to grips with, in order to get use from these answers. For example, to really know what's going on when your translators talk about "conversion", you should probably skim most of, and read and understand some parts of, the Han unification article at Wikipedia. That's kind of a tall order, I know, but that's the kind of field this is.
To answer your immediate questions: when you have a Simplified Chinese string in an InDesign file and export a PDF, the PDF represents a complete document in a single file. The text, images, fonts, objects, etc. are all bundled up into a single file. If properly produced, the Simplified Chinese font you used in the InDesign file is embedded into the resulting PDF. When you double-click that PDF and open it up in your PDF viewer of choice, it should render the Simplified Chinese correctly. That's why your SCH renders correctly in your PDF.
When you export HTML from InDesign, the export tool is doing something very different from what I described above. But you can look at its output, right? It's making either a file or a folder, depending on whether you are exporting HTML from InDesign 2024, or HTML (Legacy) or HTML5 from InDesign 2025. But I'm ready to guess that you're not exporting HTML5 from 2025, because of what you've told me. Correct me if I'm wrong, please. InDesign is generating a) an HTML file based off of the layout of your InDesign file, plus b) a folder with some CSS in it, which was itself generated from the INDD by the export tool.
When you export HTML from InDesign 2024, the "lang" attribute in the header seems to be determined by the locale of the originating install of InDesign that was used to save the INDD in the first place. But the fonts applied to that text are determined by the CSS that was generated by the export tool analyzing the Paragraph Styles, Character Styles, and local formatting applied to the text in InDesign. Here's a case where, if you don't know InDesign particularly well, there's a lot to learn. But if you and/or your translators set up your InDesign files correctly, then the correct languages and fonts will be applied to the Chinese text. It should be enough to say to them: Make sure that your Chinese text has paragraph and character styles applied, including proper language assignments, without overrides. This stuff is well documented; if you'd like, I can dig up a thread or three around here that can offer you a quick summary of how styles work in InDesign.
What I do know is if I use a service, like ChatGPT, to convert the traditional chinese text to simplified chinese and copy and paste that converted text into an InDesign document and then export it to HTML, the simplified chinese text is retained correctly in the HTML.
Please don't do this. Unless you are fully literate in both languages, in which case I retract my plea. I've been handling encoding conversions and format conversions between forty different languages constantly, for decades, and the output of ChatGPT is not trustworthy. There are plenty of other conversion tools out there that your Chinese staff can tell you about. I mean, heck, they might use ChatGPT themselves, but they are literate in the source and target languages, and can spot errors if they choose to review the output. There are all kinds of regional and dialect variances in Chinese, and you can't just run stuff through a convertor without being literate in the language. But even clicking the "Convert Trad to Simplified" button in MS Word is better than trusting ChatGPT with your translations.
But that little anti-genAI slam aside, let me guess that you are pasting your Simplified-encoded Chinese text into a new InDesign document that you started on your own machine. Right? If you pasted it into an INDD started by your translation supplier, it'd have the lang declaration of zh-TW in the header after HTML export. That, I'm guessing, is why your own HTML export experiments work correctly for you.
But the real way to ensure that your text is correctly encoded, and then correctly exported, and then correctly displayed, is to use InDesign's styles well, so that the convertor that generates the CSS has good inputs. This means that all of your Chinese text has a Paragraph Style applied, with the correct language set up in the Advanced section of the Paragraph Style. You should also look at the Export Tagging section, at the very bottom of the Paragraph Style options. I'd be happy to share my test files with you, if it'd be helpful. If I make a Paragraph Style called "Simplified Chinese" and I specify the Noto Sans SC font in that Paragraph Style, then the CSS generated at HTML export includes "Noto Sans SC" in the CSS. So, assuming that I have said font installed, it will render in my browser according to what is specified in the CSS. So if you want your Simplified Chinese text to render correctly in your browser, you need good CSS, which comes only from well-constructed, well-applied styles in InDesign.
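To check what the export actually wrote, you could list the fonts named in the generated CSS. This is a rough Python sketch, not anything InDesign provides: the function name `font_families` and the class name in the sample CSS are made up for illustration, and the regular expression assumes plain `font-family: ...;` declarations.

```python
import re

def font_families(css: str) -> set:
    """Collect every font name that appears in a font-family declaration."""
    names = set()
    # Capture everything between "font-family:" and the next ";" or "}".
    for value in re.findall(r"font-family\s*:\s*([^;}]+)", css, flags=re.IGNORECASE):
        for name in value.split(","):
            cleaned = name.strip().strip("\"'")
            if cleaned:
                names.add(cleaned)
    return names


if __name__ == "__main__":
    # Hypothetical CSS of the kind an export might produce.
    sample_css = 'p.Simplified-Chinese { font-family: "Noto Sans SC", sans-serif; font-size: 12px; }'
    print(sorted(font_families(sample_css)))
```

Running this over the CSS from a Traditional export and a Simplified export makes it easy to see whether the font set in the Paragraph Style really made it into the output.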
(If you don't already know what happens in the browser when a given non-Latin-script font is not present to render text... well, we can get into that, but the reading list will be longer than just "go skim the article on Han unification." But that's part of the reason that your Simplified text is mysteriously rendering in Trad when you double-click on the HTML and open it in your browser; there's no info in your HTML that will help your browser render your SCH correctly, so it is rendering according to default non-Latin-script text handling, which if your browser is Chrome I can show you, or if your browser is Safari, I can, um, er, I can ask someone else to show me the current state of unsupported script failover support in Safari and then parrot their response to you, I suppose?)
Again, many thanks for the information you shared. I didn't answer some of your questions before and I should have. I am using the legacy HTML export, primarily because the format is simple and easy to work with. The HTML export is just the first step. Eventually the HTML and CSS are combined and there is some scripting to adjust some formatting issues before the HTML is "done". That HTML file is then presented in a mobile application. We currently have 30 lessons across 4 languages. The goal is to be able to deal with any future character sets and fonts, so I want to get it working correctly now and know what direction to go in as other languages are added.
I am an AI skeptic as well. I only used it just to do a POC to generate simplified text to see if it exported correctly. The actual accuracy of the content wasn't my immediate concern. I just need to come up with suggestions that the translators can implement to solve the issue.
I attached a VERY simple document that has Simplified text. Remember, I know little to nothing about using InDesign. I created a new document, copied and pasted Simplified text from a Word document, saved it and exported it. The lang attribute as exported is "en-US" and the font applied in the CSS is MHeiHKS-Medium. But neither of those attributes affects the rendered text. You can remove the CSS and the lang attribute and it won't affect the rendered text. In fact, you can strip everything and just copy the text into the body of an essentially plain vanilla HTML document and it still renders as Simplified. So, my guess is that the text is set by InDesign and that text is exported as-is. CSS can apply some styling but cannot change the character set. It will still be Simplified regardless of the font references in the CSS or the lang attribute. MHeiHKS is the applied font as set in the document.
I'll be meeting with the content team this morning and I'll try to communicate these observations to them. I still have more research to do, but I have appreciated your help.