Spurious bold or italic formatting from MS Word import

Advisor, Oct 12, 2023

I see a few posts about people losing formatting on import from Word. I'm having the opposite problem. Formatting from Word comes through on import well enough, but I get some extra formatting that's not actually in the Word file. This seems to be happening with what looks to me like cross references (but they're really just text). Text like "As discussed in § 27.02" will sometimes have the section sign and the number in bold or italic, as if this were some kind of automatic link formatting. But when I look in the Word file, it's not italicized or bolded, and it's not a cross reference or hyperlink. I've included a short test Word file and InDesign file. I'm importing using the settings below. I'm using ID 17.4.1 x64, but I get the same result if I try it with ID 18. Anyone want to test it? If I could figure out what's in the Word file that's triggering this, I might be able to explain to the client why they occasionally see italics and bold that aren't supposed to be there.

KennethCBenson_0-1697160573950.png

 

TOPICS
Import and export

1 correct answer: Community Expert, Oct 13, 2023 (full reply appears in thread order below)

Community Expert, Oct 12, 2023

I'm not exactly sure how this happened in Word, but I have a decent idea. I placed your Word doc in your InDesign template, and also experienced spurious bolding on the section number.

 

 

 

bikd.png

 

Not only is your BDBodytext2 style not present in the InDesign file, but your digits are bold and marked as Arabic. Let's look at the same text in Word:

 

COMPLEX.png

 

Your text in Word is in regular TNR if it's normal or Asian, but bold if it's a complex script. Here's where InDesign's Word import filter breaks down; InDesign doesn't let you mark anything with multiple languages. It's either Arabic, or it's not.

 

Now, if you personally open up this sample file in Word and look at the Fonts menu, do you see something like what I see?

 

font.png

 

I don't know what the current state of MS Word is right now, but typically, in order to get those Asian and Complex Script font dropdowns, I have to install some kind of East Asian input method (e.g. a Chinese keyboard) and a "complex script" keyboard (an Arabic keyboard) or those font dropdowns don't appear. However, the settings are in there, just hidden. So I am going to guess that someone in your workflow had a complex script keyboard installed, and maybe keyed in that number? It needn't be someone with an Arabic keyboard; the Word file format has all kinds of little bits that break, sometimes, and while you don't notice it when you are working in Word, InDesign's Word import filter can't figure out what to do with those bits, and so you wind up with spurious bold "Arabic" numerals. 
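For anyone who can't see the complex-script settings in Word's font dialog, the file can be checked from outside Word. The following is a minimal sketch, not a confirmed diagnosis: it assumes the spurious formatting is stored as the complex-script bold and italic toggles (`w:bCs`, `w:iCs`) in the run properties of `word/document.xml`, and it only looks at direct run formatting, not at anything inherited from a style. The function names and the sample path are made up for illustration, and what InDesign's import filter actually reads has not been verified here.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
OFF_VALUES = {"0", "false", "off"}


def _toggle_on(rpr, tag):
    """True if a toggle property such as 'b' or 'bCs' is switched on in this rPr."""
    el = rpr.find(f"w:{tag}", NS)
    if el is None:
        return False
    # A toggle element with no w:val attribute counts as "on".
    return el.get(f"{{{W}}}val", "true").lower() not in OFF_VALUES


def find_complex_only_runs(document_xml):
    """Return (flags, text) for runs whose bold/italic is set only for complex scripts."""
    root = ET.fromstring(document_xml)
    hits = []
    for run in root.iter(f"{{{W}}}r"):
        rpr = run.find("w:rPr", NS)
        if rpr is None:
            continue  # run has no direct formatting
        flags = []
        if _toggle_on(rpr, "bCs") and not _toggle_on(rpr, "b"):
            flags.append("bold")
        if _toggle_on(rpr, "iCs") and not _toggle_on(rpr, "i"):
            flags.append("italic")
        if flags:
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            hits.append((flags, text))
    return hits


def scan_docx(path):
    """Read word/document.xml out of a .docx package and scan it."""
    with zipfile.ZipFile(path) as package:
        return find_complex_only_runs(package.read("word/document.xml"))
```

Calling `scan_docx("chapter.docx")` (a hypothetical file name) returns a list of flagged runs, which could be handed back to a client as a list of places to check. The sketch only reports; it does not modify the Word file.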

 

You can fix this by doing cleanup in Word before placing the file. I can think of at least three ways to do it off the top of my head, but this one seems the most reliable. Note that, when I select all of the text, the Reveal Formatting pane shows that there's some complex script bold applied until I "clear local formatting":

 

com.gif

 

And the resultant Word file doesn't have any spurious bolding:

 

nobo.png

 

 

 

 

 

Advisor, Oct 13, 2023

Thanks, Joel

I don't have Complex scripts in my Word font menu, probably because I never set up a different keyboard on this computer, but seeing what you're seeing is very helpful. The BDBodytext2 style is not in my InDesign file because it doesn't belong there. Every chapter in this book is written by a different author. Most of them just use Normal with local formatting. Some of them actually use some of the built-in Word styles (like Block Quote). A few of them write their own styles. That's the case here. I don't know why the author made his own Body Text style. Authors do weird things all the time. I frequently get copy with all the section signs in a different font, like TNR for text but Calibri for section signs.

Clearing local formatting definitely gets rid of the spurious bold and italic, but it also gets rid of all the other local formatting. This is legal writing, so there are thousands of instances of case names and Latin legalese (like et seq. and Id.) that need to be italic, almost all of them formatted locally. Because of the subject matter, there are plenty of references to CO2, with the 2 subscripted. When they do use a style in Word, it's not going to be the style I use in InDesign anyway.

On my end, I use Find/Replace in InDesign to search out all the relevant formatting and lock it down with character and paragraph styles. After preserving their local formatting as styles, I Clear Formatting in InDesign (which gets rid of all the junk formatting, font changes, color changes, size changes, etc.). I can't really change the client's workflow, except maybe to point out to them that someone somewhere is getting these section signs in via an Arabic keyboard.

KennethCBenson_0-1697197326962.png

 

Community Expert, Oct 13, 2023

Oh, hi Ken! I guess I didn't look at the header at all before replying to your post; I just dug straight into your sample files.

 

"Authors do weird things all the time."

 

They do, don't they? <long-suffering sigh>

 

The symptoms you describe (like "TNR for text but Calibri for section signs") sound exactly like what I get when I'm doing a project for some state DOJ or other. I am not sure how they are getting their section signs, but it seems clear that they're not being keyed per se; perhaps they're from the Insert Symbol menu, and they're getting marked by one of the default Calibri-based Word character styles? But that is neither here nor there; you'd need a post-import cleanup method that won't remove all the other local formatting and style overrides that you need to keep, and it sounds like you already have that.

 

The tweak I'd suggest is that you add "searching for text marked as Arabic" to your text-cleanup strategy. Or perhaps a JavaScript like "find anything in this document that isn't marked as English: USA and flag it for review" might be best. I don't often have to deal with dozens of Word-file submissions anymore, but when I do, I usually use a script that I made years ago, modified from Jongware's PrepText. There are a few similar scripts floating around out there, but if you're dealing with many "creative" ways of typesetting a subscript, Kasyan's version might be worth looking at.
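The "flag anything that isn't English: USA" idea can also be tried outside InDesign. This is not the in-application JavaScript described above, and it is not any of the scripts mentioned; it is a rough Python sketch that reads an IDML export instead. It rests on several assumptions that should be checked against a real export: that story files live under `Stories/` in the IDML package, that local language overrides appear as an `AppliedLanguage` attribute on `CharacterStyleRange` elements, and that the expected value is spelled `$ID/English: USA`. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed IDML spelling of the "English: USA" language setting.
EXPECTED_LANGUAGE = "$ID/English: USA"


def flag_other_languages(story_xml, expected=EXPECTED_LANGUAGE):
    """Return (language, text) for character ranges that override the language."""
    root = ET.fromstring(story_xml)
    flagged = []
    for rng in root.iter("CharacterStyleRange"):
        language = rng.get("AppliedLanguage")
        # Assumption: a missing attribute means no local language override.
        if language is None or language == expected:
            continue
        text = "".join(c.text or "" for c in rng.iter("Content"))
        flagged.append((language, text))
    return flagged


def scan_idml(path, expected=EXPECTED_LANGUAGE):
    """Scan every story file in an IDML package (assumed to sit under Stories/)."""
    results = {}
    with zipfile.ZipFile(path) as package:
        for name in package.namelist():
            if name.startswith("Stories/") and name.endswith(".xml"):
                hits = flag_other_languages(package.read(name), expected)
                if hits:
                    results[name] = hits
    return results
```

The output is only a review list. Language applied through a paragraph or character style would not be caught by this sketch, since it looks at local overrides alone.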

 

"I can't really change the client's workflow, except maybe to point out to them that someone somewhere is getting these section signs in via an Arabic keyboard."

 

Well, our clients aren't going to change their workflows for us, right? Especially when the issue is that someone, somewhere once opened a Word doc that had complex scripts used somewhere in it. I once struggled against some persistent Word-cruft for years before I discovered that someone in the local county health department had decided that the Latin-script glyphs from the Windows default Thai UI font were much prettier than Arial or Calibri, and so she used it as her font for all documentation, and therefore 100% of the Word files from her were filled with spurious complex script information. 

 

 

 

 

Advisor, Oct 14, 2023

I think this will be my solution:

"The tweak I'd suggest is that you add 'searching for text marked as Arabic' to your text-cleanup strategy."

Thanks for pointing out that these are marked Arabic. Somehow I had managed to ignore that detail.
