InDesign Grep Style for Composite Fonts

Report · Jun 13, 2023

Hi,
Is it possible to use a GREP style to emulate composite fonts, where the base font is Chinese or Japanese and Latin letters and numbers are set in DinPro? Or:

How do I set up a GREP style so that when I type Latin letters (A-Z) or numbers, the text automatically switches to the DinPro font?

Thanks

Report · Jun 13, 2023

Hi @iampurple, I don't have any experience with what you are asking, but in case it helps, here is a little test I just did that might be worth playing with. I set up a paragraph of Chinese text with a grep style that targets specific latin characters and sets them in DIN, colored cyan.

The grep I used is this:

([A-Za-z0-9,;\.\?\!]+\s?)+

which isn't great, because it only targets the explicitly included characters, so you will have to add any other characters you want to target, such as hyphen, dollar sign, em-dash, and accented characters, e.g. öà. (I tried targeting a unicode code block, e.g. \p{Han}, but couldn't get it working; maybe a grep expert will chime in on that.)
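One way to see what this first expression catches and what it misses is to try it outside InDesign. The sketch below uses Python's `re` module, which happens to accept this particular pattern unchanged. InDesign's GREP engine is a different implementation, so treat the result as an approximation. The sample strings and the `latin_runs` helper are made up for illustration.

```python
import re

# The grep from the post above; this syntax is also valid in Python's re module.
LATIN_GREP = re.compile(r"([A-Za-z0-9,;\.\?\!]+\s?)+")

def latin_runs(text):
    """Return each run of text that the pattern would pick up."""
    return [m.group(0) for m in LATIN_GREP.finditer(text)]

# Hypothetical Chinese sentence with some latin text and digits in it.
print(latin_runs("这是 InDesign 2023 的测试"))

# An accented letter is not in the explicit list, so the run stops before it.
print(latin_runs("café"))
```

The second call returns only the unaccented part of the word, which is the limitation described above: every extra character has to be added to the list by hand.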

See attached demo document.

- Mark

Edit 2023-06-14: added Japanese and Korean examples to demo.indd.

Edit 2023-06-14: added a couple of extra unicode blocks to demo.indd. Thanks to @Joel Cherney for the idea.

Edit 2023-06-15: now I'm using this grep in my demo.indd:

[\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}\x{0A}\x{2000}-\x{206F}]+

Report · Jun 13, 2023

Thanks a lot Mark!! Yes, this is it, but what if the document has multiple languages (Japanese, Korean, Thai, etc.)? Say 5 pages are Chinese and another 5 pages are Japanese?

Do we set up a paragraph style per language that includes the grep style? Or can we do it with a script?

Many thanks

Report · Jun 13, 2023

Well, because this grep targets only specific latin characters

[A-Za-z0-9,;\.\?\!]

then it should work the same in Japanese as with Chinese. I would set up a "Master Grep" paragraph style that had the grep style in it, and then create another paragraph style for each language (Chinese, Japanese, etc) that is based on that style. This is wise because you will almost certainly want to tweak the grep to include other latin characters and you don't want to have to edit every language's paragraph style!
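The "based on" idea can be sketched as a toy model. This is not InDesign's scripting API, just an illustration of why a shared parent style saves work. All style names, font names, and dictionary keys below are hypothetical.

```python
from collections import ChainMap

# Hypothetical stand-in for the "Master Grep" paragraph style.
master_grep = {
    "grep": r"[A-Za-z0-9,;\.\?\!]",
    "grep_character_style": "Latin DIN",  # hypothetical character style name
}

# Each language style is "based on" the master and only adds its own settings.
chinese = ChainMap({"font": "A Chinese font"}, master_grep)
japanese = ChainMap({"font": "A Japanese font"}, master_grep)

# Tweak the grep once, in the master only (here: also allow a hyphen).
master_grep["grep"] = r"[A-Za-z0-9,;\.\?\!\-]"

# Both language styles see the change without being edited themselves.
print(chinese["grep"] == japanese["grep"] == master_grep["grep"])
```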

- Mark

(I have added a few languages to the demo.indd file attached to my answer above.)

Report · Jun 13, 2023

How do you set up a Master Grep paragraph style?

Many Thanks

Report · Jun 13, 2023

Have a look at my updated demo.indd attached to my answer above. It is just a paragraph style that includes the grep style. And each "language" style is based on that master style. Nothing special. It just means that if you need to tweak the grep, you just change it in the master grep style.

- Mark

Report · Jun 13, 2023

I see, thanks a lot for helping Mark!!

Report · Jun 13, 2023

Mark has pretty much nailed this already, but I will point out that I usually achieve the same effects by using regular expression syntax to specify Unicode ranges. For example, "every glyph past basic ASCII" would be

[\x{0100}-\x{FFFF}]+

You can make a GREP style that applies to e.g. only Thai Unicode ranges. It can get complicated, as in the case of parentheses in CJK text that are not fullwidth, but it works quite well to capture All the Glyphs in a given writing system.
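As a rough illustration of targeting a single writing system by range, here is a sketch for the Thai block (U+0E00 to U+0E7F). It is written for Python's `re` module, which spells code points as `\u0E00`; following the `\x{...}` syntax used in this thread, the InDesign form would presumably be `[\x{0E00}-\x{0E7F}]+`. The sample string and the `thai_runs` helper are made up.

```python
import re

# Thai block, U+0E00 to U+0E7F, in Python's escape syntax.
THAI_RUN = re.compile(r"[\u0E00-\u0E7F]+")

def thai_runs(text):
    """Return every contiguous run of characters from the Thai block."""
    return THAI_RUN.findall(text)

# Hypothetical mixed string: English, a Thai word written with escapes, English.
sample = "Hello \u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35 world"
print(thai_runs(sample))
```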

Report · Jun 13, 2023

That's the sort of thing I was looking for Joel! Thanks. Also, I notice that InDesign grep can understand \p{Punctuation}, which is great, but not other unicode blocks, e.g. \p{Han}.

@iampurple you can implement the grep with Joel's method by changing the grep to something like:

([\x{0020}-\x{024F}]+\s?)+

(By the way, see unicode blocks here.)

You may still have to go more sophisticated with the grep though, if you don't want to match some things, for example Korean seems to use some latin punctuation characters:

Report · Jun 13, 2023

Thanks a lot @Joel Cherney and @m1b. I was trying to write a GREP style for some Unicode symbols, such as the registration mark, copyright sign, caret, and trademark symbol, but I think it should probably cover all such glyphs.

thanks

Report · Jun 13, 2023

Hi @m1b and @Joel Cherney

Do all fonts, including custom fonts, use the same Unicode numbers for Latin characters? I'm not sure why the trademark symbol ™ isn't showing in cyan; it's black.

Thanks

Report · Jun 14, 2023

Hi @iampurple, this is all new to me, but it seems that the trademark ™ character isn't in the unicode block we specified. Looking at this list, we can see that it is part of "Letterlike Symbols". We can add it to the grep:

([\x{0020}-\x{024F}\x{2100}-\x{214F}]+\s?)+

While we are here, I noticed that there is a "Latin Extended Additional" block that includes a lot of latin with diacritics. If we add those too we have:

([\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}]+\s?)+
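To check which of these three greps covers a given character, one option is to test them outside InDesign. The sketch below assumes Python's `re` module, which does not accept the `\x{HHHH}` notation, so a small hypothetical helper (`indesign_to_python`) rewrites it as `\uHHHH` first. It only handles code points of up to four hex digits, and Python's engine is not InDesign's, so the results are indicative only.

```python
import re

def indesign_to_python(grep):
    """Rewrite InDesign-style \\x{HHHH} escapes as Python \\uHHHH escapes."""
    return re.sub(r"\\x\{([0-9A-Fa-f]{1,4})\}",
                  lambda m: "\\u" + m.group(1).zfill(4), grep)

def covers(grep, text):
    """True if the whole text is matched by the (converted) grep."""
    return re.fullmatch(indesign_to_python(grep), text) is not None

BASIC = r"([\x{0020}-\x{024F}]+\s?)+"
WITH_SYMBOLS = r"([\x{0020}-\x{024F}\x{2100}-\x{214F}]+\s?)+"
WITH_DIACRITICS = r"([\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}]+\s?)+"

trademark = "DIN\u2122"     # U+2122 TRADE MARK SIGN, in Letterlike Symbols
vietnamese = "Vi\u1EC7t"    # U+1EC7, in Latin Extended Additional

for name, grep in [("basic", BASIC), ("with symbols", WITH_SYMBOLS),
                   ("with diacritics", WITH_DIACRITICS)]:
    print(name, covers(grep, trademark), covers(grep, vietnamese))
```

Only the last grep covers both samples, which matches the reasoning above: each block has to be added explicitly before its characters are picked up.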

- Mark

Report · Jun 14, 2023

Thanks a lot Mark!! Appreciated.

Report · Jun 14, 2023

You're 100% correct Mark, you do have to be a bit more sophisticated. I personally try to keep languages in separate documents, in order to avoid collisions like the bullet character you pointed out. But sometimes my client wants 40 languages in live text in a single document, so I have to be sparing with my use of GREP styles, because I don't want any styling collisions. Also, having 40 different GREP styles running all the time can be a performance hit, if your hardware isn't particularly recent.

So, some caveats:

Spaces are weird. If you're setting type in Khmer or Amharic, I find that spaces ought to be quite wide for those languages. However, if you look at their Unicode values, they're usually still using U+0020, the Boring Normal Space. Some Amharic input methods actually insert "Ideographic Space" U+3000, which is the space you'd use if you were doing fullwidth layout in Chinese. Other Amharic input methods use other spaces that aren't quite as wide as the ideographic space. Some Amharic fonts just have an extra-wide space encoded at 0020. On the other side, lots of Khmer and Lao text uses "Zero-Width Space", U+200B, which is exactly what it sounds like: it allows line breaks at word boundaries with zero visible space, which is important for many SE Asian languages. There are lots of spaces!

Sorry for the info dump there, but the main point is that \s captures all of these. Or, maybe I should test it? It certainly captures at least most of these special-use spaces. So be wary of just including \s in any regex meant to apply specifically to a single language.
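A quick test along these lines can be sketched in Python. Note the big assumption: this checks Python's `\s`, not InDesign's, and the two engines may well disagree, so it only shows that the answer is worth verifying per engine. The `space_report` helper is hypothetical.

```python
import re
import unicodedata

WHITESPACE = re.compile(r"\s")  # Python's \s, NOT InDesign's

# The three spaces mentioned above: normal, ideographic, zero-width.
CANDIDATES = ["\u0020", "\u3000", "\u200B"]

def space_report():
    """Map each code point to (Unicode category, matched by Python's \\s)."""
    report = {}
    for ch in CANDIDATES:
        key = f"U+{ord(ch):04X}"
        matched = WHITESPACE.fullmatch(ch) is not None
        report[key] = (unicodedata.category(ch), matched)
    return report

for code_point, (category, matched) in space_report().items():
    print(code_point, category, "matched" if matched else "not matched")
```

In Python, the zero-width space is a format character (category Cf) rather than a space separator, so `\s` skips it there while still catching the ideographic space. Whether InDesign behaves the same way would need to be tested in InDesign itself.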

Additionally... if you have both Chinese and Japanese in the same document, you'll need to be careful, as some of the glyphs are encoded at the same point but drawn differently. It's important to use only Japanese-style glyphs when typesetting Japanese, but if you're capturing all the Simplified Chinese glyphs with one regex, a Japanese-specific search will often capture the exact same glyphs. Using Simplified mainland glyphs for Japanese is the acme of Things Not Done When Typesetting Japanese, I think.

Finally, some fonts will do things differently if they are marked with a different language in the InDesign interface - different glyph alternates or glyph shaping, which is quite often necessary if you're doing work into a bunch of right-to-left languages in a single doc. So, more important to me than capturing individual glyphs with a GREP is marking text with the appropriate language in the paragraph style. So I'll have an English paragraph style, and then a whole bunch of other languages Based On the parent English style, with their own language settings and GREP styles.

Report · Jun 14, 2023

Hi @Joel Cherney, these are great points!

Given the description of the OP's job, with multiple languages in the same document, I guessed that targeting the Latin characters might be simplest, which is what my grep does. This turned out to be handy because the same grep can be used for (hopefully!) paragraphs containing glyphs of any non-latin language. As you point out, it does mean that the grep will need to be improved if it is unwantedly matching characters amongst the non-latin glyphs. I will explore that a little for the OP in another reply I think.

As for the space, you make an excellent point—matching with the general space \s seems unwise in this context, even though my grep only matches a whitespace character if it comes after matched latin text. I'm thinking just removing the \s? because I'm already matching a normal "latin" space u+0020. I also need to add some better punctuation matching and a forced linebreak u+000A. I will alter the basic grep used in my demo.indd above to this:

[\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}\x{0A}\x{2000}-\x{206F}]+

The issue of matching Chinese and Japanese characters in the same document is great to know, but doesn't apply here I think because we aren't matching the non-latin characters—only the latin characters. It will be up to the OP to choose a suitable font (ie. Chinese or Japanese) that draws the correct glyph for the language I guess.

As for setting the language in the paragraph style: yes! Absolutely a must I would think. However, I can't test this because my version of InDesign doesn't show me Chinese, Japanese or Korean on the list of languages.

Thanks for sharing your expertise. You've improved this answer greatly and I've learned a lot.

- Mark

Report · Jun 14, 2023

@iampurple here are some ideas if the grep is matching problematic characters in the non-latin text:

Option 1. Only match two or more latin characters:

[\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}\x{0A}\x{2000}-\x{206F}]{2,}

This will solve the punctuation matched in the Korean in my screenshot above, but means that, say, a single digit won't be matched, so that may not be a viable option, depending on the expected latin text.

Option 2. Match latin text only after punctuation in the non-latin text:

(?<=\p{Punctuation})[\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}\x{0A}\x{2000}-\x{206F}]+

This could work if the latin text is inserted always after punctuation, and in cases where you want some latin text, eg. digits, to be kept in the non-latin-text font:

Option 3. Only match any text between special characters. In this case you must insert a special character before and after the latin text. This is a more flexible, but manual approach. You would have to choose a special (invisible) character that wasn't used in any of the non-latin languages (we can't use zero-width-space for example because as Joel says it is used by some languages to control line breaking). I won't elaborate on this option because it may not apply to your case.

If none of those seem like they will fix the issue, then we may need to use a more sophisticated grep.
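The trade-off in Option 1 above can be sketched in Python. The character class is the same set of ranges, rewritten in Python's `\u` syntax because `re` does not accept `\x{...}`. The sample string is hypothetical: it uses a middle dot (U+00B7) between Korean words as a stand-in for stray latin-range punctuation, which is an assumption and not necessarily the character from the screenshot.

```python
import re

# Same ranges as the grep above, in Python's escape syntax.
LATIN_CLASS = r"[\u0020-\u024F\u1E00-\u1EFF\u2100-\u214F\u000A\u2000-\u206F]"

ANY_RUN = re.compile(LATIN_CLASS + "+")         # the basic grep
TWO_OR_MORE = re.compile(LATIN_CLASS + "{2,}")  # Option 1

# Hypothetical sample: Korean with a middle dot, a latin word, a lone digit in Chinese.
sample = "한국어\u00B7텍스트 ABC 第5章"

print(ANY_RUN.findall(sample))      # includes the lone dot and the lone digit
print(TWO_OR_MORE.findall(sample))  # only the longer latin run survives
```

With `{2,}` the isolated middle dot is no longer matched, but neither is the single digit between the Chinese characters, which is the drawback described for Option 1.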

- Mark

Report · Jun 14, 2023

Thanks a lot @m1b and @Joel Cherney!!

I want punctuation to stay in the language's own font when text is copied and pasted into InDesign, so nothing needs to change there. The exception is punctuation inside a URL such as www.loremipsume.com, which should match the Latin font. Numbers should always switch to the Latin font, even in pasted text.

I tried to combine your grep styles; would you please check whether this is right? I want the punctuation to match its language:

(?<=\p{Punctuation})[\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}]+\s?+

Many thanks you two for great thoughts!!

Report · Jun 26, 2023

Hi @m1b and @Joel Cherney

Would you please take a look if this grep is right?

(?<=\p{Punctuation})[\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}]+\s?+

Does "(?<=\p{Punctuation})" match punctuation in English only?

Many Thanks

Report · Jun 26, 2023

I have never used \p myself, but I just tested it and no, \p{Punctuation} seems to catch all forms of punctuation, Latin-script or no. Here's a little demo GIF for you, of that \p{Punctuation} catching almost all the punctuation in mixed English and Trad Chinese (mixed in with lots of gibberish punctuation that I added by clicking on the Glyphs panel without any real plan):

Note that it's catching both fullwidth Chinese parentheses as well as normal Latin-script parentheses. The one exception was the one punctuation glyph that didn't have a Unicode ID:

Report · Jun 26, 2023

Thanks a lot for testing it out @Joel Cherney !!

I think I'll probably use this grep that Mark gave above, so that any Latin words, numbers, and some special characters (such as >, @, and the trademark symbol) use the English font:

([\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}]+\s?)+

Should I remove the space \s?

Thanks

Report · Jun 26, 2023

Well, I'm not trying to be evasive here, but I'd go back to my "spaces are complicated!" comment above on that particular question.

In an Arabic document destined for use within the United States, I want all my spaces to be overtly marked as Arabic right-to-left spaces, unless it's a bunch of parenthetical English. So, right there, you can see a case where I can't just set up a GREP style to automatically apply an English LTR character style to all spaces. I can have that be the default in the paragraph style, but I'd wind up having a secondary manually applied character style for parenthetical English, and for phone numbers and other things that should behave in an Englishy LTR manner.

In a Tigrinyan or Amharic document, I'd want to respect the wider spaces that are traditional in those scripts, so I'd want to exclude spaces from my regex, in those Ethiopic writing systems.

However, you've already specified the "generic" ASCII space at 0020 in your regex, so I personally would remove the \s from my query, as the \s catches all of the not-totally-standard spaces like the fullwidth ideographic space that I would most certainly want to leave in the non-DIN font.

Report · Jun 26, 2023

I agree with @Joel Cherney about removing the \s, as I have done in my demo document. That should be the starting point for the grep and from there I think you will need to start testing with real examples.

- Mark

Report · Jun 26, 2023

Thanks a lot @m1b and @Joel Cherney!!! Very appreciated.

I'll test it out.

Report · Dec 22, 2023

Dear all,

I am a GREP novice but I had the same idea as you!

I want to create "one" Paragraph Style that manages Arabic script and Latin script (English) at the same time.

All Arabic text should use one font (Adobe Naskh Regular); I made a Character style for Arabic.

All Latin text should use another font (STIX Two Regular); I made another Character style for Latin (English).

Numbers: (STIX Two Regular)

Special Characters: (STIX Two Regular)

I put the expression for Arabic script:

[\x{0600}-\x{06FF}\x{FB50}-\x{FDFF}\x{FE70}-\x{FEFF}\x{0750}-\x{077F}\x{08A0}-\x{08FF}\x{FB50}-\x{FDFF}\x{FE70}-\x{FEFF}]+

I put the expression for Latin script:

([\x{0020}-\x{024F}\x{1E00}-\x{1EFF}\x{2100}-\x{214F}]+\s?)+

But the Paragraph Style does not work. Do you have an idea please?

TIA

Report · Dec 22, 2023

Can you tell us how it doesn't work?

I usually set up a paragraph style in the "main" language, and a character style for the "parenthetical" language. I don't know anything about what you're setting up, but that's what I usually do. So the para style is set to Arabic language, using uhhh FF DIN Arabic, and all of the Arabic settings I'd need. Then I'd make a character style with English language, and apply that with a GREP Style. My Latin-script GREP is pretty limited, and it doesn't include the space. I want spaces to inherit "default" directionality behavior from the parent paragraph, and I can apply the Latin-script style to contiguous runs of EN text if I need to. But InDesign is smart enough to know, usually, that a space between two Latin-script words that are marked as English is supposed to be a LTR space.

So, I can't tell from looking at your description what went wrong for you, but if you can post more details, and maybe a sample file, I'm sure we can figure it out.
