GREP to find a Greek Phrase

Engaged, Dec 05, 2024

I'd like to find a phrase of Greek letters within non-Greek text. The following GREP uses the "or" operator, the two main Greek Unicode blocks, and a third group that includes a space and some punctuation:

([\x{0370}-\x{03FF}]|[\x{1F00}-\x{1FFE}]|[ ,­’])+

This works, except that (1) it finds spaces that aren't between Greek letters, and (2) it selects the spaces on either side of the Greek phrase.

 

I'd like it to NOT find spaces that don't have a Greek character on either side of them and I don't want it to select the spaces before and after the phrase.
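
For readers who want to reproduce the over-matching outside InDesign, here is a minimal Python sketch. It is only an approximation: Python's `re` module writes code points as `\uXXXX` instead of `\x{XXXX}`, the soft hyphen from the punctuation group is left out, and the sample sentence is made up (unaccented Greek, chosen for simplicity).

```python
import re

# Python's re module spells code points \uXXXX; InDesign GREP uses \x{XXXX}.
GREEK = r"\u0370-\u03FF\u1F00-\u1FFE"

# Rough equivalent of the expression above: Greek characters OR space/punctuation.
original = re.compile(rf"(?:[{GREEK}]|[ ,’])+")

# Made-up, unaccented sample text (an assumption, not from the real document).
sample = "He said: εν αρχη ην ο λογος (1.1) and left."

matches = [m.group(0) for m in original.finditer(sample)]
for text in matches:
    print(repr(text))
```

With this sample, the Greek phrase comes back with a leading and a trailing space, and the lone spaces between the English words match on their own, which are the two problems described above.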

 

So I don't want this selection:

Capture.JPG

But I'd like this selection:

Capture.JPG

 

The goal is a search and replace that applies the Greek language, so it's important that the spaces and punctuation within the Greek phrase also get the language set.

 

Is this even possible?

 

Thanks for any insight,

Ken

Guide, Dec 05, 2024

Could you please post an .idml file with fonts?

 

(^/)  The Jedi

Engaged, Dec 05, 2024

No, sorry, proprietary fonts. They're not really needed; any Unicode font with Greek would do.

Community Expert, Dec 05, 2024

Quote (by @KenWK): "No, sorry, proprietary fonts."

 

You mean the same font is used for Greek and English? 

 

Engaged, Dec 13, 2024

Yes, it is. The beauty of Unicode.

Community Expert, Dec 05, 2024

I honestly don't know if it's possible. Every once in a while, I poke at the problem, trying to figure out a good regular expression to get it in a single pass. It's best dealt with for me (as a localization nerd) upstream, by the translator. But from day to day, I usually can't go back to my client's client's translation provider and ask 'em to do their text input differently, or to configure their translation environments differently.

 

Instead, in InDesign, I just do multiple passes with regex. The first pass looks for certain Unicode ranges, and marks the Greek characters (or Chinese, or Burmese, or whatever) with a given language, and perhaps character style. The important bit here is that we're only capturing overtly Greek etc. characters. For the second pass, the actual search query specifies the more ambiguous codepoints - such as spaces and punctuation - but with positive lookbehind specifying overtly Greek characters. This can, in some circumstances, get a false match, so I'm not likely to Replace All when I'm searching for boundary cases (such as Greek sentences with colons and semicolons and parentheses that may have either Greek or English contents). I will very often do a similar search for any two spaces or bits of punctuation following a Greek glyph, as there's often a few of those in any document with a sufficient number of parenthetical asides. 
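
The two passes described above can be sketched in Python as follows. This is only an illustration under assumptions: the sample text and the list of "ambiguous" characters are made up, and collecting character positions stands in for applying a language or character style in InDesign.

```python
import re

# Python's re module spells code points \uXXXX; InDesign GREP uses \x{XXXX}.
G = r"\u0370-\u03FF\u1F00-\u1FFE"

# Pass 1: overtly Greek characters only.
pass1 = re.compile(rf"[{G}]+")
# Pass 2: ambiguous characters, but only directly after a Greek character
# (positive lookbehind). The punctuation list here is an assumption.
pass2 = re.compile(rf"(?<=[{G}])[ ,.;:]+")

# Made-up, unaccented sample text.
sample = "He said: εν αρχη, λογος (1.1) and left."

# Stand-in for "apply Greek language": remember which positions were hit.
tagged_positions = set()
for pattern in (pass1, pass2):
    for m in pattern.finditer(sample):
        tagged_positions.update(range(m.start(), m.end()))

tagged = "".join(ch for i, ch in enumerate(sample) if i in tagged_positions)
print(repr(tagged))
```

Note that the space between the last Greek word and the opening parenthesis is tagged too. That is the kind of boundary false match that makes a manual check worthwhile.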

 

You can get fancy with a script like FindChangeByList, which will perform a series of regex find/replace actions unattended. For the reasons mentioned above, though, I'd personally rather devote the minutes to grepping through the document manually, usually from the keyboard, whacking alt-N to go to the next match, and either changing it with alt-H or moving on to the next match. I'd rather personally sign off on each change, given that a weird punctuation choice (parenthetical statements separated by ellipses, perhaps? idiosyncratic translators deciding that a solidus is interchangeable with a slash?) might make it past my hand-woven net of clunky regular expressions. 

 

 

Community Expert, Dec 05, 2024

Quote: "I honestly don't know if it's possible."

 

<sighs and gestures weakly at the screen> I mean, with pure regular expressions. It's actually something that could be done easily in a single script. You'd need to <long siiiiiigh> be some kind of Jedi if you were going to pull it off with pure regex. 

Engaged, Dec 05, 2024

Thanks for the entertaining answer!! Pretty much what I was going through before I posted. XD 

 

We have a special swatch we use to "highlight" things in cases like this, plus software that runs multiple saved search-and-replace queries. That's what I'll likely end up doing, with each query highlighting its changes, followed by a manual look through.

 

Thanks again

Community Expert, Dec 05, 2024

I really think Peter Kahrel's answer below is your one-pass solution. I'm already using it, for what it's worth - I have this Amharic translation, and the translator wants normal narrow space in between Latin text, and traditionally wider Amharic spaces between Amharic words, and I've completely replaced my multi-pass system with Peter's solution.

Community Expert, Dec 05, 2024 (correct answer)

Tricky one. You can start with a Greek character, then proceed matching Greek characters and additional things like space, comma, etc., ending with a Greek character:

 

[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE}]
[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE} ,']+
[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE}]

 

Note that you can include more than one range in a class, as in the above expression.

 

You need to write those three lines as one line in the Find/Change window, which is a tight place to be in. For complex expressions like these I use a script with a more generous window to write GREP expressions in:

PeterKahrel_0-1733436610600.png

The script is here: https://creativepro.com/files/kahrel/indesign/grep_editor.html
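
To see the start-with-Greek, end-with-Greek idea at work outside InDesign, here is a small Python sketch. It assumes Python's `\uXXXX` notation in place of `\x{XXXX}` and uses a made-up, unaccented sample sentence.

```python
import re

# Python's re module spells code points \uXXXX; InDesign GREP uses \x{XXXX}.
G = r"\u0370-\u03FF\u1F00-\u1FFE"

# Greek character, then Greek/space/comma/apostrophe, then a Greek character.
phrase = re.compile(rf"[{G}][{G} ,']+[{G}]")

# Made-up, unaccented sample text.
sample = "He said: εν αρχη ην ο λογος (1.1) and left."

found = [m.group(0) for m in phrase.finditer(sample)]
print(found)
```

Because the match has to begin and end on a Greek character, the spaces on either side of the phrase are left out and lone spaces in the English text never match. As written, the expression needs at least three characters to match.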

Community Expert, Dec 05, 2024

Right, anything that begins and ends with a Greek character, that's very clever! 

 

When I copied your query and pasted it into InDesign, it included carriage returns:

[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE}]\r
[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE} ,']+\r
[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE}]

Those "\r"s kill the match, of course. And I needed rather more punctuation to actually get a match (punctuation that I escaped with a leading backslash out of habit; I assume the backslashes are only necessary in front of the parentheses):

[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE}][\x{0370}-\x{03FF}\x{1F00}-\x{1FFE} ,'\(\)\:\;]+[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE}]
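
One way to avoid stray line breaks is to keep the three parts separate while editing and join them only at the end. The Python sketch below does that and also checks that, in Python's `re` at least, the backslashes in front of the parentheses, colon and semicolon inside a character class make no difference. The sample text is made up, and Python's behaviour is not a guarantee about InDesign's GREP engine.

```python
import re

# Python's re module spells code points \uXXXX; InDesign GREP uses \x{XXXX}.
G = r"\u0370-\u03FF\u1F00-\u1FFE"

# Keep the three parts readable, then join them without any line breaks.
parts = [
    rf"[{G}]",            # first character: Greek
    rf"[{G} ,'():;]+",    # middle: Greek plus space and punctuation
    rf"[{G}]",            # last character: Greek
]
unescaped = re.compile("".join(parts))

# Same expression with the punctuation escaped, as in the post above.
escaped = re.compile(rf"[{G}][{G} ,'\(\)\:\;]+[{G}]")

# Made-up, unaccented sample text.
sample = "Note: αρχη (και) λογος: τελος. End"

found_unescaped = [m.group(0) for m in unescaped.finditer(sample)]
found_escaped = [m.group(0) for m in escaped.finditer(sample)]
print(found_unescaped)
```

Both versions return the whole Greek phrase, parentheses and colon included, and stop before the full stop because the match must end on a Greek character.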

 

Community Expert, Dec 05, 2024

I assume that they're only necessary in front of the parentheses

 

Not for parentheses, but brackets and backslashes, yes.

Engaged, Dec 05, 2024

Yowza, that's amazing! Thanks Peter, I'll start playing around with that. What a great idea with the script!

 

Thanks again

Engaged, Dec 13, 2024

I ended up with this, which seems to get most things for me:

[“‘<\x{0370}-\x{03FF}\x{1F00}-\x{1FFE}]
[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE};'  ,.<>]+
[\x{0370}-\x{03FF}\x{1F00}-\x{1FFE}>’”]

One thing it doesn't do is get single Greek characters and instances like two Greek characters followed by a hyphen. But, a second search of just any Greek characters finds those few things easily enough.
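
A possible variant that also picks up single characters and short runs in one search is to make everything after the first Greek character optional. This is not from the thread and is untested in InDesign; the Python sketch below only shows the idea, using the simpler punctuation set from the earlier answer and a made-up sample.

```python
import re

# Python's re module spells code points \uXXXX; InDesign GREP uses \x{XXXX}.
G = r"\u0370-\u03FF\u1F00-\u1FFE"

# One Greek character, optionally followed by more Greek/space/punctuation
# that must itself end on a Greek character. Hypothetical variant.
variant = re.compile(rf"[{G}](?:[{G} ,']*[{G}])?")

# Made-up, unaccented sample text.
sample = "The letter α, the prefix εν- and the phrase εν αρχη ην."

found = [m.group(0) for m in variant.finditer(sample)]
print(found)
```

In this sample the single letter, the two-letter run before the hyphen, and the full phrase are all found, still without any surrounding spaces or trailing punctuation.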

 

Thanks again, Peter!

Community Expert, Dec 05, 2024

@KenWK

 

Can you share a sample document? 

 

Engaged, Dec 05, 2024

Unfortunately, I can't. Thanks for answering though.

Community Expert, Dec 05, 2024

@KenWK

 

Do you have your Greek texts always in between "said:" and "(99.9.9)"? 

 

If you work on Windows, you could use the free version of my ID-Tasker tool, or I can give you access to the full version to try for free. 

 

Then you could run multiple smaller searches and combine results. 

 
