
GREP query for Unicode values having a plus sign

Explorer, Feb 12, 2019

How do I write a GREP query for Unicode values that have a '+' in them?

For example,

Unicode: 0BA8 + 0BCD  (this is for one of the Tamil language letters)

I tried writing it like this: [\x{0BA8}], but I didn't know how to deal with the '+' sign in the Unicode value.

Can I do a GREP search for this using GID (which is 192 for this particular glyph) instead of Unicode?

Please suggest.

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
LEGEND, Feb 13, 2019

Why not use the Unicode value for plus?

Community Expert, Feb 13, 2019

https://forums.adobe.com/people/Test+Screen+Name  wrote

Why not use the Unicode value for plus?

Because that will search for a '+'.

I think the OP is overthinking this. It is like expecting a query such as "U\+12*" to find all Unicode characters in the range U+1200..U+12FF. It does not; GREP doesn't work that way, because the "U+" prefix is only a way of writing code points and never appears in the text itself.

In the same way, the official Unicode value of a composite ligature such as Minion Pro's "Th" glyph is written 'U+0054+U+0068', but that does not mean you should see or enter a '+' anywhere. It's a notational thing only, so you can just search for "Th".
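To make the "notation only" point concrete, here is a minimal sketch in Python. Python's `re` module is not InDesign's GREP engine, and `notation_to_text` is a hypothetical helper written for this illustration, so treat it as a demonstration of the idea rather than something to paste into the Find/Change dialog.

```python
import re


def notation_to_text(notation: str) -> str:
    """Turn a notation such as 'U+0054+U+0068' or '0BA8 + 0BCD'
    into the characters it names (hypothetical helper)."""
    # Keep only the runs of 4 to 6 hex digits; the 'U' prefixes,
    # '+' signs and spaces are notation and are thrown away.
    hex_values = re.findall(r"[0-9A-Fa-f]{4,6}", notation)
    return "".join(chr(int(value, 16)) for value in hex_values)


ligature = notation_to_text("U+0054+U+0068")
tamil = notation_to_text("0BA8 + 0BCD")
print(ligature, "+" in ligature)  # Th False
print(len(tamil))                 # 2

# "U\+12*" matches the literal text "U+1" followed by any number of "2"s.
literal = re.compile(r"U\+12*")
print(literal.search("\u1234"))                 # None
print(literal.fullmatch("U+1222") is not None)  # True

# A range of code points is written as a character class instead.
code_point_range = re.compile(r"[\u1200-\u12FF]")
print(code_point_range.fullmatch("\u1234") is not None)  # True
```

The '+' never reaches the text being searched: both notations reduce to plain two-character strings.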

LEGEND, Feb 13, 2019

Ah, I assumed that was exactly what the request was for: search for the three characters U+0BA8, a plus character, then U+0BCD.

Now that I look at it more closely, I see the story is this: the characters concerned are U+0BA8 'TAMIL LETTER NA' (ந) and U+0BCD 'TAMIL SIGN VIRAMA' ( ்). When they occur together in that order (U+0BA8 U+0BCD), they are displayed as 'TAMIL CONSONANT N' (ந்).

So, indeed, you just search for the two Unicode characters in that order (as two separate characters), or if possible you search for the actual character ந், allowing the software to take care of the composition from two characters.
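As a rough illustration of searching for the two characters in order, the sketch below uses Python's `re` and `unicodedata` modules. The sample string and variable names are made up for the example. Python writes the escapes as `\u0BA8\u0BCD`; in InDesign's GREP field the equivalent would presumably be `\x{0BA8}\x{0BCD}`, using the `\x{...}` form the original question already tried, though that is an assumption not tested here.

```python
import re
import unicodedata

NA = "\u0BA8"      # TAMIL LETTER NA
VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA

# The pattern is simply the two code points one after the other.
pattern = re.compile(NA + VIRAMA)

# Made-up sample: NA+VIRAMA, a space, a bare NA, a space, NA+VIRAMA again.
sample = "\u0BA8\u0BCD \u0BA8 \u0BA8\u0BCD"

for ch in NA + VIRAMA:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0BA8 TAMIL LETTER NA
# U+0BCD TAMIL SIGN VIRAMA

matches = pattern.findall(sample)
print(len(matches))  # 2
```

The bare NA in the middle is skipped because it is not followed by the virama, so only the composed form is found.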
