Invisible characters in Bengali Unicode text hindering correct display

New Here, Nov 03, 2024

I'm an experienced InDesign & GREP user, and experienced in Bengali typesetting. But I've encountered an issue I can't solve. I have some Unicode Bengali text (converted from ANSI encoding) with some invisible characters hindering the correct formation of the characters. When I cut & paste the problem word into Find/Change, it renders it as ~I in GREP (or ^I in 'Text'). But when I search for either of these characters, or even for the text I just copied, it cannot find it. When I copy the offending text into another software and then cut & paste it back, the offending characters are removed. My questions are:
1. What are these invisible problematic ~I/^I characters?
2. How can I remove them with InDesign's Find/Change?
I've included a screenshot here and attached a sample file.

jacobt63309185_0-1730696048265.png

 

TOPICS
Type

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Nov 04, 2024

The character is a strange beast. It's used for the index marker, text anchor, and a few other things. When you select it and open the Info panel you'll see that its Unicode value is FEFF. In the text tab you can find it by searching for <FEFF>. Unfortunately it's not possible to find it in the Grep tab. \x{FEFF} won't find it, for example.

 

In the story editor you'll notice that it looks like a text anchor, but that symbol you see there could be used for other purposes as well, I forget.

 

PeterKahrel_0-1730713634672.png

 

New Here, Nov 06, 2024

Thanks, Peter, that was just the reply I was hoping for! Very helpful.

Community Expert, Nov 06, 2024

To expand on Peter's answer: outside of InDesign, in the broader world of Unicode, FEFF is the zero width no-break space (the same code point that serves as the byte order mark). I think it is an artifact of the conversion process. It seems to me that it should be a different character, one that is used in Bengali Unicode all the time: 200C, the zero width non-joiner. The 200C ZWNJ does what it says on the tin: it keeps two glyphs that would ordinarily connect via ligature from connecting, without breaking the text at that point.
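If you want to confirm which invisible code points are actually in the converted text before it goes into InDesign, a few lines of Python will list them. This is only a rough sketch: the function name `describe_invisibles` and the sample string are made up for illustration, and it simply reports every character in Unicode's "format" (Cf) category, which includes both FEFF and 200C.

```python
import unicodedata


def describe_invisibles(text):
    """Return (index, code point, Unicode name) for each format (Cf) character."""
    found = []
    for i, ch in enumerate(text):
        # Category "Cf" covers invisible format controls such as U+FEFF and U+200C.
        if unicodedata.category(ch) == "Cf":
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


# Hypothetical sample: ka + hasanta + stray U+FEFF + ssa
sample = "\u0995\u09cd\ufeff\u09b7"
for entry in describe_invisibles(sample):
    print(entry)
```

Running it on text saved from the converter (read in as UTF-8) should show whether FEFF is the only stray character or whether other format controls crept in too.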

 

Sometimes I myself have to use converters to get old complex-script text into Unicode, but honestly I don't trust them unless I write them myself. You never know what is going to creep in. In this case, what crept in is also the code point that InDesign uses internally for a bunch of stuff. If possible, in your shoes I would re-convert and intervene to replace the (IMHO wrong) zero width no-break space with the zero width non-joiner. Alternatively, I do know that someone has at some point written a script to find it, although I don't have a bookmark to it immediately to hand.
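For the re-convert route, the replacement can be done on the plain text outside InDesign, before the text is placed. Here is a minimal sketch in Python, not a tested workflow. It assumes every FEFF inside the text really should be a 200C, which is my guess about this converter and worth checking against a few words first. The names `fix_converted_text` and `convert_file`, and any file paths you pass in, are hypothetical.

```python
ZWNBSP = "\ufeff"  # zero width no-break space (also the byte order mark)
ZWNJ = "\u200c"    # zero width non-joiner


def fix_converted_text(text, replacement=ZWNJ):
    """Replace stray U+FEFF characters; pass replacement="" to delete them."""
    # A U+FEFF at the very start is a byte order mark, not content, so drop it.
    if text.startswith(ZWNBSP):
        text = text[1:]
    # Assumption: every remaining U+FEFF stands where a ZWNJ was intended.
    return text.replace(ZWNBSP, replacement)


def convert_file(src_path, dst_path):
    """Read a UTF-8 text file, fix it, and write the result to a new file."""
    with open(src_path, encoding="utf-8") as src:
        cleaned = fix_converted_text(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)
```

If it turns out the characters should simply go away rather than become ZWNJ, calling `fix_converted_text(text, "")` removes them, which matches what happens when the text makes a round trip through another application.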
