Searching for non-Arabic letters in a mixed document

Question

Hi all,

I am curious whether it is possible to search specifically for non-Arabic letters in a mixed-language document. Throughout the whole Arabic text (almost 70 pages), single words and whole sentences are written in Roman letters. Although they have the same font size as the Arabic letters, they appear much bigger. Of course I could search for those cases manually and apply another character style, but that is not efficient. I thought of using GREP search and replace, but I couldn't figure out how to do it. Can anyone give instructions or offer another way to achieve what I need?

Many thanks in advance,
Mona

TᴀW · Accepted Answer

The first thing to do is check the language setting that is applied to the text. If you're lucky you'll find that the Arabic text has Arabic as its applied language, and the non-Arabic text has a different language setting (English-US or similar). This happens because Word by default automatically applies the correct language to text.

If not, though, a GREP search is probably the best way. Searching for

[\u\l]

is a start, especially if we are only dealing with English letters. If there could be accented characters from other languages, a more inclusive GREP would be needed.
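To try out a more inclusive pattern outside the layout application, here is a minimal Python sketch. The character ranges (Basic Latin letters plus the accented letters in Latin-1 Supplement and Latin Extended-A/B) and the names `LATIN_LETTER`, `LATIN_RUN` and `find_latin_runs` are my own assumptions for illustration, not part of the GREP above. The pattern also joins letters separated by spaces or common punctuation into one run, but it does not pick up punctuation at the very end of a run.

```python
import re

# Basic Latin letters plus accented letters from Latin-1 Supplement and
# Latin Extended-A/B. U+00D7 and U+00F7 are skipped: they are the
# multiplication and division signs, not letters. (Assumed ranges.)
LATIN_LETTER = "A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F"

# One run = Latin letters, optionally joined by spaces or common punctuation.
LATIN_RUN = re.compile(
    rf"[{LATIN_LETTER}]+(?:[ .,;:!?'\"()\-]+[{LATIN_LETTER}]+)*"
)

def find_latin_runs(text: str) -> list[str]:
    """Return every run of Latin-script text found in `text`."""
    return [m.group(0) for m in LATIN_RUN.finditer(text)]

if __name__ == "__main__":
    arabic_word = "\u0645\u0631\u062d\u0628\u0627"
    sample = f"{arabic_word} hello world, caf\u00e9 {arabic_word} Z\u00fcrich"
    for run in find_latin_runs(sample):
        print(repr(run))
```

The same character class, written in the GREP dialect's own escape syntax, should be usable as the search expression, though I have not verified that here.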

Even with a GREP search, I recommend going through the found results one by one rather than clicking Change All, because you will probably want to mark the spaces and punctuation that belong to the English text as English as well, not just the letters.

Ariel

David W. Goodrich · Answer

I routinely use a GREP search to find strings of CJK characters so I can apply a character style that includes the font I want:

[\x{2E80}-\x{9FBB}]+

This finds all strings of chars. encoded with hexadecimal values between 2E80 and 9FBB, and perhaps you can swap in one or more ranges that work for Arabic. I assume Arabic includes word-spaces, so you may need a separate GREP to find spaces between strings of Arabic and apply the language attribute and font (perhaps by means of a char. style). I have no idea how well GREP searches text running right-to-left -- hopefully just fine.
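As a rough illustration of swapping in Arabic ranges, here is a Python sketch that can be tested on exported text. The block boundaries are the Unicode Arabic blocks in the first plane; which of them a given document needs is an assumption, and the names `ARABIC`, `ARABIC_RUN` and `arabic_spans` are hypothetical. Python matches against the stored (logical) character order, so right-to-left display does not affect it; whether the GREP engine behaves the same way is not something this sketch can confirm.

```python
import re

# Arabic-script blocks in the Basic Multilingual Plane (assumed selection;
# trim or extend to suit the document).
ARABIC = (
    "\u0600-\u06FF"  # Arabic (includes Arabic comma, question mark, digits)
    "\u0750-\u077F"  # Arabic Supplement
    "\u08A0-\u08FF"  # Arabic Extended-A
    "\uFB50-\uFDFF"  # Arabic Presentation Forms-A
    "\uFE70-\uFEFC"  # Arabic Presentation Forms-B, stopping before U+FEFF (byte order mark)
)

# A run is one or more Arabic words; spaces count only when they sit
# between two Arabic words, so one pattern covers the word-spaces too.
ARABIC_RUN = re.compile(f"[{ARABIC}]+(?: +[{ARABIC}]+)*")

def arabic_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of each Arabic run; end is exclusive."""
    return [m.span() for m in ARABIC_RUN.finditer(text)]

if __name__ == "__main__":
    sample = "abc \u0645\u0631\u062d\u0628\u0627 \u0628\u0643 xyz"
    print(arabic_spans(sample))
```

Everything outside the returned spans is non-Arabic, which is the text the original question wants to restyle.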

My CJK search string isn't perfect. It leaves out some stuff, including full-width punctuation and compatibility forms up at the top of Unicode's first plane, and of course doesn't get any CJK added in the second. Nor can it distinguish C, J, and K, so I do that manually (you cannot rely on applied fonts -- I recently received Chinese files where most chars. had a Japanese font applied).
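For anyone who wants to cover those gaps, a broader character class might look like the Python sketch below. The extra ranges are the Unicode blocks for compatibility ideographs, compatibility forms, full-width forms and the whole second plane; raising the upper bound from 9FBB to 9FFF (the end of the CJK Unified Ideographs block) is my assumption, and `CJK_RUN` and `cjk_runs` are hypothetical names. It still cannot tell Chinese, Japanese and Korean apart.

```python
import re

CJK_RUN = re.compile(
    "["
    "\u2E80-\u9FFF"          # original range, extended to the end of the block
    "\uF900-\uFAFF"          # CJK Compatibility Ideographs
    "\uFE30-\uFE4F"          # CJK Compatibility Forms
    "\uFF00-\uFFEF"          # Halfwidth and Fullwidth Forms
    "\U00020000-\U0002FFFF"  # plane 2 (Supplementary Ideographic Plane)
    "]+"
)

def cjk_runs(text: str) -> list[str]:
    """Return each unbroken string of characters from the ranges above."""
    return [m.group(0) for m in CJK_RUN.finditer(text)]

if __name__ == "__main__":
    sample = "abc \u6f22\u5b57\u3001\u304b\u306a def \uff21\uff22\uff23 \U00020000"
    for run in cjk_runs(sample):
        print(repr(run))
```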

Good luck!

David
