How to select only Arabic with Harkaats (diacritics)?
Hi all,
I have a very large text that mixes Arabic text with and without diacritics. Below is a sample:
==== Sample Text ===
سورۂ آل عمران کی آیت ۱۰۳ میں دشمنوں اور خون کے پیاسوں کے دلوں میں محبت و الفت کا جذبہ پیدا کرنے والے نے بطور احسان فرمایا ہے۔
وَاذْکُرُوْا نِعْمَتَ اللّٰہِ عَلَیْکُمْ اِذْ کُنْتُمْ اَعْدَآئً
فَاَلَّفَ بَیْنَ قُلُوْبِکُمْ فَاَصْبَحْتُمْ بِنِعْمَتِہٖ اِخْوَانًا ج
’’اللہ کے اس احسان کو یاد کرو جو اس نے تم پر کیا ہے۔ تم ایک دوسرے کے دشمن تھے، اس نے تمہارے دل جوڑ دئیے اور اس کے فضل و کرم سے تم بھائی بھائی بن گئے۔‘‘
=== End ===
I want to select only the Arabic text. How can I do that with GREP? The text above contains both Urdu and Arabic. The problem is that Urdu and Arabic share the same Unicode code points, so it is difficult to tell them apart.
I wrote something a long time ago, but I need something simpler. Here is what I wrote almost 2 years ago:
======== my code (it's a single expression, used in Grep Style and Find Replace) =========
(\w+\p{mn}\w*){2,}\x{0020}{0,}|\b\x{0641}\x{0650}?[\x{06cc}|\x{0649}]\x{0652}?\b|\b\x{0644}\x{064e}\x{0622}\b|\b\x{0644}[\x{064e}|\x{0652}]?\x{0627}[\x{064e}|\x{0653}|\x{0670}]?\b|\b\x{0644}\x{0651}\x{064e}\x{0627}[\x{0653}]?|\b\x{0644}\x{0651}\x{064b}\x{0627}\b|\b\w[\x{064e}-\x{0650}]\x{0644}\x{0627}\b|\b\x{0648}[\x{064e}-\x{0650}\x{0652}]\b|\b\x{0648}\x{0651}[\x{064e}-\x{0650}\x{064b}-\x{064d}]\b|\b\x{06c3}[\x{064e}-\x{0650}]\b|\b\x{06c3}[\x{064b}-\x{064d}]\b|\b\x{06c1}[\x{064f}\x{0650}\x{0656}\x{0657}]\b|\x{0020}?\x{0627}?\x{0644}\x{0644}\x{06c1}[\x{064e}-\x{0650}]\b|\b[\x{06d6}-\x{06ed}]\b|\x{0600}|\x{06dd}
========================================================
A simpler solution would be very helpful.
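To make the goal concrete, here is a rough sketch in Python (not GREP) of the simpler behaviour I am after: treat any word that carries at least one harakat as Arabic, and select runs of such words on the same line. The set of diacritics, the names HARAKAT, VOCALIZED_WORD, VOCALIZED_RUN and find_vocalized_runs, and the rule itself are my own assumptions for illustration. An Urdu word that happens to carry a diacritic would be matched too, and an unvocalized Arabic word or stop sign would be skipped.

```python
import re

# Assumed set of harakat: fathatan..sukun (U+064B-U+0652), subscript alef and
# inverted damma (U+0656, U+0657), and superscript alef (U+0670).
HARAKAT = "\u064B-\u0652\u0656\u0657\u0670"

# One vocalized word: a run of non-space characters containing at least one harakat.
VOCALIZED_WORD = rf"[^\s{HARAKAT}]*[{HARAKAT}]\S*"

# Consecutive vocalized words separated by spaces or tabs (never across a line break).
VOCALIZED_RUN = re.compile(rf"{VOCALIZED_WORD}(?:[ \t]+{VOCALIZED_WORD})*")


def find_vocalized_runs(text: str) -> list[str]:
    """Return every run of adjacent words that each carry at least one harakat."""
    return VOCALIZED_RUN.findall(text)


if __name__ == "__main__":
    vocalized = "\u0628\u0650\u0633\u0652\u0645\u0650"  # a word with kasra and sukun
    plain = "\u06A9\u062A\u0627\u0628"                  # a word without diacritics
    for run in find_vocalized_runs(f"{plain} {vocalized} {plain}"):
        print(run)
```

I have not tested whether the same pattern works unchanged in a GREP style, so treat it only as a description of the behaviour I want.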
<Title renamed by moderator>


