Folks
We are running into the following issue. (I am not a native Arabic speaker and can't read or write Arabic, but I can work with Unicode code points.) The steps to reproduce are:
1. Create a new MS Word document.
2. Copy and paste the following string (as an example): "الأحكام والشروط"
3. Use the standard Arial font at size 10.
4. Save the file as PDF, either with the Save As PDF option built into MS Word or by converting the Word file to PDF with Adobe Acrobat.
5. Open the PDF file and try selecting the text.
The issues are as follows:
1. In MS Word, all the characters are correct Unicode. The code points for the above string are:
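| Code point | Unicode name |
|---|---|
| U+0627 | ARABIC LETTER ALEF |
| U+0644 | ARABIC LETTER LAM |
| U+0623 | ARABIC LETTER ALEF WITH HAMZA ABOVE |
| U+062D | ARABIC LETTER HAH |
| U+0643 | ARABIC LETTER KAF |
| U+0627 | ARABIC LETTER ALEF |
| U+0645 | ARABIC LETTER MEEM |
| U+0020 | SPACE |
| U+0648 | ARABIC LETTER WAW |
| U+0627 | ARABIC LETTER ALEF |
| U+0644 | ARABIC LETTER LAM |
| U+0634 | ARABIC LETTER SHEEN |
| U+0631 | ARABIC LETTER REH |
| U+0648 | ARABIC LETTER WAW |
| U+0637 | ARABIC LETTER TAH |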
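A listing like the table above can be produced without reading Arabic. The following is a minimal sketch using Python's standard `unicodedata` module; the sample string is written with `\u` escapes so an editor cannot reorder or reshape it, and `code_point_table` is a hypothetical helper name, not part of MS Word or Acrobat.

```python
import unicodedata

# Sample string from step 2, written as escapes so that no editor
# can reorder or reshape the right-to-left text.
SOURCE = "\u0627\u0644\u0623\u062D\u0643\u0627\u0645 \u0648\u0627\u0644\u0634\u0631\u0648\u0637"


def code_point_table(text: str) -> list[tuple[str, str]]:
    """Return (U+XXXX, Unicode character name) for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# Print one table row per code point, in logical (storage) order.
for code_point, name in code_point_table(SOURCE):
    print(f"| {code_point} | {name} |")
```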
2. When we open the PDF file created by MS Word, select all the text with Ctrl+A, and copy it, the code points of the copied text are as follows:
| Code point | Unicode name | Remarks |
|---|---|---|
| U+0627 | ARABIC LETTER ALEF | |
| U+0644 | ARABIC LETTER LAM | |
| U+0623 | ARABIC LETTER ALEF WITH HAMZA ABOVE | |
| U+062D | ARABIC LETTER HAH | |
| U+0643 | ARABIC LETTER KAF | |
| U+0627 | ARABIC LETTER ALEF | |
| U+0645 | ARABIC LETTER MEEM | |
| U+0020 | SPACE | |
| U+0648 | ARABIC LETTER WAW | |
| U+0627 | ARABIC LETTER ALEF | |
| U+0627 | ARABIC LETTER ALEF | Original code point was U+0644 (ARABIC LETTER LAM) |
| U+0634 | ARABIC LETTER SHEEN | |
| U+0631 | ARABIC LETTER REH | |
| U+0648 | ARABIC LETTER WAW | |
| U+0627 | ARABIC LETTER ALEF | Original code point was U+0637 (ARABIC LETTER TAH) |
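The two tables can also be compared mechanically. This minimal sketch assumes both sequences have the same length (true for this sample) and lists the positions whose code points differ; `ORIGINAL`, `EXTRACTED` and `substitutions` are illustrative names, with the values copied from the two tables.

```python
import unicodedata

# Code points from the MS Word table and from the text copied out of the PDF.
ORIGINAL = [0x0627, 0x0644, 0x0623, 0x062D, 0x0643, 0x0627, 0x0645, 0x0020,
            0x0648, 0x0627, 0x0644, 0x0634, 0x0631, 0x0648, 0x0637]
EXTRACTED = [0x0627, 0x0644, 0x0623, 0x062D, 0x0643, 0x0627, 0x0645, 0x0020,
             0x0648, 0x0627, 0x0627, 0x0634, 0x0631, 0x0648, 0x0627]


def substitutions(original: list[int], extracted: list[int]) -> list[tuple[int, int, int]]:
    """Return (0-based index, original, extracted) wherever the sequences differ."""
    if len(original) != len(extracted):
        raise ValueError("position-by-position comparison needs equal lengths")
    return [(i, o, e) for i, (o, e) in enumerate(zip(original, extracted)) if o != e]


# Report each substitution with 1-based positions and Unicode names.
for i, o, e in substitutions(ORIGINAL, EXTRACTED):
    print(f"position {i + 1}: U+{o:04X} {unicodedata.name(chr(o))}"
          f" -> U+{e:04X} {unicodedata.name(chr(e))}")
```

For this sample it reports two substitutions: position 11 (U+0644 ARABIC LETTER LAM became U+0627 ARABIC LETTER ALEF) and position 15 (U+0637 ARABIC LETTER TAH became U+0627 ARABIC LETTER ALEF).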
As you can see, when the MS Word document is saved as PDF (by MS Word itself or by Adobe Acrobat), certain characters are replaced automatically, so the text in the resulting PDF file has lost data.
Visually, however, every character in the PDF file looks the same as in the original.
On a 20-page document we had, when we compare the original MS Word characters with the corresponding text copied out of the PDF, we see a replacement rate of about 17%. We can't identify any pattern in the replacements.
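Over a whole document, the Word text and the copied PDF text may not line up one-to-one, so a position-by-position check is not enough. One possible approach (an assumption, not necessarily how the 17% figure was measured) is to align the two strings with Python's standard `difflib.SequenceMatcher` and count the original characters that fall inside `replace` blocks. `replacement_rate` is a hypothetical helper; pure insertions and deletions are not counted as replacements.

```python
import difflib


def replacement_rate(original: str, extracted: str) -> float:
    """Fraction of code points in `original` that fall inside 'replace' blocks.

    Hypothetical helper: insertions and deletions are ignored, only
    substitutions are counted.
    """
    if not original:
        return 0.0
    # autojunk=False stops difflib from treating very frequent characters
    # (spaces, ALEF, LAM, ...) as junk once the text reaches 200 characters.
    matcher = difflib.SequenceMatcher(None, original, extracted, autojunk=False)
    replaced = sum(
        i2 - i1
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag == "replace"
    )
    return replaced / len(original)


# Sample from this thread: the Word string and the text copied from the PDF.
word_text = "\u0627\u0644\u0623\u062D\u0643\u0627\u0645 \u0648\u0627\u0644\u0634\u0631\u0648\u0637"
pdf_text = "\u0627\u0644\u0623\u062D\u0643\u0627\u0645 \u0648\u0627\u0627\u0634\u0631\u0648\u0627"

print(f"{replacement_rate(word_text, pdf_text):.1%}")  # prints 13.3%
```

For a real document, `word_text` and `pdf_text` would be the full texts exported from Word and copied from the PDF.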
We would appreciate support on this issue.
Try another Arabic font in MS Word; perhaps your version of Arial is causing this failure. I suggest using a font from a reputable vendor, though.