rt76137857
Participant
June 22, 2016
Question

Characters replaced in text copied from a PDF created from an Arabic MS Word document (via Adobe Acrobat or Save As PDF in MS Word)


Folks

We are seeing the following issue. (I am not a native Arabic speaker and cannot read or write Arabic, but I can work with the Unicode code points.) The steps to reproduce are:

1. Create a new MS Word Document.

2. Copy & Paste the following string [as an example] - "الأحكام والشروط"

3. Set the font to standard Arial, size 10.

4. Save the file as PDF, either with the standard Save As PDF function in MS Word or by converting the MS Word file to PDF with Adobe Acrobat.

5. Open the PDF file and try selecting the text.

The following are the issues:

1. In MS Word, all the characters are correct Unicode. The code points (hexadecimal) for the string above are:

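Unicode  Description
U+0627   ARABIC LETTER ALEF
U+0644   ARABIC LETTER LAM
U+0623   ARABIC LETTER ALEF WITH HAMZA ABOVE
U+062D   ARABIC LETTER HAH
U+0643   ARABIC LETTER KAF
U+0627   ARABIC LETTER ALEF
U+0645   ARABIC LETTER MEEM
U+0020   SPACE
U+0648   ARABIC LETTER WAW
U+0627   ARABIC LETTER ALEF
U+0644   ARABIC LETTER LAM
U+0634   ARABIC LETTER SHEEN
U+0631   ARABIC LETTER REH
U+0648   ARABIC LETTER WAW
U+0637   ARABIC LETTER TAH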
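
For anyone who wants to reproduce a listing like the one above, here is a minimal Python sketch using the standard `unicodedata` module. The function name `describe` is just something I made up for illustration, and the sample string is written with escapes so it survives copy and paste.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The example string from step 2, spelled out as escapes.
sample = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)

for code_point, name in describe(sample):
    print(code_point, name)
```

Pasting text from Word or from the PDF into `sample` gives the same kind of listing for either source.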

2. When we open the PDF file created by MS Word, press Ctrl+A to select all the text, and inspect the copied text, the code points are:

Unicode  Description                          Remarks
U+0627   ARABIC LETTER ALEF
U+0644   ARABIC LETTER LAM
U+0623   ARABIC LETTER ALEF WITH HAMZA ABOVE
U+062D   ARABIC LETTER HAH
U+0643   ARABIC LETTER KAF
U+0627   ARABIC LETTER ALEF
U+0645   ARABIC LETTER MEEM
U+0020   SPACE
U+0648   ARABIC LETTER WAW
U+0627   ARABIC LETTER ALEF
U+0627   ARABIC LETTER ALEF                   Original was U+0644
U+0634   ARABIC LETTER SHEEN
U+0631   ARABIC LETTER REH
U+0648   ARABIC LETTER WAW
U+0627   ARABIC LETTER ALEF                   Original was U+0637
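
The comparison between the two listings can be sketched in Python as below. This assumes the copied text has the same length and character order as the original, which holds for this example but may not hold for every PDF viewer. `find_replacements` is a hypothetical helper name, and both strings are built from the code points listed above.

```python
def find_replacements(original, extracted):
    """Return (index, original_char, extracted_char) for each differing position."""
    if len(original) != len(extracted):
        raise ValueError("texts differ in length; align them before comparing")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(original, extracted))
        if a != b
    ]


# Code points as they appear in MS Word.
original = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)
# Code points as copied out of the PDF.
extracted = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0627\u0634\u0631\u0648\u0627"
)

for index, a, b in find_replacements(original, extracted):
    print(f"position {index}: U+{ord(a):04X} became U+{ord(b):04X}")
# position 10: U+0644 became U+0627
# position 14: U+0637 became U+0627
```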

As you can see, when the MS Word document is saved as PDF, or the PDF is created by Adobe Acrobat, certain characters are replaced automatically. This is a loss of data in the text of the PDF file.

Visually, the PDF file looks correct: every character appears the same as in the original.

In a 20-page document, comparing the original MS Word characters with the corresponding text extracted from the PDF via copy and paste shows about 17% of the characters replaced. We cannot identify any pattern in the replacements.
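
A rough way to measure the replacement rate on a longer text, and to tally which character turns into which, is sketched below. It uses `difflib.SequenceMatcher` from the Python standard library so that small length differences do not throw off the alignment. `replacement_stats` is a hypothetical name, only equal-length "replace" runs are counted (an assumption made to keep the pairing unambiguous), and the two strings are the short example from this post, not the 20-page document.

```python
from collections import Counter
from difflib import SequenceMatcher


def replacement_stats(original, extracted):
    """Return (replacement rate, Counter of (original_char, extracted_char))."""
    matcher = SequenceMatcher(None, original, extracted, autojunk=False)
    pairs = Counter()
    replaced = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Count only one-for-one substitutions; skip inserts and deletes.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for a, b in zip(original[i1:i2], extracted[j1:j2]):
                pairs[(a, b)] += 1
                replaced += 1
    rate = replaced / len(original) if original else 0.0
    return rate, pairs


original = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)
extracted = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0627\u0634\u0631\u0648\u0627"
)

rate, pairs = replacement_stats(original, extracted)
print(f"{rate:.1%} of characters replaced")
for (a, b), count in pairs.most_common():
    print(f"U+{ord(a):04X} -> U+{ord(b):04X}: {count}")
```

Sorting the tally by count on the full document might show whether the same few letters are always affected, which could help narrow down a pattern.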

Request support for the above.


1 reply

Zaid Al Hilali
Community Expert
January 20, 2017

Try another Arabic font in MS Word; perhaps your version of Arial is causing this failure. I suggest using a font from a reputable vendor, though.