
Missing characters when creating a PDF from an Arabic MS Word document via Adobe Acrobat or Save As PDF in MS Word

New Here,
Jun 22, 2016

Folks

We are running into the following issue. (I am not a native Arabic speaker and cannot read or write Arabic, but I can work with the Unicode characters.) The steps to reproduce are as follows:

1. Create a new MS Word Document.

2. Copy and paste the following string (as an example): "الأحكام والشروط"

3. Set the font to standard Arial, size 10.

4. Save the file as PDF, either with the Save As PDF function that is standard in MS Word or by using Adobe Acrobat to convert the MS Word file to PDF.

5. Open the PDF File and try selecting the text.

The following are the issues:

1. In MS Word, all the characters are proper Unicode. The code points for the above string are:

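Unicode   Description
U+0627    ARABIC LETTER ALEF
U+0644    ARABIC LETTER LAM
U+0623    ARABIC LETTER ALEF WITH HAMZA ABOVE
U+062D    ARABIC LETTER HAH
U+0643    ARABIC LETTER KAF
U+0627    ARABIC LETTER ALEF
U+0645    ARABIC LETTER MEEM
U+0020    SPACE
U+0648    ARABIC LETTER WAW
U+0627    ARABIC LETTER ALEF
U+0644    ARABIC LETTER LAM
U+0634    ARABIC LETTER SHEEN
U+0631    ARABIC LETTER REH
U+0648    ARABIC LETTER WAW
U+0637    ARABIC LETTER TAH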
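
For anyone who wants to reproduce a listing like the one above, here is a minimal Python sketch that prints the code point and Unicode name of every character in a string. The helper name `describe` and the constant `SAMPLE` are illustrative only; the sample is the string from step 2, written with escapes so that right-to-left rendering does not get in the way.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The example string from step 2, written as escapes (15 characters).
SAMPLE = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)

for code, name in describe(SAMPLE):
    print(code, name)
```

Pasting text copied from Word or from the PDF into `describe` in place of `SAMPLE` should give a listing in the same form.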

2. When we open the PDF file created by MS Word, press CTRL+A (select all text), and inspect the copied text, the code points are as follows:

Unicode   Description                           Remarks
U+0627    ARABIC LETTER ALEF
U+0644    ARABIC LETTER LAM
U+0623    ARABIC LETTER ALEF WITH HAMZA ABOVE
U+062D    ARABIC LETTER HAH
U+0643    ARABIC LETTER KAF
U+0627    ARABIC LETTER ALEF
U+0645    ARABIC LETTER MEEM
U+0020    SPACE
U+0648    ARABIC LETTER WAW
U+0627    ARABIC LETTER ALEF
U+0627    ARABIC LETTER ALEF                    Original Unicode was U+0644
U+0634    ARABIC LETTER SHEEN
U+0631    ARABIC LETTER REH
U+0648    ARABIC LETTER WAW
U+0627    ARABIC LETTER ALEF                    Original Unicode was U+0637
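
The two listings can also be compared programmatically instead of by eye. The following is a rough sketch, assuming both strings have the same length and line up position by position (true for this short example, not guaranteed for a whole document). The names `find_replacements`, `ORIGINAL`, and `EXTRACTED` are made up for illustration; the two strings are the sequences from the listings above.

```python
def find_replacements(original, extracted):
    """List (index, original code point, extracted code point) for each mismatch.

    Assumes the two strings are aligned position by position.
    """
    return [
        (i, f"U+{ord(a):04X}", f"U+{ord(b):04X}")
        for i, (a, b) in enumerate(zip(original, extracted))
        if a != b
    ]


# Text as typed in MS Word (first listing).
ORIGINAL = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)
# Text as copied out of the PDF (second listing).
EXTRACTED = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0627\u0634\u0631\u0648\u0627"
)

for index, before, after in find_replacements(ORIGINAL, EXTRACTED):
    print(f"position {index}: {before} became {after}")
```

For the example string this reports the same two substitutions marked in the Remarks column, at zero-based positions 10 and 14.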

As you can see, when the MS Word document is saved as PDF, or the PDF is created by Adobe Acrobat, certain characters are replaced automatically. This is a loss of data in the text of the PDF file.

Visually, the PDF file looks correct: every character appears the same as in the original.

In a 20-page document we had, comparing the original MS Word characters with the corresponding text extracted from the PDF via copy and paste shows about 17% replacements. We cannot identify any pattern in them.
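
One way to measure a replacement rate like this, and to look for a pattern, is to tally which character is replaced by which. This is only a sketch under the assumption that the extracted text has exactly the same length as the original and lines up position by position; if characters are dropped or inserted, a real alignment step would be needed first. The function name `replacement_stats` is hypothetical, and the short example string stands in for the full document.

```python
from collections import Counter


def replacement_stats(original, extracted):
    """Return (replacement rate, Counter of (original, replacement) pairs).

    Assumes the two strings are aligned position by position.
    """
    if len(original) != len(extracted):
        raise ValueError("texts differ in length; align them before comparing")
    pairs = Counter((a, b) for a, b in zip(original, extracted) if a != b)
    rate = sum(pairs.values()) / len(original) if original else 0.0
    return rate, pairs


# Short example from this post; a real check would use the full document text.
ORIGINAL = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0644\u0634\u0631\u0648\u0637"
)
EXTRACTED = (
    "\u0627\u0644\u0623\u062d\u0643\u0627\u0645"
    " "
    "\u0648\u0627\u0627\u0634\u0631\u0648\u0627"
)

rate, pairs = replacement_stats(ORIGINAL, EXTRACTED)
print(f"replacement rate: {rate:.1%}")
for (before, after), count in pairs.most_common():
    print(f"U+{ord(before):04X} -> U+{ord(after):04X}: {count}")
```

If the same few source characters or the same replacement character dominate the tally on the full document, that would at least narrow down where the substitutions come from.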

We would appreciate support with the above.

TOPICS
Acrobat SDK and JavaScript, Windows
Community Expert,
Jan 19, 2017

Try another Arabic font in MS Word; perhaps your version of Arial is causing this failure. I suggest you use a font from a reputable vendor, though.
