Unicode Text Not Encoding Properly
- September 20, 2019
Hello,
When I convert a Microsoft Word document to PDF (using either the Adobe PDF printer or Save As > PDF), the Unicode text in the document is not encoded properly. The text _appears_ correct, but when I copy and paste it into another program, such as Notepad, there are errors.
For example, the text in Microsoft Word is:
बूढ़े पिता ने ऐसी भूमिका ...
The text _appears_ correct in the PDF file, but when I copy and paste it into Notepad, it becomes:
बूढ़े पता ने ऐसी भूमका ...
Note the boxes, and that the vowel sign ि is missing from पिता and भूमिका.
Because of this faulty encoding, the PDF document cannot be searched properly. For example, if I search for "पिता" (the second word), I get no match; I have to search for "पता" instead.
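One way to see exactly what is lost is to compare the code points of the two versions. The Python sketch below assumes the pasted text reached Notepad exactly as shown above; the helper name `missing_code_points` is hypothetical. It lists the characters that are present in the Word text but absent from the pasted text, then shows that a plain substring search fails for the same reason the PDF search does.

```python
import unicodedata
from collections import Counter


def missing_code_points(expected: str, actual: str) -> list[str]:
    """List code points found in `expected` but not in `actual`, with their Unicode names."""
    missing = Counter(expected) - Counter(actual)  # multiset difference
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sorted(missing.elements())]


# Word pairs copied from the post: (as typed in Word, as pasted from the PDF).
pairs = [("पिता", "पता"), ("भूमिका", "भूमका")]

for word_text, pdf_text in pairs:
    print(word_text, "->", pdf_text, missing_code_points(word_text, pdf_text))

# Substring search on the pasted text: the Word spelling is not found.
pasted_line = "पता ने ऐसी भूमका"
print("पिता" in pasted_line)  # False
print("पता" in pasted_line)   # True
```

For both words the only missing code point is U+093F DEVANAGARI VOWEL SIGN I, which is why a search for "पिता" finds nothing in the extracted text.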
Embedding the fonts makes no difference.
I have attached a Word and PDF file for reference.
Please advise how I can create a PDF in which the text is encoded exactly as it is in Word or any other program, without these boxes or other errors.
Thank you
Sim
