Hello,
When I convert a Microsoft Word document to PDF (using either the Adobe Printer or Save As > PDF), the Unicode text in the document does not encode properly. The text _appears_ correct, but when I copy and paste it to another program, such as Notepad, there are errors:
For example, the text in Microsoft Word is
बूढ़े पिता ने ऐसी भूमिका ...
The text _appears_ correct in the PDF file, but when I copy and paste it into Notepad, it is:
बूढ़े पता ने ऐसी भूमका ...
Note the boxes.
This erroneous encoding means it is not possible to search the PDF document properly. For example, if I search for "पिता" (the second word), I will not get a match. I would have to search "पता".
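To see exactly what was lost, it can help to compare the two strings code point by code point. The following is a minimal Python sketch, not an Adobe tool: the helper names `describe` and `missing_chars` are made up for illustration, and the two strings are taken as they appear in this thread. The real clipboard contents may contain extra characters (the "boxes") that are not reproduced here.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")) for ch in text]


def missing_chars(expected, actual):
    """Return characters of expected not matched by a left-to-right scan of actual.

    This is a simple greedy comparison for illustration, not a full diff.
    """
    missing = []
    pos = 0
    for ch in expected:
        if pos < len(actual) and actual[pos] == ch:
            pos += 1
        else:
            missing.append(ch)
    return missing


# Assumption: the strings as shown in this thread.
word_text = "\u092A\u093F\u0924\u093E"  # पिता, as typed in Word
pdf_text = "\u092A\u0924\u093E"         # पता, as pasted from the PDF

for label, text in (("Word", word_text), ("PDF", pdf_text)):
    print(label, describe(text))

print("Missing:", describe("".join(missing_chars(word_text, pdf_text))))
```

Under those assumptions, the only character missing from the pasted word is U+093F (DEVANAGARI VOWEL SIGN I), which is why a search for "पिता" finds no match in the PDF.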
Embedding the fonts makes no difference.
I have attached a Word and PDF file for reference.
Please advise how I can create a PDF properly, so that the encoding is exactly like it is in any other program, i.e. without any of these boxes or other issues.
Thank you
Sim
Hey Madhuris,
Just a quick follow-up to check whether you were able to resolve your issue.
Thank you.
I had something that sounds similar. I had English text in the PDF but got gibberish when I copied and pasted it. So I exported the pages as images (JPEG files). Unfortunately, each page became a separate file. Then I pulled them all back in using the Combine Files button. They came back in and the OCR automatically read them. When I copied and pasted, all was fine. I think the author of the original file used a custom font and encoding that was unknown to the program I was pasting into. Not sure how good the OCR is with Asian languages, though.
Thank you for updating this old thread with that solution.
I learned something new today.
Hi,
I am trying to convert a Marathi PDF document to a Word document.
It changes the font to incorrect characters as below:
किव कु लगु^ कािलदास सं^ृ त िव^िव^ालय, रामटेक, नागपूर
Could you please tell me what needs to be done to keep the text correct? Thanks in advance.
Hi,
Are you exporting a document to Microsoft Word with Adobe Acrobat Pro DC?
What operating system are you on?
Hi,
Yes. Adobe Acrobat Pro to MS Word 2016.
OS: Windows 10 Pro