Hello,
When I convert a Microsoft Word document to PDF (using either the Adobe Printer or Save As > PDF), the Unicode text in the document does not encode properly. The text _appears_ correct, but when I copy and paste it to another program, such as Notepad, there are errors:
For example, the text in Microsoft Word is
बूढ़े पिता ने ऐसी भूमिका ...
The text _appears_ correct in the PDF file, but when I copy and paste it into Notepad, it is:
बूढ़े पता ने ऐसी भूमका ...
Note the boxes.
This erroneous encoding means it is not possible to search the PDF document properly. For example, if I search for "पिता" (the second word), I will not get a match. I would have to search "पता".
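To make the search failure concrete, here is a minimal Python sketch comparing the two strings at the code-point level. It assumes the pasted text consists of exactly the characters shown above; the real clipboard contents may also include replacement characters (the boxes). The function name `describe` and the variable names are only illustrative.

```python
import unicodedata

def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]

# The word as typed in Word: PA, VOWEL SIGN I, TA, VOWEL SIGN AA
word_text = "\u092a\u093f\u0924\u093e"
# The word as pasted from the PDF (assumed): PA, TA, VOWEL SIGN AA
pdf_text = "\u092a\u0924\u093e"

# Code points present in the Word text but absent from the pasted text
missing = [ch for ch in word_text if ch not in pdf_text]

print(describe(word_text))
print(describe(pdf_text))
print(describe("".join(missing)))  # ['U+093F DEVANAGARI VOWEL SIGN I']

# A plain substring search for the original word fails on the pasted text
print(word_text in pdf_text)  # False
```

Under that assumption, the only difference is the dropped vowel sign U+093F, which is enough to make an exact-match search for the original word fail.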
Embedding the fonts makes no difference.
I have attached a Word and PDF file for reference.
Please advise how I can create a PDF properly, so that the encoding is exactly like it is in any other program, i.e. without any of these boxes or other issues.
Thank you
Sim
Hey Madhuris,
Just a quick follow-up to check whether you were able to resolve your issue.
Thank you.
I had something that sounds similar. I had English in the PDF but got gibberish when I copied and pasted. So I exported the pages as images (JPEG files). Each page unfortunately became a separate file. Then I pulled them all back in using the Combine Files button. They came back in and the OCR automatically read them. When I copied and pasted, all was fine. I think the author of the original file used a custom font and encoding that was unknown to the program I was pasting into. Not sure how good the OCR is in Asian languages though.
Thank you for updating this old thread with that solution.
I learned something new today.
Hi,
I am trying to convert a Marathi PDF document to a Word document.
The conversion replaces the text with incorrect characters, as below:
किव कु लगु^ कािलदास सं^ृ त िव^िव^ालय, रामटेक, नागपूर
Could you please tell me what needs to be done to keep the correct text? Thanks in advance.
Hi,
Are you exporting a document to Microsoft Word with Adobe Acrobat Pro DC?
What operating system are you on?
Hi,
Yes, Adobe Acrobat Pro to MS Word 2016.
OS: Windows 10 Pro