
Unicode Text Not Encoding Properly

New Here, Sep 20, 2019


Hello,

 

When I convert a Microsoft Word document to PDF (using either the Adobe Printer or Save As > PDF), the Unicode text in the document does not encode properly. The text _appears_ correct, but when I copy and paste it to another program, such as Notepad, there are errors:

 

For example, the text in Microsoft Word is


बूढ़े पिता ने ऐसी भूमिका ...

The text _appears_ correct in the PDF file, but when I copy and paste it into Notepad, it is:

बूढ़े 􀉟पता ने ऐसी भू􀉠मका ...

Note the boxes.

 

This erroneous encoding means it is not possible to search the PDF document properly. For example, if I search for "पिता" (the second word), I will not get a match. I would have to search "􀉟पता".
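
The boxes in the pasted text are typically private-use code points, which no font outside the PDF maps to a real letter. A minimal sketch for spotting them in copied text follows; the function name `find_private_use` and the stand-in code point U+10025F are assumptions for illustration, not values taken from the attached files.

```python
import unicodedata

def find_private_use(text):
    """Return (index, code point) pairs for private-use characters in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" = private use
    ]

# Stand-in for text pasted from the PDF. U+10025F is an assumed placeholder,
# followed by the Devanagari letters pa, ta and the vowel sign aa.
pasted = "\U0010025F\u092a\u0924\u093e"
# The word as typed in Word: pa, vowel sign i, ta, vowel sign aa.
expected = "\u092a\u093f\u0924\u093e"

print(find_private_use(pasted))    # private-use hits, if any
print(find_private_use(expected))  # empty list for clean text
```

If the list is non-empty, the PDF's text layer does not carry the real Unicode values for those glyphs, which is why searching for the original word fails.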

 

Embedding the fonts makes no difference.

 

I have attached a Word and PDF file for reference.

 

Please advise how I can create a PDF properly, so that the encoding is exactly like it is in any other program, i.e. without any of these boxes or other issues.

 

Thank you

 

Sim

TOPICS
Edit and convert PDFs , General troubleshooting

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
1 ACCEPTED SOLUTION
New Here, Feb 11, 2023


I had something that sounds similar. I had English text in the PDF but got gibberish when I copied and pasted it. So I exported the pages as images (JPEG files). Unfortunately, each page became a separate file. Then I pulled them all back in using the Combine Files button, and the OCR automatically read them. When I copied and pasted, all was fine. I think the author of the original file used a custom font and encoding that was unknown to the program I was pasting into. I'm not sure how good the OCR is for Asian languages, though.

Community Expert, Sep 22, 2019


Got it. I was also able to use Adobe Devanagari and a few other OpenType/Unicode fonts, and everything is working on my end. The problem that madhuris is experiencing seems to come from a missing step. For Acrobat search to work, I have to do the following:

1) In MS Word, select a font such as Nirmala UI or Adobe Devanagari, then save the document as Plain Text with UTF-8 as the encoding.
2) Convert the plain text file to PDF, not the original MS Word document (which has other merged formatting sources).
3) When the file is opened in Adobe Acrobat, the font may need to be changed again if characters are missing (an accent, for example). Open "Edit Text & Images", select the text that needs to be edited, and choose Nirmala UI or Adobe Devanagari as the font, whichever works; a tooltip tells you whether the font supports the characters.
4) Use "Copy" or "Copy With Formatting" on the text, then run a search or advanced search in Adobe Acrobat.

Pay close attention when using any other OpenType/Unicode font, because the accents are small details. Of all the fonts I've tried in Acrobat and MS Word, Nirmala UI and Adobe Devanagari (but mostly Nirmala UI) seem to recognize all the characters from the original document, and only when you follow the Save As steps described above.
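
The plain-text route above only helps if the saved file really is UTF-8 and still contains the Devanagari characters. A minimal sketch for checking that before converting to PDF is below; the helper names and the file name `sample.txt` are hypothetical, and the demo writes its own sample file rather than reading one exported from Word.

```python
import tempfile
from pathlib import Path

def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def count_devanagari(path):
    """Count characters in the Devanagari block (U+0900 to U+097F)."""
    text = Path(path).read_text(encoding="utf-8")
    return sum(1 for ch in text if "\u0900" <= ch <= "\u097f")

# Demo with a temporary file standing in for the text file saved from Word.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"  # hypothetical file name
    sample.write_text("\u092a\u093f\u0924\u093e", encoding="utf-8")
    print(is_valid_utf8(sample), count_devanagari(sample))
```

A file that fails the first check, or reports zero Devanagari characters, was probably saved with a different encoding and is worth re-exporting before it goes to Acrobat.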

Community Expert, May 31, 2020


Hey Madhuris,

 

Just a quick follow-up to check whether you were able to resolve your issue.

 

Thank you.

Community Expert, Feb 11, 2023


Thank you for updating this old thread with that solution.

 

I learned something new today.

New Here, Mar 07, 2023


Hi,

I am trying to convert a Marathi PDF document to a Word document.

The conversion replaces the text with incorrect characters, as below:

किव कु लगु^ कािलदास सं^ िव^िव^ालय, रामटेक, नागपूर

Could you please help me understand what needs to be done to keep the correct format? Thanks in advance.
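
One visible symptom in the sample above is vowel signs and other combining marks that no longer follow a letter, for example directly after a space or a "^". A rough sketch for flagging such orphaned marks in exported text is below; the function name `orphaned_marks` is hypothetical, and the check is a heuristic for spotting damage, not a full validator of Devanagari syllable structure.

```python
import unicodedata

def orphaned_marks(text):
    """Return (index, code point) for combining marks not preceded by a letter or mark."""
    hits = []
    prev = None
    for i, ch in enumerate(text):
        if unicodedata.category(ch) in ("Mn", "Mc"):
            # A mark is orphaned at the start of the text, or when the previous
            # character is neither a letter (L*) nor another mark (M*).
            if prev is None or unicodedata.category(prev)[0] not in ("L", "M"):
                hits.append((i, f"U+{ord(ch):04X}"))
        prev = ch
    return hits

# Vowel sign i right after a space: flagged.
print(orphaned_marks(" \u093f\u0935"))
# ka, va, vowel sign i in normal order: nothing flagged.
print(orphaned_marks("\u0915\u0935\u093f"))
```

A long list of hits suggests the exported text layer is damaged at the source, so changing the font in Word afterwards is unlikely to repair it.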

Community Expert, Mar 07, 2023


Hi,

 

Are you exporting a document to Microsoft Word with Adobe Acrobat Pro DC?

 

What operating system are you on?

New Here, Mar 07, 2023


Hi,

Yes, Adobe Acrobat Pro to MS Word 2016.

OS: Windows 10 Pro
