
How to enable non-English text copying from PDF file?

New Here
Nov 09, 2020

I am trying to convert an MS Word (.doc) file to PDF. If the file contains English text, the text can be copied from the converted PDF. But if the file contains a language other than English, and someone copies that text from the converted PDF and pastes it into a Word document, the pasted text becomes gibberish.

What is the solution?

I tried converting the Word file to PDF with Adobe Acrobat Pro DC. When I save the Word file, I also tick `Embed fonts in the file` in the `Save` options.

Now the converted PDF looks like gibberish, but when I copy text from it and paste it somewhere else, the pasted text is fine. It only looks wrong inside the PDF itself.

 

What can I do?

TOPICS
Edit and convert PDFs, How to

Adobe Community Professional
Nov 09, 2020

Does this also happen when you use a different font? To convert a "glyph" (the drawing of the character you see on the page) back to the actual character, the font needs to be associated with a "ToUnicode" mapping in the PDF file. I assume that this mapping is either missing or corrupt. The last symptom you describe (the PDF looks wrong, but the copied text is correct) also points to a font problem, which is why you should try a different font.
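As a rough illustration of what a ToUnicode mapping does, the Python sketch below turns glyph codes back into Unicode text using a hand-written CMap fragment. The CMap contents, glyph codes, Bengali mappings, and the helper names parse_tounicode and extract_text are all hypothetical, not taken from the poster's file. In a real PDF the CMap is a (usually compressed) stream referenced from the font's /ToUnicode entry, and this sketch only handles simple `bfchar` entries and `bfrange` entries whose destination is a single UTF-16 code unit.

```python
import re

# Hypothetical ToUnicode CMap fragment for a Bengali font (not from the poster's file).
SAMPLE_CMAP = """
3 beginbfchar
<0003> <0995>
<0004> <09BE>
<0005> <099509CD09B7>
endbfchar
1 beginbfrange
<0010> <0012> <0986>
endbfrange
"""

HEX_PAIR = r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>"


def parse_tounicode(cmap: str) -> dict[int, str]:
    """Map glyph codes to Unicode text (simplified: bfchar plus single-unit bfrange)."""
    mapping: dict[int, str] = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap, re.S):
        for src, dst in re.findall(HEX_PAIR, block):
            # Destination strings are UTF-16BE; one glyph may map to several characters,
            # e.g. a conjunct glyph that stands for a whole consonant cluster.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap, re.S):
        for lo, hi, dst in re.findall(HEX_PAIR + r"\s*<([0-9A-Fa-f]+)>", block):
            first = int(dst, 16)  # assumption: destination is one BMP code unit, e.g. <0986>
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = chr(first + offset)
    return mapping


def extract_text(codes: bytes, mapping: dict[int, str], code_len: int = 2) -> str:
    """Decode a string of 2-byte glyph codes; unmapped codes become U+FFFD."""
    chars = []
    for i in range(0, len(codes), code_len):
        code = int.from_bytes(codes[i:i + code_len], "big")
        chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)


if __name__ == "__main__":
    table = parse_tounicode(SAMPLE_CMAP)
    print(extract_text(bytes.fromhex("000300040005"), table))
    # A glyph with no ToUnicode entry cannot be turned back into a character.
    print(extract_text(bytes.fromhex("00030099"), table))
```

The point of the sketch is that the glyph drawing and the copied text come from different places in the file: if the ToUnicode entries are missing or point at the wrong code points, the page can still render while copy and paste produces wrong characters.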

New Here
Nov 09, 2020

Can you give step-by-step instructions, please? Consider the following PDF file, which was created by someone else: if I try to copy the text from it, I don't get the exact characters.

Adobe Community Professional
Nov 09, 2020

Unfortunately, there are no step-by-step instructions for fixing a problem like this. You will need to figure out what exactly is wrong. Have you tried a different font in your Word document to see what happens?

 

In the file you linked to, I get most characters correct when I copy from the PDF to Word:

[Screenshot: 2020-11-09_12-14-19.png]

The characters that are not correct again point to a font problem. I don't have the font used in the PDF file, so Word selected a font based on the character set. All glyphs shown as question marks don't exist in the font that Word selected, and the ones that show a different glyph again have a different mapping.

New Here
Nov 09, 2020

Let me give you a detailed account of my conversion process.

1. I copy the text from this web page.

2. I paste the text into an MS Word file.

3. I see that the font is Vrinda, size 12.5.

4. Then I check the fonts in the PDF (see the following image) and find the same font as in MS Word. So what is causing the problem? Please help.

[Screenshot: fonts.jpg]

Adobe Community Professional
Nov 09, 2020

How do you then convert to PDF? What settings are you using for the conversion? Again, have you tried changing the font in Word to something else to see whether the problem is related to this particular font? I've suggested this twice so far, but it does not seem like you've tried it.

Adobe Community Professional
Nov 09, 2020

I just tried it myself and saw the result. I suspect that Word is messing up the information: when I select File > Create > PDF from Web Page in Acrobat, the resulting PDF contains the correct information, and when I select, copy, and paste a paragraph into Word, the text is reproduced without a problem.

New Here
Nov 09, 2020

Of course I tried several fonts. For example, this time I tried SolaimanLipi (a Unicode font). The conversion process is simple: I go to the Adobe tab and click "Create PDF". Please see the image below:

[Screenshot: fonts1.jpg]

New Here
Nov 09, 2020

"Word messing up the information": probably not, because if I copy the text from the PDF and paste it into a search bar, I get the same scrambled text. Also, I can copy and paste from the MS Word file without any problem.

Adobe Community Professional
Nov 09, 2020

Just because it works when you paste into another application does not mean that Word is not at fault here. You can paste into Acrobat as well, either to create an annotation or to add text to a PDF document, and in both cases I get the correct results.

The problem is very likely on the way out of Word, via the "Save as PDF" interface.

 

In case this turns out to be a bug in Acrobat, you can submit a bug report here: http://www.adobe.com/products/wishform.html

I am sorry that I cannot offer any workaround (besides converting from HTML directly in Acrobat, as I mentioned before).
