
Weird characters pasted after copying text from PDF file

Contributor, Oct 18, 2011

I get weird characters when I paste text copied from a PDF file. For instance, a plain English sentence or word becomes something like:

VGHOGHH[WRVDSDUWLUGFLyQESUDFWRVIHQyOLFWRULYDHVXDRORGXF2EWHQ

How can I fix it?

Thanks.
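
One quick diagnostic, offered here as a sketch and not something proposed in the thread: check whether the pasted garbage is a constant code-point offset from the real text. The helper name `shift_text` is hypothetical. For the sample string above, an offset of -3 happens to produce recognizable Spanish fragments, but that is a property of this particular sample; many broken files will not decode this way.

```python
def shift_text(text: str, offset: int) -> str:
    """Shift every character's code point by `offset` (hypothetical helper)."""
    return "".join(chr(ord(ch) + offset) for ch in text)


garbled = "VGHOGHH[WRVDSDUWLUGFLyQESUDFWRVIHQyOLFWRULYDHVXDRORGXF2EWHQ"

# Try a few small offsets and eyeball which one, if any, looks like language.
for offset in range(-5, 6):
    print(f"{offset:+d}: {shift_text(garbled, offset)}")

# For this sample, -3 yields fragments such as "APARTIR" and "RACTOSFEN".
decoded = shift_text(garbled, -3)
```

Even here the result is incomplete: spaces are lost and the accented letters do not follow the same offset, so this only helps to recognize the pattern, not to repair the file.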

TOPICS
Edit and convert PDFs

Views

270.1K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting.

1 Correct answer: the reply by New Here, Aug 10, 2023, at the end of this thread.

Community Expert, Oct 18, 2011

If you go to File>Properties and go to the Fonts tab, what is the Identity listed for the fonts?

LEGEND, Oct 18, 2011

It's a "problem" that often happens accidentally, but it is also used intentionally to prevent copying and indexing of PDF files, especially when they are posted online.

Fonts in PDF files are stored with two tables: one contains the glyphs (the character shapes) and one contains a "toUnicode" map, which says what character each glyph represents. Acrobat uses the first table to draw the page, so it doesn't actually know what the text "says", only which patterns of shapes to draw. When you copy or search the file, the second lookup table is used to work out what the text says. For example, in the word APPLE the first table says the second shape looks like "P" (even if the shapes aren't stored in alphabetical order), and the toUnicode table says the second letter is 0x0050, a capital P.
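
The two-table idea can be sketched as a toy model. Everything in it (the glyph IDs, the dictionary names, the "shape of ..." strings) is made up for illustration; a real PDF font stores outlines and a CMap stream, not Python dictionaries.

```python
# Toy model of one embedded font subset. All IDs and names are hypothetical.
glyph_shapes = {1: "shape of A", 2: "shape of P", 3: "shape of L", 4: "shape of E"}
to_unicode = {1: 0x0041, 2: 0x0050, 3: 0x004C, 4: 0x0045}  # glyph ID -> code point

# The page itself only stores glyph IDs: this is the word APPLE.
page_content = [1, 2, 2, 3, 4]


def render(glyph_ids):
    """Drawing the page needs only the shapes, never the Unicode map."""
    return [glyph_shapes[g] for g in glyph_ids]


def extract_text(glyph_ids, cmap):
    """Copy, search and export need the glyph ID -> Unicode map."""
    return "".join(chr(cmap[g]) for g in glyph_ids)


print(render(page_content))
print(extract_text(page_content, to_unicode))
```

Rendering never touches `to_unicode`, which is why a file can look perfect on screen while copy/paste is broken.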

If this toUnicode map is corrupted or missing, the PDF will render to screen (and print) just fine, but Acrobat has no idea what the shapes mean. The result when you screenread, export, search or copy/paste is a default set of mappings - so it will be a 1:1 relationship (every "A" will become the same character) - but the pairing is not predictable, so it cannot automatically be repaired. You can do it using plugins but would have to manually work out what each pair should be, and recreate the map table a letter at a time.
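
As a rough sketch of what "work out each pair and recreate the table a letter at a time" means, the following assumes a simplified fallback where the raw glyph ID is used directly as a code point (real viewers may fall back differently). The function names and the glyph IDs are hypothetical.

```python
def extract_without_map(glyph_ids):
    """Assumed fallback: with no map, treat each glyph ID as a code point."""
    return "".join(chr(g) for g in glyph_ids)


def build_repair_map(garbled, expected):
    """Pair each pasted character with what a human reads on the page."""
    if len(garbled) != len(expected):
        raise ValueError("samples must have the same length")
    repair = {}
    for bad, good in zip(garbled, expected):
        # The relationship is 1:1, so a conflicting pair means a bad sample.
        if repair.setdefault(bad, good) != good:
            raise ValueError(f"{bad!r} maps to two different characters")
    return repair


def repair_text(text, repair):
    """Apply the hand-built map; unknown characters are left untouched."""
    return "".join(repair.get(ch, ch) for ch in text)


pasted = extract_without_map([120, 113, 113, 107, 122])  # made-up glyph IDs
repair = build_repair_map(pasted, "APPLE")               # typed in by a human
print(pasted, "->", repair_text(pasted, repair))
```

The map only covers letters a human has already matched by eye, which is why this cannot be automated for a whole document.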

When this happens intentionally, it means the document author has removed or rewritten the toUnicode map, using a plugin. When it happens accidentally, it usually means the software exporting the PDF didn't pass the correct font information to the PDF print driver (in the PostScript stream).

Contributor, Oct 18, 2011

Thanks. Yet, in this case I can see the words OK and search for them OK. The only problem is when copy/pasting.

Community Expert, Oct 18, 2011

The reason I asked is that every embedded subset is listed as encoding: Custom which sounds like what Dave said above.

LEGEND, Oct 19, 2011

As I said, the font mapping is corrupted. Your file contains 122 fonts, but not all are broken. Some text copies OK, some does not - and the text which does not copy cannot be searched. Try searching for "extractos" and it will ignore the title on page 1 (using a corrupted font) but will find it on page 9 (using an intact font).

@Larry - "custom" encoding is perfectly OK, provided the mapping table is present.
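
A per-font check like the one described could be sketched as follows. This is an illustration only: the input is a plain list of dictionaries standing in for each page's font resources, which a PDF library would have to supply, and the `"/Encoding": "Custom"` value mirrors Acrobat's label, not the real dictionary contents. The function name is hypothetical.

```python
def fonts_without_tounicode(pages):
    """Return {font name: [page numbers]} for fonts lacking a /ToUnicode entry.

    `pages` is a list with one {font name: font dictionary} mapping per page.
    """
    missing = {}
    for page_number, fonts in enumerate(pages, start=1):
        for name, font in fonts.items():
            if "/ToUnicode" not in font:
                missing.setdefault(name, []).append(page_number)
    return missing


# Made-up stand-in for a two-page file: one broken font, one intact font.
pages = [
    {"/F1": {"/Subtype": "/Type1", "/Encoding": "Custom"}},
    {"/F2": {"/Subtype": "/Type1", "/Encoding": "Custom", "/ToUnicode": "<stream>"}},
]

print(fonts_without_tounicode(pages))
```

A missing entry is only a hint. Fonts that use a standard encoding with standard glyph names can still copy correctly without the map, so the result is a list of suspects, not a verdict.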

Contributor, Oct 19, 2011

Thanks. Is there any application to fix such a problem? I mean, to repair the file with a click or so. Thanks.

LEGEND, Oct 19, 2011

No. Read my post.

New Here, Nov 09, 2011

I have a similar issue: other people can copy and paste from the files, but I can't. Is this the same issue?

I am not sure where the problem sits, so I am posting here and on a couple of Apple support forums. I think it might be deeper than just Reader/Pro, because I also can't copy and paste text from Sente (reference library software).

I do a fair amount of research and download plenty of research papers. One such paper (Diving and Hyperbaric Medicine - DHM) from the South Pacific Underwater Medicine Society (SPUM) is causing me a major problem with copying and pasting text from it; no one else I have spoken to has the same problem.

Quote:

The further development of medical support for professional diving
David Elliott

and when I copy and paste the same from Preview, I get

Quote:

The further development of medical support for professional diving David Elliott

I have had a look and content copy and page extraction are both allowed so security isn't an issue :(.

Acrobat Distiller 8.1.0 (Windows) was used to create the file, which is PDF Version 1.4 (Adobe 5.x).

I am on Lion 10.7.2 if that makes a difference. I have had a look at the fonts table and they are either 'Custom' (Type 1) or 'Identity-H' (Type 1 CID).

There are around 40 of these files. They are produced elsewhere, so I can't ask for them to be reproduced, but others don't have the same issues as me. Any ideas?

Thanks very much for any help you can give me

Regards

Gareth

Guide, Nov 15, 2011

I know this was from 5 days ago, but this might be helpful.

http://forums.adobe.com/message/3938668#3938668

Guest, Nov 04, 2019

I know this is from a long time ago, but it might help someone else.

I've just had the same problem. Using Acrobat Pro, I exported the PDF to a Word document, and I'm now able to copy the text 🙂

New Here, Feb 17, 2020

It works. You saved my time. Many thanks!

Contributor, May 02, 2020

Thanks for the information. I have tried, yet it did not work for me. I guess that it depends on the PDF. Is there a solution for such a problem? Thanks.

Community Expert, May 02, 2020

Before you export the PDF to Word, embed the fonts using Acrobat's Preflight utility.

Garbled, unreadable text, and missing characters are usually caused by fonts not having been embedded into the PDF as it was created.

 

Instructions are here:

https://community.adobe.com/t5/acrobat/scrambled-text-when-viewing-pdf-documents-in-acrobat-standard...

|    Bevi Chagnon   |  Designer & Technologist for Accessible Documents |
|    PubCom |    Classes & Books for Accessible InDesign, PDFs & MS Office |

Contributor, May 03, 2020

Thanks, but it still did not work. This is the file if you want to test:

https://www.juntadeandalucia.es/boja/2020/522/BOJA20-522-00017-4760-01_00172168.pdf

Note: the issue does not arise on all pages of the document, only in the annex sections (pages 10 to 17).

This is what I did:

1. Acrobat Pro DC - Tools - Protect & Standardize - Print Production.

2. Acrobat Pro DC - Tools - Protect & Standardize - Print Production - Preflight.

3. Acrobat Pro DC - Tools - Protect & Standardize - Print Production - Preflight - Document - Embed font - Fix.

4. Save as PDF.

5. Preflight profile "Embed fonts" did not find any errors or warnings.

Using Adobe Acrobat Pro DC 20.006.20042 on macOS 10.12.6 (16G2136) Sierra.

New Here, Mar 22, 2023

There is a workaround: you can flatten the PDF file and then use OCR to convert the file to an editable one.

Community Beginner, Mar 30, 2023

Thank you, this worked!!! Big time and error savings. I must save this workaround for future issues.
Much appreciated.

New Here, Jun 12, 2022

Hello,

I have a similar problem with an e-book (the attached PDF is an extract from it). For instance, the character "=" turns into "1/4" when copied and pasted, but there are lots of other affected characters: the German umlauts, ó turning into o´, the ligature glyphs fi and fl turning into only f, and so on. None of the solutions proposed here worked for me. The book has over 2000 pages and I often copy and search text in it, so it would be great if I could fix this. Any suggestions?
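
For the accent part of this symptom, a post-paste cleanup is one possible stopgap. The sketch below assumes the pasted text has a base letter followed by a spacing accent (U+00B4 acute, U+00A8 diaeresis), as in the "o´" example; the function name and the assumption that umlauts arrive in the same base-then-mark order are mine, not confirmed by the thread. It does not fix the PDF itself, and it cannot recover the lost "i" or "l" of a ligature or tell a wrong "1/4" from a real one.

```python
import re
import unicodedata

# Spacing accent as pasted -> combining mark that NFC can compose.
SPACING_TO_COMBINING = {"\u00b4": "\u0301", "\u00a8": "\u0308"}


def recompose_accents(text: str) -> str:
    """Turn 'o´' into 'ó' and 'u¨' into 'ü' in already-pasted text."""

    def swap(match):
        return match.group(1) + SPACING_TO_COMBINING[match.group(2)]

    # A letter directly followed by a spacing accent gets the combining form.
    fixed = re.sub(r"([A-Za-z])([\u00b4\u00a8])", swap, text)
    # NFC then merges letter + combining mark into one precomposed character.
    return unicodedata.normalize("NFC", fixed)


print(recompose_accents("extraccio\u00b4n"))
print(recompose_accents("Mu\u00a8ller"))
```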

New Here, Jun 19, 2022

My issue was caused by an app (Highlight); such software messes up the file. The fastest fix was to delete the app and use the native iOS/Mac app instead.
I could then fix the files that this app had messed up.

New Here, Jul 31, 2022

Just wanted to state that the problem I have is not solved by the current instructions either. The problem: the vector PDF, created with Acrobat Distiller 17.0 according to File Properties, looks fine, including the fonts. The problem is only with copying: some characters go missing. E.g., "The idea" (as displayed in the PDF) becomes "e Idea" when pasted. As in another comment above, "Copy with Formatting" solves the issue. Exporting to Word or HTML, however, does not. Embedding fonts in Preflight, or the "Fix potential font problems," "Embed missing fonts," or "Fix font encoding (CIDSet)" Preflight fixups, do not help either. "List potential font problems," by contrast, does list potential problems, but as the name suggests, it doesn't fix anything.

I would suppose there should be a way to fix the problem in Acrobat itself. Currently, I "copy with formatting" the whole text to paste it into another program, to finally send it to an e-reader. Not too much of a problem, but Acrobat is probably supposed to fix the issue itself.

Other macOS applications (Preview or Skim) display the PDF well and copy the text correctly as well.

Contributor, Jul 22, 2023

More than 11 years later, this issue, as a simple direct fix, is apparently STILL not resolved. In Acrobat, the font is Open Sans. Editing the PDF by changing to any other font causes the weird characters.

New Here, Aug 10, 2023

Hi,

Just found a solution: I loaded my corrupted PDF file on my smartphone and opened it with the app "Microsoft 365 (Office)".

I tapped on the blue icon with the scissors, then tapped on the "Lens" icon (a big dot with a small dot at the top-left corner and corner marks at the 3 other corners).

You can then select your text and copy it into a Word document (or another application).

The only limitation is that you can copy only what you see on your screen. If you have multiple pages, you have to do it multiple times, but it works.

Maybe that works with Microsoft 365 on a laptop too, but I didn't try.

Hope that can help some people.
