
Copying text from a PDF gives me gibberish. Is there a way to use OCR to correct it?

Enthusiast
Sep 25, 2011

I have a few documents that are complete gibberish when I select text and copy. If I open them in Acrobat Pro, select text, Copy, and "Show Clipboard" in Finder, I see a bunch of "skull characters", and if I open them in Preview and do the same, I see strings of dots. The text in the clipboard cannot be pasted intelligibly into any other program, and I cannot search the document.

Some of these documents were downloaded from commercial sites. One of them came from (I think) OCR'ing a scanned document using ClearScan. An example of such a document is at https://public.me.com/ix/alanterra/Reynoso%202006%20p%201.pdf?disposition=download+1317001233647 (it's small, 55K).

It seems to me that one way to fix this would be to convert the document into a "scanned" (image-only) PDF and then OCR it. But the only way I can figure out how to do that is to image each page separately in Photoshop and then assemble the pages into a new document.
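
The page-by-page approach described above can be automated. Here is a minimal Python sketch. It assumes the third-party packages pdf2image (which needs Poppler), pytesseract (which needs the Tesseract OCR engine) and pypdf are installed. The function name `rasterize_and_ocr` and its defaults are hypothetical. Each page is rendered to an image, which throws the broken text layer away. Tesseract then produces a one-page PDF with a fresh invisible text layer, and that page is appended to the output.

```python
import io


def rasterize_and_ocr(src: str, dst: str, dpi: int = 300, lang: str = "eng") -> int:
    """Render each page to an image, OCR it, and reassemble a searchable PDF.

    Returns the number of pages written to dst.
    """
    # Third-party dependencies (assumed installed): pdf2image (needs Poppler),
    # pytesseract (needs the Tesseract binary) and pypdf.
    from pdf2image import convert_from_path
    import pytesseract
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for image in convert_from_path(src, dpi=dpi):  # one PIL image per page
        # Tesseract returns a one-page PDF: the page image plus an invisible text layer.
        page_pdf = pytesseract.image_to_pdf_or_hocr(image, lang=lang, extension="pdf")
        for page in PdfReader(io.BytesIO(page_pdf)).pages:
            writer.add_page(page)
    with open(dst, "wb") as out:
        writer.write(out)
    return len(writer.pages)
```

For example, `rasterize_and_ocr("input.pdf", "input-ocr.pdf")` would write an image-only copy with an OCR text layer. As with any OCR, the result still needs proofreading.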

There must be a way to deal with this problem.

Any thoughts?

A

PS--If you look at the document linked to above, you will note that the text in the footer is coherent, but not the text in the body of the document.
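
To find out which pages (or files) are affected before OCRing everything, one option is to extract their text and measure how much of it consists of control, private-use or replacement characters. This is a rough heuristic sketch, not something proposed in the thread. It assumes the pypdf package, and the 0.3 threshold and the function names are arbitrary choices. It will not catch fonts whose codes decode to the wrong ordinary letters. A page whose footer extracts cleanly while the body does not may also score lower than expected.

```python
import unicodedata

# Control, private-use, unassigned and surrogate code points rarely occur in real prose.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def gibberish_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like undecodable glyphs."""
    chars = [ch for ch in text if ch not in " \t\r\n"]
    if not chars:
        return 0.0
    bad = sum(
        1
        for ch in chars
        if ch == "\ufffd" or unicodedata.category(ch) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def flag_garbled_pages(path: str, threshold: float = 0.3) -> list[int]:
    """Return 1-based page numbers whose extracted text looks garbled."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    flagged = []
    for number, page in enumerate(PdfReader(path).pages, start=1):
        text = page.extract_text() or ""
        if gibberish_ratio(text) >= threshold:
            flagged.append(number)
    return flagged
```

Calling `flag_garbled_pages("input.pdf")` would list the pages worth sending to OCR. Pages with no text at all (true scans) score 0.0 and are not flagged.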

TOPICS
Edit and convert PDFs
1 ACCEPTED SOLUTION
Guide
Sep 26, 2011

You will be able to use OCR in Acrobat after you convert the type to outlines (Acrobat's OCR will not run on pages that already contain text). To do that, first add some transparency, then use Flattener Preview to outline your type. Here are the steps (for Acrobat 9):

1. Document> Watermark> Add (add a text watermark, hit the space bar once).

2. Advanced> Print Production> Flattener Preview> Convert all text to outlines (checkbox on). Save.

3. Document> OCR text recognition> recognize text using OCR. Select all text with the type tool and copy it.

This method is not perfect; you will need to check the copied text for errors.
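
Outside Acrobat, the open-source OCRmyPDF tool can produce a comparable result. Its `--force-ocr` option rasterizes every page, discarding the existing text, before running OCR. This is a different tool from the one in the steps above. The sketch below only assembles and runs its command line for each PDF in a folder. The helper names and the `-ocr.pdf` naming scheme are hypothetical.

```python
import subprocess
from pathlib import Path


def build_ocrmypdf_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Command line for OCRmyPDF; --force-ocr rasterizes each page before OCR."""
    return ["ocrmypdf", "--force-ocr", "--language", language, str(src), str(dst)]


def ocr_folder(folder: Path, language: str = "eng") -> list[Path]:
    """OCR every PDF in folder, writing '<name>-ocr.pdf' next to it (hypothetical naming)."""
    written = []
    for src in sorted(folder.glob("*.pdf")):
        if src.stem.endswith("-ocr"):
            continue  # skip output files from an earlier run
        dst = src.with_name(f"{src.stem}-ocr.pdf")
        # check=True raises CalledProcessError if OCRmyPDF reports a failure.
        subprocess.run(build_ocrmypdf_command(src, dst, language), check=True)
        written.append(dst)
    return written
```

For a Japanese document, `language="jpn"` would be needed, which assumes Tesseract's Japanese language data is installed.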

Enthusiast
Sep 26, 2011

Luke,

You are a genius!

Thanks!

New Here
Oct 01, 2020

This worked for me and saved me tons of time! Thank you very much, Luke!

New Here
Jan 11, 2021

This is great, thank you! Even in 2021 it was the most helpful Google result. It's not perfect, indeed, but it saves a lot of typing time. Thanks!

New Here
Aug 30, 2023

How did you figure this out? I had been trying for 7 hours straight, embedding fonts and so on, on a 700-page file, and you hit me with 4 random steps that work like magic. I literally tried your method as a joke, expecting that it wouldn't work. You need a medal.

New Here
Dec 01, 2024

These steps worked using Acrobat 9 Pro. I can now copy the text from the PDF and paste it into Notepad and other word-processing applications. The pasted text appears correctly (i.e., no more boxes or other strange characters). Thank you!

Community Beginner
Mar 24, 2020

Try selecting the text, right-clicking it, and choosing "copy with formatting". This works for me.

Contributor
Mar 24, 2020

Thanks for the solution, Mel.

Guest
Sep 25, 2020

Right-clicking only gives me the options "Copy", "Highlight Text", "Add Note to Replace Text" and "Add Note to Text"; the "copy with formatting" option doesn't appear for me.

Community Beginner
Sep 25, 2020

There are some suggestions about enabling "Copy with formatting" here:

https://community.adobe.com/t5/acrobat/quot-copy-with-formatting-quot-option-doesn-t-appear/td-p/970...

Hopefully that will work for you.

New Here
Jun 11, 2022

Thanks! This was helpful.

However, you can't even search the text that you copied from the document.

I think this is because Acrobat searches the document's underlying plain text (the gibberish characters), not what is displayed. For example, I used "Copy with formatting" to copy a text string, then searched for that string (with the quotation marks removed) and got no results.

FWIW, the document language is Japanese.

The unformatted text string is: éš¨ï½¬6谺。謾ケ險ら沿

Copy with formatting: ç¬¬6次改訂版

Result of search: "Adobe Acrobat has finished searching the document. No matches were found."

Result of result: /facepalm

Don't suppose there's a genius out there with a workaround?
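
The first characters of the "Copy with formatting" string above fit a common encoding mix-up: "ç¬¬" is what the UTF-8 bytes of 第 look like when read as Windows-1252. That is an observation about the posted string, not a confirmed diagnosis. The short Python sketch below demonstrates it. The helper name `redecode` is hypothetical. The sketch also shows why the search fails if, as suggested above, Acrobat matches against the characters stored in the text layer: the readable heading does not occur there at all.

```python
def redecode(garbled: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as Windows-1252' mojibake."""
    return garbled.encode("cp1252").decode("utf-8")


# The leading characters of the "Copy with formatting" result.
print(redecode("ç¬¬"))  # 第

# Text as stored in the PDF (plain Copy) versus the heading as displayed.
text_layer = "éš¨ï½¬6谺。謾ケ險ら沿"
query = "第6次改訂版"
print(query in text_layer)  # False: the stored characters are different
```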

New Here
Oct 02, 2020

Hi all,

After trying a bunch of different things that didn't work in the free version of Adobe Reader, I ended up installing Foxit Reader (free version), and it worked immediately.
