alanterra
September 26, 2011

Copying text from a PDF gives me gibberish. Is there a way to OCR it to fix this?


I have a few documents whose text comes out as complete gibberish when I select and copy it. If I open them in Acrobat Pro, select text, choose Copy, and then use "Show Clipboard" in the Finder, I see a bunch of "skull" characters; if I open them in Preview and do the same, I see strings of dots. The clipboard text cannot be pasted intelligibly into any other program, and I cannot search the documents.
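A quick way to confirm that a document has this problem is to look at what the copied text actually contains. The sketch below assumes the unusable clipboard text consists mostly of control, private-use, or U+FFFD replacement characters, which is one common result of fonts that carry no Unicode mapping; it will not catch text that maps to the wrong but still printable letters. The names `printable_ratio` and `looks_like_gibberish` and the 0.8 threshold are hypothetical, chosen only for illustration.

```python
import unicodedata


def printable_ratio(text: str) -> float:
    """Share of characters that are letters, numbers, punctuation, symbols, or whitespace."""
    if not text:
        return 1.0  # nothing was copied, so nothing is unreadable
    readable = 0
    for ch in text:
        category = unicodedata.category(ch)
        if ch in "\n\r\t":
            readable += 1  # ordinary whitespace is category Cc, so count it explicitly
        elif category[0] in "LNPSZ" and ch != "\ufffd":
            readable += 1  # letters, numbers, punctuation, symbols, separators (not U+FFFD)
    return readable / len(text)


def looks_like_gibberish(text: str, threshold: float = 0.8) -> bool:
    """True when too much of the text is control, private-use, unassigned, or replacement characters.

    The 0.8 threshold is an illustrative guess, not a standard value.
    """
    return printable_ratio(text) < threshold


if __name__ == "__main__":
    print(looks_like_gibberish("\x01\x02\x03 \x04\x05"))  # prints True
    print(looks_like_gibberish("Reynoso 2006, p. 1"))     # prints False
```

You would call `looks_like_gibberish` on whatever was pasted from the clipboard; a `True` result suggests the text layer itself is unusable and OCR is worth trying.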

Some of these documents were downloaded from commercial sites. One of them was produced (I think) by OCR'ing a scanned document with ClearScan. An example of such a document is at https://public.me.com/ix/alanterra/Reynoso%202006%20p%201.pdf?disposition=download+1317001233647 (it's small, 55 KB).

It seems to me that one way to fix this would be to convert the document into a "scanned" PDF and then OCR it. But the only way I can figure out to do that is to image each page separately in Photoshop and then assemble the pages into a new document.
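The "convert to a scanned PDF, then OCR" idea can be scripted instead of done page by page in Photoshop. The sketch below swaps in open-source tools rather than Acrobat and is not the method from the accepted answer: it assumes PyMuPDF (imported as `fitz`), Pillow, and pytesseract are installed and that the Tesseract OCR engine is on the PATH. The function name `rasterize_and_ocr` and the 300 dpi default are illustrative choices.

```python
import io


def rasterize_and_ocr(src_path: str, dst_path: str, dpi: int = 300, lang: str = "eng") -> int:
    """Render every page to an image, OCR it, and write a searchable PDF. Returns the page count."""
    # Imported inside the function so this file can be loaded without the optional dependencies.
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    src = fitz.open(src_path)
    out = fitz.open()  # new, empty PDF
    for page in src:
        # Render the page to a bitmap; the broken text layer is not carried over.
        pix = page.get_pixmap(dpi=dpi)
        image = Image.open(io.BytesIO(pix.tobytes("png")))
        # Tesseract returns a one-page PDF: the image plus an invisible OCR text layer.
        # --dpi tells Tesseract the render resolution so the page keeps its physical size.
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(
            image, lang=lang, config=f"--dpi {dpi}", extension="pdf"
        )
        ocr_doc = fitz.open(stream=pdf_bytes, filetype="pdf")
        out.insert_pdf(ocr_doc)
        ocr_doc.close()
    page_count = out.page_count
    out.save(dst_path)
    out.close()
    src.close()
    return page_count
```

For example, `rasterize_and_ocr("Reynoso 2006 p 1.pdf", "Reynoso 2006 p 1 OCR.pdf")` (hypothetical file names) would write a new file whose pages are images with a hidden OCR text layer. Because the pages become bitmaps, the output will likely be larger than the original, and as with any OCR the recognized text still needs proofreading.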

There must be a way to deal with this problem.

Any thoughts?

A

PS: If you look at the document linked above, you will see that the footer text copies correctly, but the body text does not.


October 2, 2020

Hi all,

After trying a bunch of different things that didn't work in the free version of Adobe Reader, I ended up installing Foxit Reader (also free), and it worked immediately.

March 24, 2020

Try selecting the text, right-clicking it, and choosing "copy with formatting". This works for me.

March 24, 2020

Thanks for the solution, Mel.

Luke Jennings (correct answer)
September 26, 2011

You will be able to use OCR in Acrobat after you convert the type to outlines. To outline the type, you first need to add some transparency to the document and then use Flattener Preview. Here are the steps (for Acrobat 9):

1. Document > Watermark > Add (add a text watermark whose text is a single space: press the space bar once).

2. Advanced > Print Production > Flattener Preview > Convert all text to outlines (turn the checkbox on). Save.

3. Document > OCR Text Recognition > Recognize Text Using OCR. Then select all the text with the type tool and copy it.

This method is not perfect; you will need to check the copied text for errors.

alanterra (author)
September 27, 2011

Luke,

You are a genius!

Thanks!