
Question

Acrobat doesn't OCR text - leaves it as images

November 26, 2015
11 replies
67274 views

May I have some help please?

I use OCR so that I can highlight and mark up the text of scanned or prepared PDFs. Mostly these are unencrypted academic documents. I find that Acrobat Pro DC often does not OCR text when I click Edit. Instead it leaves the scanned blocks as images, denoted in the upper left-hand corner by the landscape icon and pop-up text that says "This is an image". Is DC capable of OCRing these images? If so, how do I force the application to OCR?

Thanks very much


Robert38026074ualg

Participant

Hi. Despite having Adobe Acrobat Pro (version 22.003.20263.0), there are documents I download / print to a PDF where the text appears to come in as an image, and when I run 'Recognize Text' or 'Enhance' nothing changes, nor does the word search function work (obviously no ability to highlight either). I’ll first provide the method that worked for me to get OCR text functionality and the ability to highlight, followed by the other methods that had limited success or did not work.

*** Print as “Save as PDF” worked to get both OCR & ability to highlight / edit text ***
Using the ‘save’ button to save as type “Adobe Acrobat Document.pdf (*.pdf)” gave OCR text functionality, but the file had security features, so there was no ability to highlight or edit text.
Using the ‘print’ button to print as Adobe PDF resulted in no OCR text ability, even after running 'Recognize Text' or 'Enhance' in the Scan & OCR tool (the text appears to be blocky images).
Using the ‘print’ button to print as “Microsoft Print to PDF” saved the document as a PDF, but it wouldn’t open (error opening document message). I tried this method multiple times and it never opened.

I hope this helps someone else as well as the folks at Adobe to resolve it.

Amal.

Legend

Hi there

Hope you are doing well, and sorry to hear that.

Is this an issue with a particular PDF file or with all the PDFs? Please try with a different PDF file and check. If the file is stored on a shared network/drive please download it to your computer locally and then try again.

Please update the application to the recent version 24.02.20857. Go to Help > Check for updates and reboot the computer once.

Also try to repair the installation from the help menu (Win Only) and see if that works.

You may also try to reset the Acrobat preferences as described here (https://adobe.ly/3RQLtr3) and see if that works.

~Amal


Nancy25168225c3fq

Participant

It's 2022 and Adobe Pro DC has failed to fix this problem. I have 168 PDF pages of a state law, and despite optimizing the doc, running text recognition, and adjusting every option I can find in these threads, it won't locate words I can clearly see in the doc. And if I try to highlight a word, it just draws a big oval on the page. Lots of money for DC Pro for not much return. So frustrating.


Anna25179403cbaa

Participant

This may not be practical for a 168-page document, but I took a picture in the Adobe app on my phone and converted it to Word on my iPhone. I opened it in the Microsoft Office app (not Word itself), and the app was able to recognize 90% of the text. I had to make some corrections, but it still gave me some of what I needed. Then I was able to convert it back into a PDF that could be searched and edited.

aminahb68172700

Participant

Hello - I am having an issue now with the latest update to Acrobat 2019. I used to be able to run text recognition on encrypted/secured files, be it a PDF or a scanned JPG. With the software update I can no longer do that; there's no option to recognize text, only "edit file", which is clearly not possible for secured files. This is very annoying as I need to do this a lot in my job. Grateful if you could please advise.

garethk67695264

Participant

I've found a workaround that seems to be working (although it is a bit time-consuming for documents with multiple pages), as I am also experiencing this problem with a PDF document that I created myself from scanned documents. The first 30% of the pages OCR'd OK, then the rest remained as images.

The workaround is to extract the affected pages from the pdf, then open them in Photoshop, flatten the image and then save the file as a Photoshop PDF. Open the file in Acrobat and it recognises the text.

jwdooley

Participant

This worked for me today. Thanks.


darren_gozali

Participant

Same here. I have 10 pages of clean documents with a gray background. Only 1 piece of text out of those 10 pages got recognized. This is horrible.


Anonymous

Came across this post trying to fix a similar problem trying to OCR legal discovery. The only solution I've found so far is to export the PDF as a TIFF and then import the TIFF into Acrobat and run OCR. Hope this helps someone!


calvintobecouncil

Participant

I want to attach a one-page document that has a graphic, but Adobe said it cannot process it because of a graphic element.

Lovekesh Garg

Adobe Employee

You can share the document using Adobe Send:

Launch your Application
Switch to Tool Center view and open Send & Track
Click on “Select Files to Send”
A dialog will open from where you can choose the file/s you want to share
The workflow page will appear with the file/s to be shared prepopulated
Click on Create Link
Your Local files will be uploaded to the Document Cloud and a Public Link will be generated
Share the link with us

Lovekesh Garg

Adobe Employee

Hi Paul,

Sorry for the issue you are facing. Can you please verify 1 thing:

- Open PDF and click on Edit tool

- In the right-hand pane (RHP), under the 'Scanned Document' option, make sure 'Revert to Image' is available

- If there is 'Convert to Text' option available, please click it. It will run OCR to recognize text.

You can also try Enhance Scan tool > Recognize Text > In this File > Settings (select the Editable Text & Image option to run OCR) > Recognize Text.

If your document has some non-editable content, it will give an error message.

Please share a sample file and error message(if any) using https://cloud.acrobat.com/send to help us identify and resolve the issue ASAP.

Thanks.


nicolasr16763978

Participant

https://files.acrobat.com/a/preview/c3d2e801-a9f2-49fc-84de-03d81e202cd0

Hi,

When I export this PDF file to Word, the OCR tool works for the first 50 pages but leaves the rest as non-editable images.

I've tried the "Convert the PDF to TIFF and back, and then rerun OCR" method, but that doesn't work either.

Can you help me with that?

Thanks

Nicolas

Lovekesh Garg

Adobe Employee

Hi Nicolas,

Sorry for the issue you are facing. We are looking into this issue. We will update you once we have identified a fix or workaround for the problem.

Meanwhile, can you please confirm one thing:

- Are you facing this issue only with this file, or are there multiple such files?

- If there are multiple files, is there any common pattern, such as content type, source of the files, or number of pages?

Thanks.

ryanh44911919

Participant

So it's not just me. Awesome. Maybe someone will fix it. Competitor, here I come.


AcrobatQuestionsPhx

Participant

I downloaded Acrobat Pro DC as a trial just to see if it would fix this exact issue. I either have random pages that won't capture text, or Acrobat recognizes 1 character of text somewhere on the page and so WILL NOT capture the text on the rest of the page. Worse yet, I just got a set of PDFs that look fine, and I thought they were text captured, but when I tried to search the text nothing came up. I copied the text from the PDF; see below for a comparison of what was on the screen and the text that was captured. Since Acrobat recognized the gibberish as text, I can't find a way to replace/recapture the text without reprinting. Anyone have a suggestion? I'm going to look into ABBYY FineReader in the meantime . . .

Screen view: WHEREAS, after review of the Company's operations . . .

Text from pdf for this same phrase: , (’*’% +$-1<0990>30? 71<2 0 &75 8-6AB; 7809-<376;
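
For anyone who needs to triage a batch of files for this kind of garbled text layer, here is a rough heuristic sketch in Python (standard library only). It assumes the text has already been copied or extracted from the PDF by some other means. The function names and the 0.6 threshold are illustrative assumptions, not anything Acrobat provides, and the check is tuned for ordinary English prose.

```python
def letter_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are letters."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(ch.isalpha() for ch in visible) / len(visible)


def looks_garbled(text: str, threshold: float = 0.6) -> bool:
    """Flag text whose letter ratio falls below the (assumed) threshold."""
    return letter_ratio(text) < threshold


# The two samples quoted in the post above.
screen_view = "WHEREAS, after review of the Company's operations . . ."
copied_text = ", (’*’% +$-1<0990>30? 71<2 0 &75 8-6AB; 7809-<376;"

for label, sample in (("screen view", screen_view), ("copied text", copied_text)):
    status = "garbled" if looks_garbled(sample) else "plausible"
    print(f"{label}: letter ratio {letter_ratio(sample):.2f} -> {status}")
```

A low letter ratio only suggests that the hidden text layer does not match what is on screen. It does not repair the file, and pages that are mostly numbers or tables would need a different threshold.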

Fullofentropy

Participant

I have the same issue, but with Adobe Acrobat XI. It is very annoying when only 50% of the text on some pages is converted into a searchable form.

As a lead test engineer, I have to scan in any manually executed tests. I don't care so much about the handwritten portion, but it is much easier to be able to search the text of every document in a folder of documents (1000s of pages) than to have to open each one and guesstimate what I am looking for. Eventually I will try out some competitors and make the switch if I find something better.
