Acrobat OCR into Formatted Text and Images

Report · Jun 01, 2016

Hi

Another attempt:

We have tried, without success, to convert scanned book pages into plain formatted text and images (where the pages contain any).

It always ends up as searchable images of the pages (a background picture with a kind of ClearScan text over it).

Is it really impossible to achieve results that other OCR programs manage easily?

Can someone help?

Thanks in advance

Report · Jun 01, 2016

Sorry for the issue you are facing. You can try the 'Editable Text and Images' OCR option (called 'ClearScan' in Acrobat XI or earlier). This option converts your scanned pages to editable text and images.

For Acrobat DC: Enhance Scan > Recognize Text > In this file > Settings (select 'Editable Text and Images') > Recognize Text

For Acrobat XI or earlier: Tools > Text Recognition > In this file > Edit (select 'ClearScan') > OK

Hope this resolves your problem.

Report · Jun 02, 2016

Hi

Thanks for your response.

I forgot to mention that I already used 'Editable Text and Images'. (In German it is called 'Bearbeitbarer Text und Bilder'.)

I always get an OCRed version that shows a background image (which I can select and delete, though not with the 'delete background' function...) and what I would call a text layer, which is editable but not pure black; it takes on the color of the document.

Now, how can I get rid of that background?

Report · Jul 11, 2016

Could you please share a sample file that shows this issue? With the 'Editable Text and Images' option, the result is not expected to be a searchable image. You can use https://cloud.acrobat.com/send to share the file.

Also, please try one more thing:

Open the file and click the Edit PDF tool. It will also run OCR with 'Editable Text and Images'.

Report · Jul 12, 2016

Some background info that may (or may not) be useful.

Three OCR methods with Acrobat.

--| Searchable Image (Exact)

OCR output uses glyphs with no stroke or fill (so hidden / invisible).

Process leaves the image as-is.

Consequently the result has an image that is equivalent to the paper source.

--| Searchable Image

OCR output uses glyphs with no stroke or fill (so hidden / invisible).

Process makes some cleanup edits to the images of the characters.

Consequently the result is not equivalent to the source paper.

--| ClearScan (now "Editable Text and Images")

Recognized character images are replaced. Character images not recognized are untouched.

Be well...
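
For anyone who wants to check which of the three results a given page actually contains, here is a rough sketch in Python. It assumes you have already extracted and decompressed the page's content stream as text with some other tool (that step is not shown). It relies on two facts from the PDF format: hidden OCR text is drawn in text rendering mode 3 (the `3 Tr` operator, meaning neither fill nor stroke), and images are painted with the `Do` operator. The function name `classify_page` and the returned labels are made up for this illustration and are not part of Acrobat or any library. The tokenizer is deliberately naive: it ignores nested parentheses in strings and `q`/`Q` state restores, and `Do` can also paint a form XObject rather than an image.

```python
import re

# Operators that show text in a PDF content stream.
TEXT_SHOW_OPS = {"Tj", "TJ", "'", '"'}

# Naive match for (string literals) so their contents are not read as operators.
# Assumption: no nested parentheses inside the strings.
_STRING_LITERAL = re.compile(r"\((?:\\.|[^\\()])*\)")


def classify_page(content: str) -> str:
    """Guess the kind of OCR result from a decoded page content stream."""
    tokens = _STRING_LITERAL.sub("()", content).split()

    render_mode = 0          # PDF default: fill glyphs (visible text)
    visible_text = False
    hidden_text = False
    paints_xobject = False   # usually an image, but could be a form XObject

    for i, tok in enumerate(tokens):
        if tok == "Tr" and i > 0 and tokens[i - 1].isdigit():
            # The operand before Tr sets the text rendering mode.
            render_mode = int(tokens[i - 1])
        elif tok in TEXT_SHOW_OPS:
            if render_mode == 3:   # mode 3: no fill, no stroke
                hidden_text = True
            else:
                visible_text = True
        elif tok == "Do":
            paints_xobject = True

    if visible_text:
        return "editable text and images" if paints_xobject else "editable text"
    if hidden_text and paints_xobject:
        return "searchable image"
    if hidden_text:
        return "hidden text only"
    return "image only" if paints_xobject else "empty"


if __name__ == "__main__":
    sample = "q 612 0 0 792 0 0 cm /Im0 Do Q BT 3 Tr /F1 12 Tf (Hello) Tj ET"
    print(classify_page(sample))
```

Note that a page with visible text on top of a full-page background image (the situation described earlier in this thread) would be reported as "editable text and images", because this sketch does not look at how large the painted image is.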

Report · Aug 01, 2016

Are you still facing the same issue? If so, please share the file.

Thanks.