
Not recognizing a certain file's text

New Here,
Jan 02, 2018

A certain document will not let me recognize text.

When I try, under Enhance Scans, it says "the file is too wide to recognize text. Try cropping..."

Under Enhance Scans, when I click the drop-down, choose Scanned Document, click Recognize Text and then Enhance, it says "some of the text can not be recognized" and none of the text is recognized.

This is the only file where I've run into this situation.

Any feedback is appreciated.

TOPICS
Scan documents and OCR
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
New Here,
Jan 03, 2018

A rare opportunity to answer my own question. For the record:

I printed the file to adobe (a digital copy). Then I used Enhance Scans on that file and all the text was recognized.

Participant,
Jan 09, 2018

I cannot get Adobe Acrobat Pro DC 2018 to recognise any text in a high resolution PDF file. I have wasted hours on this and just can't understand it.

I am not sure what 'printed the file to adobe (a digital copy)' means or how to do it - what menus and commands please?

Community Expert,
Jan 09, 2018

Let's take a step back: What do you want to do? Is this an image-based file that you are trying to OCR, or are you trying to select text? If you can give us a bit more information, we might be better prepared to provide an answer.

Participant,
Jan 09, 2018

It is an image-based file with text burnt into the image. It's a PDF document: six pages that were sent to me as an e-mail attachment. Some is at 300 DPI as far as I can see. I think it was put together originally in Photoshop CC with picture and text elements, then exported (somehow) to become the PDF document.

It contains about half text and half picture files; the text is in white on a dark background, clear as ever to a human reader. It seems the text is now fused ("burnt in", older film people might say) with the picture: it is part of the picture. But that surely is the whole point of OCR: if it wasn't an image, it would be easy to separate the text without OCR in any case?

I am using Adobe Acrobat 2018, the latest version from my CC subscription. From the tools panel I have tried Edit, Enhance, Recognise Text and any other way I can think of. All show a progress display with blue bars for about 30 seconds and then... nothing. All the pages just seem to have one image box about A4 size. On another doc sent by the same company, some of the images separate out into their boxes after the same process, though not the text.

(Weirdly, when some photo boxes are deleted they leave some of the image behind, but that's another issue I think. It also doesn't print full page whatever I do, leaving white edges even when preview shows it is 100% to edge. It's as if Adobe haven't heard of A4 sized paper...)

So all in all the OCR isn't 'easy' as advertised on the Adobe help page. Thanks though!

Community Expert,
Jan 10, 2018

Text on an image is very challenging, and may be beyond the limits of what the OCR in Acrobat can accomplish. Keep in mind that Acrobat is not a dedicated OCR application; it does a ton of things much better than any other application, but OCR is unfortunately not one of them. I have a license to ABBYY's FineReader for more challenging OCR jobs. Anything that works well in Acrobat, I do right within Acrobat (e.g. your standard text-on-white-paper jobs); for things that require a bit more (two or more languages per document, strange fonts, text on images) I will use FineReader.

However, if you are not getting any text from your OCR, it's possible that your text is actually not part of the image, but vector graphics, or text that was converted to outlines. In that case, you may get better results when you save your document as a high-resolution image first (File > Export To > Image > TIFF, then select at least 600 dpi). Now import that image again into Acrobat (File > Create > PDF From File). These two steps will have flattened everything in your PDF file into one image. Now try to run OCR again. Do you get different results?
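If it helps to know what to expect from the export step, here is a small back-of-the-envelope sketch (my own illustration, not anything Acrobat provides) of how large the flattened image gets. It assumes A4 pages, as mentioned earlier in the thread (210 x 297 mm), and the helper name `pixels_at_dpi` is made up for this example.

```python
# Rough arithmetic only: how many pixels a page becomes when rasterized.
# Assumption: the pages are A4 (210 mm x 297 mm); adjust for other sizes.
MM_PER_INCH = 25.4


def pixels_at_dpi(width_mm, height_mm, dpi):
    """Return (width_px, height_px) for a page rasterized at `dpi`."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


if __name__ == "__main__":
    for dpi in (300, 600):
        w, h = pixels_at_dpi(210, 297, dpi)
        print(f"A4 at {dpi} dpi: {w} x {h} px ({w * h / 1e6:.1f} megapixels)")
```

Doubling the resolution from 300 to 600 dpi roughly quadruples the pixel count per page, so expect the exported TIFF files to be large and the OCR pass to take noticeably longer.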

New Here,
Jan 10, 2018

Hmm... I'm not sure if the solution I had works for your scenario.

File > Print: in the printer drop-down, I selected 'Adobe PDF' and saved the file to my desktop. After that, run the Enhance Scans tool on the new file.

Hope this helps.

Adobe Employee,
Jan 16, 2018

Can you please try Enhance Scans > Recognize Text > In This File > (choose "Searchable Image (Exact)" as the OCR output format under Settings) > Recognize Text?

Hope it works. If it doesn't, can you please share the file where you are facing this issue?

Thanks.
