
OCR Question

Guest
Aug 24, 2016

Can anyone explain why the OCR would capture the word "Park," and the comma, but ignore every other word on the page?  I find this all the more perplexing in that it read the barely legible reception number on the next page (it didn't read it correctly, but I can still edit it to be correct) yet failed to capture any other words.  This is sometimes very frustrating, because the word I am searching for may literally be right next to a word that was OCR'd with no problem.  Forgive me if this topic has already been tackled.  It just seems that if resolution or dpi were the issue, I wouldn't get much of anything.  Is there a way to manually add OCR where Adobe misses it?  Also, why would it get one "Park," but not all of them?  They are practically identical.  Help!

Captureocrmiss.JPG

TOPICS
Acrobat SDK and JavaScript, Windows
Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
Adobe Employee
Aug 24, 2016

Hi Jesse,

Acrobat should have recognized the other words as well. The red boxes mark the words that Acrobat is not sure it recognized correctly; each such word is flagged as a suspect. You can correct a suspect manually as follows:

1. After running OCR, go to Enhance Scans > Recognize Text > "Correct Recognized Text".

2. Select any red box. The word then appears in the toolbar, showing the original "Image" and the "recognized as" text.

3. If Acrobat recognized the word incorrectly, enter the correct text for the image and then click Accept.

You can also see everything that Acrobat recognized. To do so, select the "Review Recognized text" checkbox.

Suspect1.png

Here is a sample image where "be offered" is recognized as "b~offered". We can change it manually and accept.

The "Review Recognized text" option, at the top-left corner of the toolbar, shows all recognized text.

I hope this resolves your issue. Please let us know if you still face any problems.

Thanks.

Guest
Aug 25, 2016

Hi,

That doesn't really answer my question.  In my job, some days I do nothing but correct the words that Adobe did not recognize correctly.  That tends to be the majority of them, due to the age of the documents I am working with.  So I have literally done exactly what you have given me instructions for tens of thousands of times.  Maybe hundreds of thousands, I don't keep track.  What I would like to be able to do is, at the very least, add recognition to words on the page that aren't even red-boxed as suspects.  Words that were missed or ignored entirely.  Can anyone help with that?  What I really want to know is why it doesn't red-box every word on the page.  Why does it get one "park" but not the other six?  Most importantly, can I add words that were missed, or is this as good as it gets?

Adobe Employee
Aug 25, 2016

Sometimes Acrobat is not able to recognize text set in certain fonts, text on a dark background, very large text, or text in a low-DPI image.

Can you please share a sample document where you are facing this issue?

You can use https://cloud.acrobat.com/send for sharing the file.

Thanks.

Guest
Aug 25, 2016

I shared one in my original post, though it is only a portion of the document because the rest contains private, confidential information that I cannot share in this forum.  Also, this isn't just one document: I have scanned close to 60,000 documents and this problem manifests in all of them.  The goal is to be able to let the software search them without my looking at every document.  If I want to search for a property on lot 1, block 3 of whatever subdivision, I want to be able to type "lot 1" and have all documents that contain "lot 1" be found.  Since all these documents are of the same quality and were scanned on the same machine, it makes no sense that it would capture only 5%-10% of the "lot 1" documents.  I tried scanning them at higher resolution and, surprisingly, had the same or, more often, worse results.  I could find some documents that are shareable, scan them at various resolutions, run the same optimization and OCR, and show you that it makes no difference and that there must be some other issue at play, but that would be a huge waste of my time considering that I have already experimented with this.  I read every available forum and troubleshooting blog to try to find the answer.  Only after exhausting all possibilities did I resort to actually registering and asking the specific question.  If the answer is simply that Adobe isn't all that good at this task, fine, I can believe that, planned obsolescence and all.  Maybe the question I should be asking is: at what point will they release a version that actually works?
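
For the "lot 1" search goal, one partial workaround is to search text that has already been exported from the documents with a pattern that tolerates common OCR character confusions, so that "LOT l" or "Lot I" still counts as "lot 1". The following is only a minimal sketch under stated assumptions: `CONFUSABLE_GROUPS`, `ocrTolerantPattern` and `findMatchingDocs` are hypothetical names, the confusion table is illustrative rather than taken from Acrobat, and the input is assumed to be plain strings. It cannot find words that OCR skipped entirely, and its tolerance can produce some false positives.

```javascript
// Illustrative groups of characters that OCR output often confuses (an assumption).
var CONFUSABLE_GROUPS = ["1lI|", "0Oo", "5Ss"];

// Escape a single character so it is treated literally in a regular expression.
function escapeForRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-insensitive pattern for a phrase: confusable characters become
// character classes, and any run of white space matches one or more spaces.
function ocrTolerantPattern(phrase) {
  var parts = [];
  var chars = phrase.replace(/^\s+|\s+$/g, "").split("");
  for (var i = 0; i < chars.length; i++) {
    var ch = chars[i];
    if (/\s/.test(ch)) {
      if (parts[parts.length - 1] !== "\\s+") {
        parts.push("\\s+");
      }
      continue;
    }
    var group = null;
    for (var g = 0; g < CONFUSABLE_GROUPS.length; g++) {
      if (CONFUSABLE_GROUPS[g].indexOf(ch) !== -1) {
        group = CONFUSABLE_GROUPS[g];
      }
    }
    parts.push(group ? "[" + group + "]" : escapeForRegex(ch));
  }
  // Require a non-alphanumeric neighbor on both sides so that "lot 1"
  // does not match inside "plot 1" or "lot 10".
  return new RegExp("(?:^|[^A-Za-z0-9])" + parts.join("") + "(?![A-Za-z0-9])", "i");
}

// Return the names of the documents whose text contains the phrase.
// `docs` is an array of { name: string, text: string } objects.
function findMatchingDocs(docs, phrase) {
  var pattern = ocrTolerantPattern(phrase);
  var names = [];
  for (var i = 0; i < docs.length; i++) {
    if (pattern.test(docs[i].text)) {
      names.push(docs[i].name);
    }
  }
  return names;
}
```

The pattern is deliberately anchored on both sides, because a search for lot numbers that also returned "lot 10" through "lot 19" would defeat the purpose.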

Guest
Aug 26, 2016

I just realized why my question cannot and will not be answered.  It's a computer, isn't it?  All the answers are regurgitated from a database based on keywords in the question.  Tell me I'm wrong.

Adobe Employee
Aug 26, 2016

No, this is not an auto-generated message. We are already looking into this issue, but to reproduce and resolve it we need a sample document.

It would be great if you could share a single-page document (without OCR) where you are facing this issue. You can use https://cloud.acrobat.com/send to share the file and make it private, or send the document directly to lgarg@adobe.com

Adobe Employee
Oct 03, 2016

Please share the document (without performing OCR) where you are facing this issue. It will really help us resolve it.

Thanks.

Adobe Employee
Oct 13, 2016

For the screenshot you shared, we are able to recognize the text correctly.

"Park," is red-boxed only once because Acrobat was not sure whether it had recognized that instance correctly. It did not mark the other instances as suspects because it recognized them correctly.

You can check all the text that Acrobat recognized by selecting the "Review Recognized text" checkbox.
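
As a scripted cross-check of the same thing, the words in each page's text layer can be counted from the Acrobat JavaScript console. The sketch below is a minimal, unofficial example: `countWordPerPage` is a hypothetical helper name, and it assumes the document object exposes `numPages`, `getPageNumWords` and `getPageNthWord` as described in the Acrobat JavaScript API reference. A page that reports few or no words is one where OCR produced little or no searchable text.

```javascript
// Sketch: summarize the words that a document's text layer reports per page.
// `doc` is expected to provide numPages, getPageNumWords(page) and
// getPageNthWord(page, index, strip), as the Acrobat Doc object does.
// countWordPerPage is a hypothetical helper, not part of any Adobe API.
function countWordPerPage(doc, target) {
  var wanted = target.toLowerCase();
  var report = [];
  for (var p = 0; p < doc.numPages; p++) {
    var total = doc.getPageNumWords(p);
    var matches = 0;
    for (var i = 0; i < total; i++) {
      // strip = true requests the word without punctuation or white space,
      // so "Park," is compared as "Park".
      var word = doc.getPageNthWord(p, i, true);
      if (word && word.toLowerCase() === wanted) {
        matches++;
      }
    }
    // Pages are zero-based in the API; report them one-based for readability.
    report.push({ page: p + 1, words: total, matches: matches });
  }
  return report;
}
```

In the console this could be called as `countWordPerPage(this, "Park")`, where `this` is the open document. If every instance of the word is counted, the text is searchable even where no red box was drawn.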

If any word was recognized incorrectly, just double-click that word and it will be marked as a suspect.

If you still face any issues, please share the file (before running OCR).

Thanks.
