
Changing OCR recognized text without changing the image itself.

Community Beginner, Jul 11, 2021

Hi,

I want to scan a legal document (e.g. a contract), and I want (or rather, I must) keep the original scanned image forever.

I scan the document with Adobe Acrobat Pro DC:

File / Create / PDF from Scanner, with "Recognize Text (OCR)" enabled and Output = "Searchable Image".

Because the printed document is not of the highest quality, some words were not recognized correctly. Note: the "Correct Recognized Text" function did not find all of the incorrectly recognized text.

Question: Is it possible to correct the recognized text without changing the image itself?

Thanks in advance,

Michel

TOPICS
Scan documents and OCR

Views

2.1K

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Jul 11, 2021

Need more information:

What is your OS (and what release)?

What version of Acrobat Pro (and what release)?

With those questions asked, let me also ask:

What kind of scanner are you using, and what is the scanner's software?

Here's a tip: yes, you did scan through Acrobat, but Acrobat didn't scan a thing. Acrobat doesn't have any scanning capability whatsoever. Rather, Acrobat relies on either TWAIN on the PC or a direct link to Apple's "Image Capture" to do the actual scanning. So if there are issues with the scan, it's not really Acrobat's fault.

Now, what Acrobat does with the scan, that could be Acrobat's fault. I say "could" because if you set things up so that you don't get a good-quality scan, that's your fault. The auto functions in scanners have gotten much better over the years, but only to a point. Setting the white point on the paper, setting a proper ppi for an OCR scan, etc. can significantly affect the actual result of the OCR process.

And lastly, can you please provide a screenshot of the images you are talking about so I/we can see what the issue is?

Thanks!

Community Beginner, Jul 11, 2021

Hi Gary,

Thanks a lot for your fast response, especially on the weekend!

My question was (or at least was intended to be) focused on whether it is possible at all to change the recognized text without changing the image.

I actually have no issues with scanning; I simply want to know whether it is possible at all.

I wrote a use case below, so as not to be too theoretical.

Only one question remains: is it possible to change the recognized text without touching the scanned image, or not? That is the question.

Thanks in advance!

Warm regards,

Michel

Community Expert, Jul 11, 2021

Hi Michel,

Just so you know, I and just about everyone else here do not work for Adobe; we're just "good folks" who try to help others.

I still need to see a screenshot of what you're talking about, sorry.

Community Beginner, Jul 11, 2021

Hi,

I'm afraid I should close the question, because I don't have an actual issue right now.

Cheers,

Michel

Community Expert, Jul 11, 2021

Your call; it's just that it's hard to know how to help if I can't actually understand the dynamics of the issue.

OCR, by itself, does not and should not affect an image of something/anything.

One of Acrobat's approaches to OCR is to literally remove the text image and overlay the actual OCR results where the text was. So if the image you are talking about is the text, then you have to turn that feature off.

I do not know if that was of any help or not, but either way, I need to see what you are talking about, or you could provide a more in-depth description of the relationship between the image and the text involved.

Sorry

Community Beginner, Jul 11, 2021

I received an invoice as an OCR-scanned PDF.

The invoice has two parts / two layers:

  1. The image of the document and
  2. the extracted text

Use case: an important part, e.g. the total price, was not recognized correctly.

I want to correct the text (e.g. for bookkeeping), but by law I may not change the image. That is why I asked whether it is possible at all to change the text itself without changing the image.
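To make the two-layer idea concrete, here is a minimal sketch in Python. It is only a toy model and does not read or write real PDFs: the `ScannedPage` class, the helper functions, and the invoice values are all hypothetical names and data made up for illustration. It shows the requirement itself, namely that a text correction should leave the image bytes identical, which can be checked by comparing a hash of the image layer before and after.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScannedPage:
    """Toy model of one page of a searchable-image PDF (not a real PDF API)."""
    image: bytes  # layer 1: the scanned picture, which must stay untouched
    text: str     # layer 2: the recognized (OCR) text used for search and copy


def image_fingerprint(page: ScannedPage) -> str:
    """SHA-256 of the image layer only; the text layer is deliberately ignored."""
    return hashlib.sha256(page.image).hexdigest()


def correct_text(page: ScannedPage, wrong: str, right: str) -> ScannedPage:
    """Return a new page with the OCR text fixed and the image bytes reused as-is."""
    return replace(page, text=page.text.replace(wrong, right))


# Hypothetical invoice: the OCR engine misread the total price.
original = ScannedPage(image=b"<raw scan bytes>", text="Total: 1.2B4,50 EUR")
fixed = correct_text(original, "1.2B4,50", "1.284,50")

print(fixed.text)                                               # corrected text layer
print(image_fingerprint(original) == image_fingerprint(fixed))  # image layer unchanged?
```

In a real workflow, the same kind of before/after comparison on the scanned image would be one way to document that only the text layer was edited.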

 

I hope that helps.

Thanks,

Michel

 

Community Expert, Jul 11, 2021

Hi Michel,

Heck, why didn't you say that in the first place!! 😄

OK, now I understand what you're trying to do and why a screenshot would have been "an issue."

I have to play around with this for a bit, but one more question: is the quality of the text for the total price good, middling, or poor? In other words, is getting a better-quality scan here beating a dead horse, or is it something that can be improved?

While I'm mulling and playing with this, please check out a blog I wrote for Adobe some time back.

http://photosbycoyne.com/Gary's_Help/Scanning/clean-scanning.html

 

 

 

One question

Community Expert, Jul 11, 2021

Oh, one more question: are the font and/or the characteristics of the font for the Total Price (and any other listed price) different from the rest of the document?

I'm thinking of a grocery receipt as an example.

Community Expert, Jul 11, 2021

OK, got it! Easier than I was expecting.

First, open the Scan & OCR tool. Across the top you'll see:

2021-07-11_15-04-03.png

Click on Settings.

From there, select Searchable Image from the dropdown menu:

2021-07-11_15-04-27.png

Now, as you go through correcting text, NONE of the changes will show up in the image. The text will appear as scanned, and the fixes will not be visible on the page.

The one drag is that Acrobat will now stop at EVERY questionable word; there does not appear to be any way to accept them all at once, so just be patient.

Hope that works for you,

Gary

Community Beginner, Jul 13, 2021

Hi Gary,

Thanks for your engagement; I learned a lot from the blog you mentioned!

Actually, I cannot answer your question ("Oh, one more question: is the font and/or the characteristics of the font for the Total Price (and any other listed price) different than the rest of the document?") because I have no actual issue at the moment.

But thanks to your explanations, my question was answered:

Question: Is it possible to correct the recognized text without changing the image itself?

Answer: Yes, it is possible if the document was scanned with the options OCR + Searchable Image. The "Correct Recognized Text" function shows all the words where Adobe Acrobat Pro DC was not sure whether the recognized text was correct. You can correct them without changing the image. You can switch between the image and the text with the "Review recognized text" checkbox. The only limitation is that I have to rely on Adobe Acrobat Pro DC: I can only correct the text that Acrobat itself flagged as suspect.
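Because of that limitation, an independent check can be useful. The following is a rough sketch in Python, not an Acrobat feature: it assumes the recognized text has already been copied or exported out of the PDF into a plain string, and the function name, invoice number, and total are hypothetical. For each value that is known from another source (e.g. bookkeeping), it reports similar-looking tokens in the OCR text when the exact value is missing, since a near-miss is likely a misreading that was never flagged.

```python
import difflib


def find_suspect_values(recognized_text: str, expected_values: list[str]) -> dict[str, list[str]]:
    """For each expected value missing from the OCR text, list similar-looking tokens."""
    tokens = recognized_text.split()
    suspects = {}
    for value in expected_values:
        if value in tokens:
            continue  # recognized correctly, nothing to report
        # Near-misses are likely OCR misreadings of the expected value.
        suspects[value] = difflib.get_close_matches(value, tokens, n=3, cutoff=0.6)
    return suspects


# Hypothetical text copied from the PDF, and values known from bookkeeping.
ocr_text = "Invoice 2021-0711 Total 1.2B4,50 EUR"
known_values = ["2021-0711", "1.284,50"]

for value, candidates in find_suspect_values(ocr_text, known_values).items():
    print(f"expected {value!r} not found; similar tokens: {candidates}")
```

An empty candidate list would mean the value is missing with no close match, which is also worth a manual look at the page.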

 

Gary, thanks again for your engagement!

Warm regards,

Michel
