
Acrobat cannot run OCR due to renderable text on page

Enthusiast, Dec 13, 2024

I am trying to run the Recognize Text tool on a file and I get the above error. I saved the file as a TIFF and brought it back into Acrobat (AA). I was then able to run text recognition; however, it did a horrible job of rendering the text. I have also tried changing the DPI to 600, but only after first running it on Auto.

Acrobat 24.1

Win 11

Thanks

TOPICS
Scan documents and OCR
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Community Expert, Dec 13, 2024

Hi, @westdr1dw; please review this blog post I wrote for Adobe several years ago, and see whether any of the scanning issues it describes might be present in your scan.

BTW, Acrobat has NEVER been able to run OCR on a page that already has some text rendered. It's a legacy "feature!"

https://community.adobe.com/t5/acrobat-discussions/acrobat-cannot-run-ocr-due-to-renderable-text-on-...
Enthusiast, Dec 16, 2024

Hello Gary;

Appreciate the response. However, the link just returns me to this same page.

I have discovered a way to work around this. I have been an AA user since ver 5, and we used to use Distiller when we had issues converting scanned image documents. Not sure why Adobe has made this feature difficult to find. Saving the file as an image and then importing it back into AA seems a little backward. In any case, I was able to turn the image into Intelligent Text using Distiller. Hope Adobe keeps this little jewel around for many years. Thanks

Community Expert, Dec 16, 2024

Very sorry, I have no idea why that happened. I just tested the link below, and it worked a second ago. Very strange.

 

I hope you find this useful.

 

https://community.adobe.com/t5/adobe-community-professionals/scanning-clean-searchable-pdfs/m-p/4785...

Enthusiast, Dec 16, 2024

A very in-depth article indeed. One of the jobs I had many years ago was converting volumes of HP scans to Intelligent Text, and I have always found Distiller to be a great friend when a file could not be successfully converted to text. Having been retired for a few years, I have not used AA at the level I used it before. In searching for this feature in the newest release of AA, I discovered the documentation is very lacking. I am familiar with the methods described in your article, especially using the Levels command in PS. However, from personal experience, saving the document as a TIFF or any other image format introduces artifacts on the text that were not present before. One of the lessons learned in using AA Pro for over 25 years is that they always seemed to take two steps forward and one step back with each update. Using AA Pro 5, after converting a couple hundred forms to fillable text, I had to change some of the tabs on the forms; in ver 5 that took literally seconds. When ver 6 was released, the same task became a real headache and took hours of additional work compared with the previous release. I need to regain familiarity with the latest app. The new GUI is horrific; not sure I can ever get used to the new screens.

Community Expert, Dec 16, 2024

Hi, @westdr1dw, saving the scanned document as a TIFF will NOT create artifacts. Period. JPG will create the fewest artifacts if you save at 100%, but beyond that, any level of compression will create artifacts. These are mostly irrelevant in photographs as they merge into the constantly changing pixels of the image. But flat sections like the blank page background (or a clear sky) are perfect for letting them show up. 
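
As a rough illustration of the lossless-versus-lossy point above, here is a minimal Python sketch. It is not how TIFF or JPG files are actually written: it uses zlib (Deflate, one of the lossless compression options TIFF can use) for the lossless round trip, and a toy 1-D, 8-sample DCT with a single uniform quantization step as a stand-in for JPG's lossy step. The sample row, the step size of 40, and all function names are assumptions made up for this example.

```python
import math
import zlib

N = 8  # JPG works on 8x8 blocks; this toy uses a single 8-sample row


def dct(row):
    """Orthonormal 1-D DCT-II of a row of N samples."""
    out = []
    for k in range(N):
        scale = math.sqrt((1 if k == 0 else 2) / N)
        out.append(scale * sum(x * math.cos(math.pi * (n + 0.5) * k / N)
                               for n, x in enumerate(row)))
    return out


def idct(coeffs):
    """Inverse of dct(): rebuild the N samples from the coefficients."""
    out = []
    for n in range(N):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / N)
            total += scale * c * math.cos(math.pi * (n + 0.5) * k / N)
        out.append(total)
    return out


def lossy_round_trip(row, step):
    """Quantize the DCT coefficients to multiples of `step`, then decode."""
    quantized = [round(c / step) * step for c in dct(row)]
    return [min(255, max(0, round(v))) for v in idct(quantized)]


def lossless_round_trip(row):
    """Compress and decompress with Deflate; every byte must come back."""
    return list(zlib.decompress(zlib.compress(bytes(row))))


# Hypothetical scan row: white page (255) with a two-pixel black stroke (0).
row = [255, 255, 255, 0, 0, 255, 255, 255]

lossless = lossless_round_trip(row)
lossy = lossy_round_trip(row, step=40)

print("original:", row)
print("lossless:", lossless)
print("lossy:   ", lossy)
print("largest lossy error:", max(abs(a - b) for a, b in zip(row, lossy)))
```

The lossless row comes back identical to the original, while the quantized row does not. A larger `step` (roughly, a lower JPG quality setting) makes the difference bigger.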

 

For background, I started seriously using Acrobat with v. 3, and have been doing OCR (and writing about it) for almost 30 years.

 

Oh, the Levels controls should be used at the scanning point. Scanning, especially with today's software and machines, is done in 16-bit. The image is then automatically converted to 8-bit when it's saved as either JPG or PDF; only some software gives you the option of saving the image file in 16-bit. When adjusting 8-bit files with Levels in PS, you do risk creating significant combing, which CAN cause strange artifacts to show up, even in a TIF file, but that is not JPG degradation, which sounds like what you are referring to.

 

Nonetheless, you solved your issue, and that's the most important thing. Congrats.

Enthusiast, Dec 16, 2024

Gary;

Appreciate the explanation. I believe the proper term is aliasing of the text; I have not heard of combing text. PS does have anti-aliasing tools, as well as a feature to clean up JPG degradation. I have not had to scan documents for a little while. Now that you bring it up, as I recall there was a Histogram feature in some scanning software. My focus was primarily on converting image docs to Intelligent Text and creating fillable forms. I really need a review to get a new perspective. Always open to learning. Thanks.

Community Expert, Dec 16, 2024

Hi, @westdr1dw, here's combing:

 

If you open a faded 8-bit image, notice the histogram is not touching the B&W regions.

2024-12-16_15-26-54.png

(Ignoring the color issue, let's just focus on the faded image) Let's move the left and right controls in.

 

2024-12-16_15-23-09.png

And now the image doesn't look as faded.

 

But, if the image is an 8-bit image, and you close this window and then open it again, it looks like this:

2024-12-16_15-23-42.png

That's combing.

 

To take this one step further, if you adjust each of the R, G, and B histograms, it looks like this:

2024-12-16_15-30-03.png

Now, how much this affects OCR, I am not fully sure, but since it does affect the anti-aliasing of an image, I've always been concerned (but have not tested it).

Now, if you do ALL of these adjustments on a 16-bit image, the combing does not show up.

 

If you make all of these adjustments in the scanner's software, it will not appear even if you export an 8-bit image.
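
The combing effect described above can be reproduced with plain numbers. The following is a minimal Python sketch, not what Photoshop or any scanner software actually does internally: it assumes a hypothetical "faded" scan whose tones only occupy the middle half of the range, applies a simple linear Levels stretch, and counts how many 8-bit histogram levels end up empty. The function names and the sample data are made up for this illustration.

```python
def levels_stretch(values, black, white, max_value):
    """Linearly remap [black, white] to [0, max_value], clipping outside."""
    span = white - black
    out = []
    for v in values:
        scaled = round((v - black) * max_value / span)
        out.append(min(max(scaled, 0), max_value))
    return out


def empty_bins_8bit(values_8bit):
    """Count 8-bit levels between the darkest and lightest used level
    that never occur (the gaps that make the histogram look like a comb)."""
    used = set(values_8bit)
    return sum(1 for level in range(min(used), max(used) + 1)
               if level not in used)


# Hypothetical faded scan: the 16-bit data only covers the middle of the range.
faded_16 = list(range(16384, 49152))
# The same scan after it has already been reduced to 8-bit (levels 64..191).
faded_8 = sorted({v >> 8 for v in faded_16})

# Case 1: stretch the Levels AFTER the image is already 8-bit.
stretched_8 = levels_stretch(faded_8, black=64, white=191, max_value=255)

# Case 2: stretch the Levels at 16-bit, THEN reduce to 8-bit.
stretched_16 = levels_stretch(faded_16, black=16384, white=49151,
                              max_value=65535)
stretched_16_to_8 = [v >> 8 for v in stretched_16]

gaps_8 = empty_bins_8bit(stretched_8)
gaps_16 = empty_bins_8bit(stretched_16_to_8)

print("empty levels when adjusted in 8-bit: ", gaps_8)
print("empty levels when adjusted in 16-bit:", gaps_16)
```

In the 8-bit case there are only 128 distinct input levels to spread over 256 output levels, so gaps are unavoidable. In the 16-bit case there are enough intermediate values to fill every 8-bit level after the reduction.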

 

Note: all of these images are from a presentation I do on scanning. The reason I've not tested this is that "generally," OCR issues are not part of scanning. :>)

Enthusiast, Dec 16, 2024

Memory fog beginning to clear 🙂 I have had to use histograms on very faded text to bring clarity to it. Not sure if you use PS; OTOH, you may just be using the image in the post as an example. PS has some awesome neural filters for restoring an older image. I had to do this for someone a few years ago, and the results were pretty good after running the filter. As with a lot of automation, optimal results still require a little tweaking.

From a technical perspective, is the Recognize Text feature not using OCR?

Good discussion though. 

Community Expert, Dec 16, 2024

What you see above is an image being adjusted in Photoshop and the results of fixing an image in 8-bit. The neural filters can be OK, but I've had much better results (to date) by using other tools as needed. If they merge their neural filters with their AI work, they'll have some great results. 

 

What I'm trying to get across here is that the more you fix an image within the scanning software, the better the final result will be, whether it's for OCR or photos of the family from the 1900s. BTW, that photo is my mother-in-law. So yes, I scanned that image and fixed it up. Like I said, I use this image when teaching how to scan. And yes, scanning is much better than taking photos of images, slides, or negatives. The biggest negative is time: it takes a lot more time. If one has a LOT of slides, photos, and negatives, I recommend photographing them first, because not every one is a keeper. As you review your images and find ones you want to improve, go back to the original and scan it properly. Then you can go into Photoshop to do repairs. I've been scanning for about 30 years, and I've tried all of the ways around this. Scanning is still the best way to get the best image to start any other improvements with.

 

 

Enthusiast, Dec 17, 2024

Gary;

Thanks for sharing your knowledge and years of experience.

However, for clarity: when you say scanning is still the best method, are you referring to printed documents?

The art of scanning has lost interest over the past 10 years. I do not see many flatbed scanners on the market today. The majority of people still scanning are sending documents in lieu of fax.

Community Expert, Dec 17, 2024

I'm referring to all kinds of documents, whether printed, handwritten, photographed, slide, or negative. The catch is, as stated, it can be slow. Photographing documents is by far the fastest — hands down. However, photographing a document clearly has disadvantages in terms of quality enhancement. Many times, it can be "good enough." But when quality is a concern, you cannot beat scanning.

 

Yes, I am aware that scanners are going out of fashion. But that is not because they lack advantages. I strongly feel it is because, as stated, they take more time and effort. It's much faster and easier to snap a photograph. I've done sessions on how to photograph slides (requiring a True White background, dedicated macro lenses, etc.), and after the session, people want to know how to do it with their cell phones. Heck, I use my cell phone all the time to take pictures. But I use my DSLR to take photos. A phone's camera loses so much in the capture that it creates significantly more issues to overcome.

It sounds like you are aware of posterization. That is what happens when the gradations across (say) the sky are so subtle that you see banding in a photograph (more often than not when printed). When you scan a photo by just clicking "scan," you are very likely to create that banding. However, if you set Levels, Curves, etc. during the scanning process, you will not get it.

 

I'm sure you get the idea at this point. 

Enthusiast, Dec 18, 2024

Morning Gary

A couple of interesting points you make. I have been a frequent flyer of a few Adobe apps for several years, and have used most of the others on a very limited basis. So yes, I am familiar with posterization. At one point in my career I managed 5 print shops for the govt. The overwhelming majority of users are content with cell phone pictures. The sensors have drastically improved over the years, and in-camera editing has taken it to a new level. However, they still cannot match the features of a DSLR, and should not be expected to either.

I have been shooting photos since the dark ages of the darkroom.

However, for the past ~25 years I have been shooting in RAW. I thought you were attempting to convince me scanning would give a better result than the original.

As a purist, I was a little turned off when they added movie capability to the DSLR. However, to survive, the industry and users had to adjust. Still, I refuse to buy into the mirrorless tech, at least for the present.

Community Expert, Dec 18, 2024

Morning: one last example of how scanning provides better capabilities than photography (outside of speed), and this will focus (no pun intended) on photographs.

 

When I mentioned that I had photographed slides, it was because I had just under 10,000 slides that I wanted to digitize. I knew from the outset that I'd be dead before I finished scanning all of them, and, as I said, not all were keepers. So I photographed them all, and occasionally go back and scan the actual keepers. All of the images were shot tethered to Lightroom Classic.

 

[For my process, you can see this: 

https://community.adobe.com/t5/adobe-community-professionals/digitizing-your-slides-by-photography/t...]

 

Here is a shot I did of an old barn. (All images were "as scanned," and have no LRC adjustments.)

2024-12-18_09-05-29.png

Because my camera was a bit "tipped," I went into Transform, and it did this:

2024-12-18_09-05-36.png

It's hard to see, but there is about a 2°–3° CW rotation.

I then scanned the image and noticed the colors and lighting were better.

2024-12-18_09-05-43.png

Then, I again performed a Transform.

2024-12-18_09-05-49.png

 

I wish I could tell you why the Scanned version can be adjusted via Transform, but the Photographed version cannot. I do not know.

 

When removing grain in an image, again, the Photographed version can hardly be touched in Photoshop, but (most) scanning software already has grain removal as part of its repertoire. (Although Topaz's noise removal does a decent job at grain.)

 

One last thing: I LOVE my Canon 7Dm2, but I look at the weight of the mirrorless and scratch my chin — longingly.

Enthusiast, Dec 18, 2024

Whatever works best for the user. Personally, I would have used the Adaptive Wide Angle filter to correct the geometric distortion, followed by a Curves layer. But again, whatever works for the user. I have used Topaz for years. I have an older version of SAI which still works for me. I wish Topaz had kept the products separate; the price is still a little too high for me. I am also a Canon shooter, having owned a few bodies. My grandson has my 7D. Great body, I just wanted to go full frame.

Community Expert, Dec 18, 2024

Sure, I use that as well. Here, I was testing how things work with the application, and this function leaped out in patheticness. I had to scan the image to verify if it was the photograph or the scan. 

 

I try not to rely upon just one tool when there are so many available. Not all tools work best, and I'm sure you know that.

Enthusiast, Dec 19, 2024

Finally had a chance to look at the scanning link you sent. The little girl's facial area seemed a little soft and did not pick up some of the attributes. Could just be an anomaly. However, something I quickly recognized is the gear: a Manfrotto tripod (got the same) and a Canon L series lens. Not sure which one, though. I have a 24-70/2.8, my go-to lens.

Community Expert, Dec 19, 2024

You are correct that it's a Canon L series, but it's a 100mm Macro lens. I stated that in my equipment list.

Enthusiast, Dec 19, 2024

I missed the equipment list. I have the 100 macro 1st edition.

Adobe Employee, Dec 18, 2024

Hi @westdr1dw 

Would it be possible to share the test file on which the issue occurred?

 

Thanks,

Shakti K

Enthusiast, Dec 19, 2024

Shakti K, the concern has been resolved using Distiller.

Thanks
