Scripted OCR doesn't let me script finding text, manual OCR does

Question

When I script OCR on an image-only PDF, Acrobat creates bounding boxes, and FindText can't find text unless the cursor is inside the particular bounding box that contains it.

However, if I OCR the file manually (Enhance Scans > Recognize Text > In this file > Settings > Output = Editable Text and Images, OK), the FindText command works.

The document is already open when I run this VBA script:

Set aApp = CreateObject("AcroExch.App")
Set aAVDoc = aApp.GetActiveDoc() 
Set aPageView = aAVDoc.GetAVPageView() 
Set aPdDoc = aAVDoc.GetPDDoc()
pageCount = aPdDoc.GetNumPages

' Get PDF OCR'd 
For curPage = 0 To pageCount - 1 
     aPageView.GoTo curPage 
     aApp.MenuItemExecute ("TouchUp:EditDocument") 
Next curPage  

rtgFound = aAVDoc.FindText("accordingly", 0, 0, 1)

rtgFound is False. If I manually OCR the document and run this code:

Set aApp = CreateObject("AcroExch.App") 
Set aAVDoc = aApp.GetActiveDoc() 
Set aPageView = aAVDoc.GetAVPageView() 
Set aPdDoc = aAVDoc.GetPDDoc() 

pageCount = aPdDoc.GetNumPages  
rtgFound = aAVDoc.FindText("accordingly", 0, 0, 1)

rtgFound is True. Is it possible to automate Acrobat to OCR into "Editable Text and Images"? That is currently the default UI setting, but it doesn't seem to make a difference.

If I have to search every one of the hundreds of little boxes, what would I have to loop through? Are there other options?

Many thanks!

Karl Heinz Kremer · Answer

As far as I know, there is no documented (and therefore supported) method to run OCR via the IAC interface. What you are trying to do relies on a side effect of the menu item you are executing to get the desired result. Chances are that it was never designed to work the way you are hoping it would.

There should not be any difference between running OCR manually and triggering it by trying to edit text on a page, at least as long as you are not trying to automate this last step. What is probably happening is that Acrobat has some information cached in the AVDoc that does not get updated when you trigger OCR via the menu item. What I would try is to save the document, open it again, and then see if the FindText function works.
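A minimal sketch of that save-and-reopen test, assuming the caching explanation above is correct (it is only a guess). PDF_PATH is a hypothetical placeholder for the full path of the open file, and the sub name is made up for illustration. The value 1 for PDSaveFull is the IAC flag for a full save. Whether this actually makes FindText succeed is untested.

```vba
' Sketch only: PDF_PATH is a hypothetical placeholder, not from the original post.
Const PDF_PATH As String = "C:\temp\scanned.pdf"
Const PDSaveFull As Long = 1   ' IAC save flag: write the complete file

Sub SaveReopenAndFind()
    Dim aApp As Object, aAVDoc As Object, aPdDoc As Object
    Dim rtgFound As Boolean

    Set aApp = CreateObject("AcroExch.App")
    Set aAVDoc = aApp.GetActiveDoc()
    Set aPdDoc = aAVDoc.GetPDDoc()

    ' ... run the scripted OCR loop from the question here ...

    ' Write the OCR result to disk so that nothing depends on cached state.
    If Not aPdDoc.Save(PDSaveFull, PDF_PATH) Then
        MsgBox "Save failed"
        Exit Sub
    End If

    ' A nonzero argument closes the view without a save prompt.
    aAVDoc.Close 1

    ' Reopen the saved file in a fresh AVDoc and search again.
    Set aAVDoc = CreateObject("AcroExch.AVDoc")
    If aAVDoc.Open(PDF_PATH, "") Then
        rtgFound = aAVDoc.FindText("accordingly", 0, 0, 1)
        Debug.Print "Found: "; rtgFound
    Else
        MsgBox "Could not reopen " & PDF_PATH
    End If
End Sub
```

If rtgFound is still False after reopening, stale cached state in the AVDoc is probably not the cause, and the menu item is likely producing a different kind of OCR output than the manual Recognize Text command.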
