Scripted OCR doesn't let me script finding text, manual OCR does

New Here,
Feb 06, 2017

When I script OCR of an image PDF, Acrobat creates bounding boxes, and the text can't be found unless the cursor is inside that particular box.

However, if I OCR the file manually (Enhance Scans > Recognize Text > In this file > Settings > Output = Editable Text and Images, OK), the FindText command works.

Document is already open when I run this VBA script:

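Set aApp = CreateObject("AcroExch.App")
Set aAVDoc = aApp.GetActiveDoc()
Set aPageView = aAVDoc.GetAVPageView()
Set aPdDoc = aAVDoc.GetPDDoc()
pageCount = aPdDoc.GetNumPages

' Get PDF OCR'd: visit each page and trigger the Edit PDF tool,
' which OCRs scanned pages as a side effect
For curPage = 0 To pageCount - 1
     aPageView.GoTo curPage
     aApp.MenuItemExecute "TouchUp:EditDocument"
Next curPage

' FindText(text, caseSensitive, wholeWordsOnly, reset):
' not case-sensitive, not whole words only, start from the first page
rtgFound = aAVDoc.FindText("accordingly", 0, 0, 1)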

rtgFound is False. If I manually OCR the document and run this code:

Set aApp = CreateObject("AcroExch.App") 
Set aAVDoc = aApp.GetActiveDoc()
Set aPageView = aAVDoc.GetAVPageView()
Set aPdDoc = aAVDoc.GetPDDoc()

pageCount = aPdDoc.GetNumPages 
rtgFound = aAVDoc.FindText("accordingly", 0, 0, 1)

rtgFound is True. Is it possible to automate Acrobat to OCR into "Editable Text and Images"? That is currently the default UI setting, but it doesn't seem to make a difference.

If I have to search every one of the hundreds of little boxes, what would I have to loop through? Are there other options?
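Rather than looping over the boxes themselves, one option worth trying is to loop over the words Acrobat reports for each page through the JavaScript bridge. This is a hedged sketch, not a confirmed fix: it assumes Acrobat (not Reader) is running with the OCR'd document already open, it reaches the Acrobat JavaScript Doc methods getPageNumWords and getPageNthWord through PDDoc.GetJSObject, and it has not been verified against text produced by the TouchUp:EditDocument route. The function names IsSameWord and FindWordViaJSObject are hypothetical.

```vba
Option Explicit

' Hypothetical helper: case-insensitive match of one extracted word
' against the search term.
Public Function IsSameWord(ByVal candidate As String, ByVal target As String) As Boolean
    IsSameWord = (StrComp(Trim$(candidate), Trim$(target), vbTextCompare) = 0)
End Function

' Hypothetical search routine: walks every word on every page through the
' Acrobat JavaScript bridge instead of AVDoc.FindText.
' Returns the 0-based page index of the first match, or -1 if not found.
' Assumes a document is already open in Acrobat.
Public Function FindWordViaJSObject(ByVal target As String) As Long
    Dim aApp As Object, aAVDoc As Object, aPdDoc As Object, jso As Object
    Dim pageCount As Long, curPage As Long, wordCount As Long, w As Long

    Set aApp = CreateObject("AcroExch.App")
    Set aAVDoc = aApp.GetActiveDoc()
    Set aPdDoc = aAVDoc.GetPDDoc()
    Set jso = aPdDoc.GetJSObject()   ' Acrobat JavaScript Doc object

    FindWordViaJSObject = -1
    pageCount = aPdDoc.GetNumPages()
    For curPage = 0 To pageCount - 1
        wordCount = jso.getPageNumWords(curPage)
        For w = 0 To wordCount - 1
            ' getPageNthWord strips punctuation and whitespace by default
            If IsSameWord(jso.getPageNthWord(curPage, w), target) Then
                FindWordViaJSObject = curPage
                Exit Function
            End If
        Next w
    Next curPage
End Function
```

Usage: `Debug.Print FindWordViaJSObject("accordingly")` prints the page index of the first matching word, or -1 if none of the extracted words match. Every jso call crosses the COM boundary, so this can be slow on long documents.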

Many thanks!

TOPICS
Acrobat SDK and JavaScript, Windows

Views

361

Adobe Community Professional,
Feb 06, 2017

As far as I know, there is no documented (and therefore supported) method to run OCR via the IAC interface. What you are doing relies on a side effect of the command you are executing to get the desired result. Chances are that this was never designed to work the way you are hoping it would.

There should not be any difference between running OCR manually and running it by trying to edit text on a page, at least as long as you are not trying to automate this last step. What is probably happening is that Acrobat has some information cached in the AVDoc that does not get updated when you trigger OCR via the menu item. What I would try is to save the document, open it again, and then see if the FindText function works.

New Here,
Mar 10, 2017

Unfortunately, saving and re-opening did not do the trick. I inserted this section before the FindText line:

  curDocName = aPdDoc.GetFileName
  ' PDSaveFull (= 1) is defined only when the project references the Acrobat type library
  aPdDoc.Save PDSaveFull, FilePath & curDocName
  aAVDoc.Close True
  aAVDoc.Open FilePath & curDocName, ""
  Set aAVDoc = aApp.GetActiveDoc()

A manual save and re-open did not work either.

It would be really nice to have a supported method to automate OCR.

New Here,
Apr 25, 2021

Hi Karl, I'm new to the support community, so I hope I'm using the appropriate route to ask this related question:

Is there any way to have Acrobat automatically run OCR before saving the PDF? Is there a setting in the main program, or is there any method available through the SDK from VBA or Python? It seems odd that the 'Sentinel' software package for managing text files would not have a means of automating the OCR process.

Scott

Most Valuable Participant,
Apr 26, 2021

There does not seem to be any programming interface to OCR in Acrobat. I think this is specifically to stop attempts to use it for the sort of volume work it would be very bad at.
