Copy link to clipboard
Copied
I have Acrobat Standard and Outlook 2013. I want to write a sub that opens up an attachment, invoke Adobe's OCR, and runs a regex test for a string pattern. I have this working to a degree, if the PDF is already OCR'd, I just open the attachment with Word and run the regex test against Word's .Content property which searches the entire document. If it's NOT already OCR'd, I tell the sub to save the PDF as a Word doc and then perform the same action but that has less than desirable results. Is there any way to invoke OCR without converting to another format and then run the regex test against the entire document? I can't even use sendkeys because there's no way to use the keyboard to get to the Recognize Text menu -__-
Here's a sample of my code
For i = myItems.Count To 1 Step -1
Set myItem = myItems.Item(i)
For Each atmt In myItem.Attachments
If Right(atmt.FileName, 3) = "pdf" Or Right(atmt.FileName, 3) = "PDF" Then
tempfilename = "C:\Users\me\Desktop\temp\" & _
Format(myItemReceivedTime, "yyyymmdd_hhnnss_") & myItemSenderName & "_" & atmt.FileName
atmt.SaveAsFile tempfilename
PDFPath = tempfilename
Set wordapp = CreateObject("word.Application")
wordapp.Documents.Open FileName:=tempfilepath & tempfilename
wordapp.Visible = True
If myRegex.Test(wordapp.ActiveDocument.Content) = True Then
myItem.Move destFolder
wordapp.ActiveDocument.Close Savechanges:=wdDoNotSaveChanges
wordapp.Quit Savechanges:=wdDoNotSaveChanges
Kill tempfilename
Else
wordapp.ActiveDocument.Close Savechanges:=wdDoNotSaveChanges
wordapp.Quit Savechanges:=wdDoNotSaveChanges
Kill tempfilename
End If
End If
Next atmt
Next I
For i = myItems.Count To 1 Step -1
Set myItem = myItems.Item(i)
For Each atmt In myItem.Attachments
If Right(atmt.FileName, 3) = "pdf" Then
tempfilename = "C:\Users\me\Desktop\temp\" & _
Format(myItemReceivedTime, "yyyymmdd_hhnnss_") & myItemSenderName & "_" & atmt.FileName
atmt.SaveAsFile tempfilename
Set objAcroApp = CreateObject("AcroExch.App")
Set objAcroAVDoc = CreateObject("AcroExch.AVDoc")
boResult = objAcroAVDoc.Open(tempfilename, "")
Set objAcroPDDoc = objAcroAVDoc.GetPDDoc
Set jsObj = objAcroPDDoc.GetJSObject
NewFilePath = WorksheetFunction.Substitute(tempfilename, ".pdf", ".docx")
jsObj.SaveAs NewFilePath, "com.adobe.acrobat.docx"
Set wordapp = CreateObject("word.Application")
wordapp.Documents.Open FileName:=NewFilePath
wordapp.Visible = True
If myRegex.Test(wordapp.ActiveDocument.Content) = True Then
myItem.Move destFolder
wordapp.ActiveDocument.Close Savechanges:=wdDoNotSaveChanges
wordapp.Quit Savechanges:=wdDoNotSaveChanges
Kill NewFilePath
Else
wordapp.ActiveDocument.Close Savechanges:=wdDoNotSaveChanges
wordapp.Quit Savechanges:=wdDoNotSaveChanges
boResult = objAcroAVDoc.Close(True)
boResult = objAcroApp.Exit
End If
End If
Next atmt
Next i
Have something to add?
Find more inspiration, events, and resources on the new Adobe Community
Explore Now