Exit
  • Global community
    • Language:
      • Deutsch
      • English
      • Español
      • Français
      • Português
  • 日本語コミュニティ
  • 한국 커뮤니티
0

Regex Test a PDF using Outlook 2013 VBA

New Here ,
Mar 20, 2017 Mar 20, 2017

I have Acrobat Standard and Outlook 2013. I want to write a sub that opens up an attachment, invoke Adobe's OCR, and runs a regex test for a string pattern. I have this working to a degree, if the PDF is already OCR'd, I just open the attachment with Word and run the regex test against Word's .Content property which searches the entire document. If it's NOT already OCR'd, I tell the sub to save the PDF as a Word doc and then perform the same action but that has less than desirable results. Is there any way to invoke OCR without converting to another format and then run the regex test against the entire document? I can't even use sendkeys because there's no way to use the keyboard to get to the Recognize Text menu -__-

Here's a sample of my code

For i = myItems.Count To 1 Step -1
Set myItem = myItems.Item(i)
    For Each atmt In myItem.Attachments
        If Right(atmt.FileName, 3) = "pdf" Or Right(atmt.FileName, 3) = "PDF" Then
            tempfilename = "C:\Users\me\Desktop\temp\" & _
            Format(myItemReceivedTime, "yyyymmdd_hhnnss_") & myItemSenderName & "_" & atmt.FileName
            atmt.SaveAsFile tempfilename
            PDFPath = tempfilename
            Set wordapp = CreateObject("word.Application")
            wordapp.Documents.Open FileName:=tempfilepath & tempfilename
            wordapp.Visible = True
                If myRegex.Test(wordapp.ActiveDocument.Content) = True  Then
                    myItem.Move destFolder
                    wordapp.ActiveDocument.Close Savechanges:=wdDoNotSaveChanges
                    wordapp.Quit Savechanges:=wdDoNotSaveChanges
                    Kill tempfilename
                Else
                    wordapp.ActiveDocument.Close Savechanges:=wdDoNotSaveChanges
                    wordapp.Quit Savechanges:=wdDoNotSaveChanges
                    Kill tempfilename
                End If
            End If
    Next atmt
Next I

For i = myItems.Count To 1 Step -1
Set myItem = myItems.Item(i)
    For Each atmt In myItem.Attachments
        If Right(atmt.FileName, 3) = "pdf" Then
            tempfilename = "C:\Users\me\Desktop\temp\" & _
            Format(myItemReceivedTime, "yyyymmdd_hhnnss_") & myItemSenderName & "_" & atmt.FileName
            atmt.SaveAsFile tempfilename
            Set objAcroApp = CreateObject("AcroExch.App")
            Set objAcroAVDoc = CreateObject("AcroExch.AVDoc")
            boResult = objAcroAVDoc.Open(tempfilename, "")
            Set objAcroPDDoc = objAcroAVDoc.GetPDDoc
            Set jsObj = objAcroPDDoc.GetJSObject
            NewFilePath = WorksheetFunction.Substitute(tempfilename, ".pdf", ".docx")
            jsObj.SaveAs NewFilePath, "com.adobe.acrobat.docx"
            Set wordapp = CreateObject("word.Application")
            wordapp.Documents.Open FileName:=NewFilePath
            wordapp.Visible = True
                If myRegex.Test(wordapp.ActiveDocument.Content) = True Then
                    myItem.Move destFolder
                    wordapp.ActiveDocument.Close Savechanges:=wdDoNotSaveChanges
                    wordapp.Quit Savechanges:=wdDoNotSaveChanges
                    Kill NewFilePath

              Else

                    wordapp.ActiveDocument.Close Savechanges:=wdDoNotSaveChanges
                    wordapp.Quit Savechanges:=wdDoNotSaveChanges
                    boResult = objAcroAVDoc.Close(True)
                    boResult = objAcroApp.Exit

                End If
        End If
    Next atmt
Next i

TOPICS
Acrobat SDK and JavaScript
723
Translate
Report
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
community guidelines
no replies

Have something to add?

Join the conversation