Searching multiple PDF files for multiple keywords using Array/Dictionary VBA

New Here, Mar 16, 2021


Good Day Adobe Community,

 

I have a piece of code, courtesy of Christos (below), which I rearranged into a VBA Function that connects to Acrobat. It works fine, but it is slow...

 

The issue I am having is the following:

I have to search through 5000+ invoices that have unrecognizable filenames (e.g. 12345.pdf, with invoice number 345/FV/2021 inside). Therefore the only option is to search for specific words such as the customer tax ID, the name of the month on the invoice, etc. (I have them in a separate file).

Unfortunately, searching 5000 files for one keyword, then another, and yet another with a function like the one below is quite hectic, and it would take me a couple of days rather than a couple of hours :-((

5000 files x 5000 keywords is 25,000,000 searches.

 

I have Adobe Acrobat 9 Pro, the Acrobat 10 DLL and, of course, Excel VBA.

 

I have read that you can read/import a list of keywords into an array/dictionary and then, instead of opening the same file 5000 times for 5000 keywords one by one, search the file for all 5000 keywords "at once".

 

For VBA and Excel I found examples of how to do it. However, I do not know how to adapt the code below to perform a similar array/dictionary search inside PDF files.

Can you point me in the right direction, or post a rewritten function or sub that performs such a task?

 

 

[Code]

Option Explicit

Function Find1WordinPDF(PDF_Path As String, Word_To_Find As String) As String

    '----------------------------------------------------------------------------------------
    'This function can be used to find a specific WORD in a PDF document (one word ONLY -> in
    'case you search two words for example it doesn't find anything, just opens the file).
    'It opens the PDF, looks for the first appearance of the specified word and returns a
    'status message. The call that makes Acrobat scroll to the word so that it is visible
    'and highlight it is commented out.

    'The code uses late binding, so no reference to external library is required.
    'However, the code works ONLY with Adobe Professional, so don't try to use it with
    'Adobe Reader because you will get an "ActiveX component can't create object" error.

    'Written by: Christos Samaras
    'Date: 04/05/2014
    'e-mail: xristos.samaras@gmail.com
    'site: http://www.myengineeringworld.net
    'Changed into Function: Auditorius 2021
    '----------------------------------------------------------------------------------------

    'Declaring the necessary variables.
    Dim App As Object
    Dim AVDoc As Object
    Dim PDDoc As Object
    Dim JSO As Object
    Dim i As Long
    Dim j As Long
    Dim Word As Variant
    Dim Result As Integer
    Dim DocIsOpen As Boolean

    'Example arguments:
    'Word_To_Find = "Engineering"
    'Word_To_Find = ThisWorkbook.Sheets("PDF Search").Range("C12").Value
    'PDF_Path = "C:\Users\Christos\Desktop\How Software Companies Die.pdf"
    'PDF_Path = ThisWorkbook.Path & "\" & "How Software Companies Die.pdf"
    'PDF_Path = ThisWorkbook.Sheets("PDF Search").Range("C14").Value

    'Check if the file exists.
    If Dir(PDF_Path) = "" Then
        Find1WordinPDF = "I cannot find PDF file! Check file path and try again !"
        Exit Function
    End If

    'Check if the input file is a PDF file.
    If LCase(Right(PDF_Path, 3)) <> "pdf" Then
        Find1WordinPDF = "Input file is not a PDF!"
        Exit Function
    End If

    'Speed ON (every path from here on leaves through CleanUp, which switches it OFF).
    Application.ScreenUpdating = False
    Application.EnableEvents = False
    Application.AskToUpdateLinks = False
    Application.DisplayAlerts = False
    Application.Calculation = xlAutomatic
    ThisWorkbook.Date1904 = False
    ActiveWindow.View = xlNormalView

    'Initialize Acrobat by creating the App and AVDoc objects. Errors are trapped
    'inline, so a failed CreateObject leaves the variable set to Nothing.
    On Error Resume Next
    Set App = CreateObject("AcroExch.App")
    Set AVDoc = CreateObject("AcroExch.AVDoc")
    On Error GoTo 0

    If App Is Nothing Then
        Find1WordinPDF = "I cannot create  Adobe Application object!"
        GoTo CleanUp
    End If
    If AVDoc Is Nothing Then
        Find1WordinPDF = "I cannot open  AVDoc Object"
        GoTo CleanUp
    End If

    'Open the PDF file.
    If AVDoc.Open(PDF_Path, "") = True Then
        DocIsOpen = True

        'Open successful, bring the PDF document to the front.
        'AVDoc.BringToFront

        'Set the PDDoc object and the JS Object - Java Script Object.
        Set PDDoc = AVDoc.GetPDDoc
        Set JSO = PDDoc.GetJSObject

        If JSO Is Nothing Then
            Find1WordinPDF = "Cannot get the JavaScript object of the PDF file!"
            GoTo CleanUp
        End If

        'Assume the word is missing until a match proves otherwise.
        Find1WordinPDF = "Word '" & Word_To_Find & "' was not found in PDF file!"

        'Loop through all the pages of the PDF and all the words of each page.
        For i = 0 To JSO.numPages - 1
            For j = 0 To JSO.getPageNumWords(i) - 1

                'Get a single word.
                Word = JSO.getPageNthWord(i, j)

                'If the word is string...
                If VarType(Word) = vbString Then

                    'Compare the word with the text to be found.
                    Result = StrComp(Word, Word_To_Find, vbTextCompare)

                    'If both strings are the same, report it and stop searching.
                    If Result = 0 Then
                        Find1WordinPDF = "I found searched keyword !"
                        'Call JSO.selectPageNthWord(i, j)
                        GoTo CleanUp
                    End If

                End If

            Next j
        Next i
    Else
        'Unable to open the PDF file.
        Find1WordinPDF = "Cannot open PDF file!"
    End If

CleanUp:
    'Close the PDF file without saving the changes, then close Acrobat.
    If DocIsOpen Then AVDoc.Close True
    If Not App Is Nothing Then App.Exit

    'Release the objects.
    Set JSO = Nothing
    Set PDDoc = Nothing
    Set AVDoc = Nothing
    Set App = Nothing

    'Speed OFF
    Application.ScreenUpdating = True
    Application.EnableEvents = True
    Application.AskToUpdateLinks = True
    Application.DisplayAlerts = True
    Application.Calculation = xlAutomatic
    ThisWorkbook.Date1904 = False
    ActiveWindow.View = xlNormalView

End Function

[/Code]

TOPICS
Edit and convert PDFs, Scan documents and OCR

Most Valuable Participant, Mar 16, 2021


Yes, you'd change this one line

Result = StrComp(Word, Word_To_Find, vbTextCompare)

into a loop that tests against all candidate strings.

You may need to do some VB training if you don't know how to do that.
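
A minimal sketch of what such a loop could look like, not tested against Acrobat 9. Instead of looping over the keywords for every word, it loads the keywords into a late-bound Scripting.Dictionary, so each word read from the PDF costs one lookup and each file is opened only once. The names FindKeywordsInPDF and SearchAllInvoices, the sheet name "Keywords", the range A1:A5000 and the folder C:\Invoices\ are assumptions made up for illustration; the Acrobat calls are the same ones used in the original function.

```vba
Option Explicit

'Returns a Dictionary whose keys are the keywords found in one PDF.
'Keywords is a Scripting.Dictionary (CompareMode = vbTextCompare) holding
'one single-word keyword per key. Acrobat must already be running.
Function FindKeywordsInPDF(PDF_Path As String, Keywords As Object) As Object

    Dim AVDoc As Object, JSO As Object, Found As Object
    Dim Word As Variant
    Dim i As Long, j As Long

    Set Found = CreateObject("Scripting.Dictionary")
    Found.CompareMode = vbTextCompare
    Set FindKeywordsInPDF = Found

    Set AVDoc = CreateObject("AcroExch.AVDoc")
    If Not AVDoc.Open(PDF_Path, "") Then Exit Function

    Set JSO = AVDoc.GetPDDoc.GetJSObject
    If Not JSO Is Nothing Then
        For i = 0 To JSO.numPages - 1
            For j = 0 To JSO.getPageNumWords(i) - 1
                Word = JSO.getPageNthWord(i, j)
                If VarType(Word) = vbString Then
                    'One dictionary lookup replaces one StrComp per keyword.
                    If Keywords.Exists(Word) Then Found(Word) = True
                End If
            Next j
            'Stop reading pages once every keyword has been seen.
            If Found.Count = Keywords.Count Then Exit For
        Next i
    End If

    'Close the PDF without saving; Acrobat itself stays open for the next file.
    AVDoc.Close True
End Function

'Hypothetical driver: keywords in column A of a sheet named "Keywords",
'PDF files in one folder. Both locations are assumptions.
Sub SearchAllInvoices()

    Dim App As Object, Keywords As Object, Found As Object
    Dim Folder As String, FileName As String
    Dim Cell As Range, Key As Variant

    Folder = "C:\Invoices\"

    Set Keywords = CreateObject("Scripting.Dictionary")
    Keywords.CompareMode = vbTextCompare
    For Each Cell In ThisWorkbook.Sheets("Keywords").Range("A1:A5000")
        If Len(Cell.Value) > 0 Then Keywords(CStr(Cell.Value)) = True
    Next Cell

    'Start Acrobat once instead of once per file.
    Set App = CreateObject("AcroExch.App")

    FileName = Dir(Folder & "*.pdf")
    Do While FileName <> ""
        Set Found = FindKeywordsInPDF(Folder & FileName, Keywords)
        For Each Key In Found.Keys
            Debug.Print FileName, Key
        Next Key
        FileName = Dir
    Loop

    App.Exit
End Sub
```

Two caveats on this sketch. getPageNthWord returns one word at a time, so only single-word keywords can match, and a value such as 345/FV/2021 may be split into several words and would need its own handling. The function also avoids calling Dir itself, because a Dir call with arguments inside the loop would reset the file enumeration in SearchAllInvoices.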

Adobe Community Professional, Mar 16, 2021


Searching for words with JavaScript has gotten much slower since Acrobat 10, so it's slow already. Running this code from VBA just makes it worse.

I'd suggest putting all of your search code, i.e. the looping through pages and words, into a folder-level JavaScript function. Test the performance of this function from the Acrobat Console window. If it performs within reason there, then call the function from VBA. I'm sure it will work better this way, but maybe not a lot better.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
