It takes about 15 minutes to execute jso.getPageNumWords
Hi.
I have written a macro, shown below, that gets the list of words in a PDF file together with the quads of each word.
However, the code doesn't work well.
Sub GetPDFWdList()
    Dim acroApp As Object
    Dim acroPDDoc As Object
    Set acroApp = CreateObject("AcroExch.App")
    Set acroPDDoc = CreateObject("AcroExch.PDDoc")
    Call Prc_1(acroPDDoc)   ' one large PDF
    Call Prc_2(acroPDDoc)   ' many small PDFs in one folder
    acroApp.Hide
    acroApp.Exit
    Set acroPDDoc = Nothing
    Set acroApp = Nothing
    MsgBox "Done"
End Sub
Private Sub Prc_1(acroPDDoc As Object)
    Call GetWdList_EachPDF(acroPDDoc, path, 1)
End Sub
Private Sub Prc_2(acroPDDoc As Object)
    Dim fileArr As Variant
    Dim i As Long
    fileArr = Array()
    Call GetFileList(folderPath, fileArr, "pdf", False)
    For i = LBound(fileArr) To UBound(fileArr)
        Call GetWdList_EachPDF(acroPDDoc, CStr(fileArr(i)), i + 1)
    Next
End Sub
Private Sub GetWdList_EachPDF(acroPDDoc As Object, PDFPath As String, docNum As Integer)
    Dim jso As Object
    Dim TotalPage As Long
    Dim TotalWds As Long
    Dim wdList As Variant
    Dim wdCnt As Long
    Dim quads As Variant
    Dim lRet As Long
    Dim i As Long, j As Long
    ' Open the PDF and get the JavaScript bridge object
    lRet = acroPDDoc.Open(PDFPath)
    Set jso = acroPDDoc.GetJSObject
    TotalPage = jso.numpages
    wdList = Array()
    quads = Array()
    wdCnt = 0
    For i = 0 To TotalPage - 1
        Application.StatusBar = "Getting PDF word list at page " & i + 1 & "/" & TotalPage & " on PDF file " & docNum
        DoEvents
        ' Number of words on page i (this is the slow call)
        TotalWds = jso.getPageNumWords(i)
        For j = 0 To TotalWds - 1
            ' Store word j of page i; False keeps punctuation and whitespace
            ReDim Preserve wdList(wdCnt)
            wdList(wdCnt) = jso.getPageNthWord(i, j, False)
            ' Store the quads of the same word
            ReDim Preserve quads(wdCnt)
            quads(wdCnt) = jso.getPageNthWordQuads(i, j)
            wdCnt = wdCnt + 1
        Next
    Next
    acroPDDoc.Close
    Set jso = Nothing
End Sub
My PC runs Windows 10.
In Prc_1, the input is a single PDF file of about 300 pages, and each page has 100-200 words.
In Prc_2, there are about 300 PDF files in the folder, each PDF file has 1 or 2 pages, and each page has 100-200 words.
(1) In Acrobat DC, no error occurs in either Prc_1 or Prc_2.
However, in Prc_2 it always takes about 15 minutes to execute jso.getPageNumWords(i) when i=0,
and there is no problem from i=1 onward.
This does not occur in version 2021.001.20149, but it occurs after updating to 2021.001.20150.
(2) In Acrobat XI (version 11.0.23), Prc_1 works, but Prc_2 fails partway through processing.
The error "Automation error the remote procedure call failed" occurs.
It always occurs while using the jso object to get "PageNumWords" or "PageNthWordQuads".
It seems that Acrobat XI closes for some reason during execution, which causes this error.
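To isolate the delay described in (1), the single call can be timed on its own. The following is only a minimal sketch and is not part of the macro above: the helper name TimeFirstPageWordCount is hypothetical, and it assumes jso was obtained from acroPDDoc.GetJSObject after a successful Open.

```vba
' Hypothetical helper (not in the original macro): times one call to
' jso.getPageNumWords for page 0 and prints the elapsed seconds.
' Assumes jso comes from acroPDDoc.GetJSObject after a successful Open.
Private Sub TimeFirstPageWordCount(jso As Object)
    Dim startTime As Double
    Dim numWords As Long
    startTime = Timer                      ' seconds since midnight
    numWords = jso.getPageNumWords(0)
    Debug.Print "getPageNumWords(0): " & numWords & " words in " & _
                Format(Timer - startTime, "0.00") & " s"
End Sub
```

Calling it right after Set jso = acroPDDoc.GetJSObject in GetWdList_EachPDF would print one line per file to the Immediate window, which should show whether the delay happens for every file or only for some. Note that Timer resets at midnight, so a run that crosses midnight would report a negative value.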
Is there something wrong with my code?
How can I solve these problems?
Any advice would be appreciated.
