I was reading from a forum thread: https://forums.adobe.com/thread/604177 and started experimenting with it.
The libraries I have loaded include
The computer on which this code is executed has Acrobat Professional installed.
You need a few more lines. Take a look here for a working example: Adobe Acrobat and VBA - An Introduction - KHKonsulting LLC
Add these lines to the beginning of your program and see if that fixes it:
Dim AcroApp As Acrobat.CAcroApp
Set AcroApp = CreateObject("AcroExch.App")
And, all you need is a reference to the Adobe Acrobat 10.0 Type Library
Ah! Thanks Karl. That is helpful. I got it to open just fine with the following.
My second question is then: what kind of object is the PDDoc? Is it like a pointer, or is it an actual object that contains all the data within the file? If I pass it around as an object, are there limitations on shallow versus deep passes? (Say I want the function to return an AcroPDDoc object and do other things with it.)
Public Function GetPDF(FilePath As String) As Object
    Dim AcroApp As Acrobat.AcroApp
    Dim OriPdf As Acrobat.AcroPDDoc
    Set AcroApp = CreateObject("AcroExch.App")
    Set OriPdf = CreateObject("AcroExch.PDDoc")
    If OriPdf.Open(FilePath) Then
        ' Object references must be assigned with Set
        Set GetPDF = OriPdf
    End If
    Set OriPdf = Nothing
    Set AcroApp = Nothing
End Function
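On the pointer-versus-object question: a VBA object variable holds a reference to the COM object, not a copy of the file's data, so a function can hand the open document back to its caller with `Set`. The following is only a minimal sketch under stated assumptions: it uses late binding, the names `OpenPdf` and `ShowPageCount` are hypothetical, and the file path is a placeholder.

```vba
' Returns an open PDDoc, or Nothing if the file could not be opened.
' OpenPdf is a hypothetical helper name, not part of the Acrobat API.
Public Function OpenPdf(ByVal FilePath As String) As Object
    Dim pdDoc As Object
    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(FilePath) Then
        ' Set hands back a reference to the same COM object, not a copy.
        Set OpenPdf = pdDoc
    End If
End Function

Public Sub ShowPageCount()
    Dim doc As Object
    ' Placeholder path; replace with a real file.
    Set doc = OpenPdf("C:\Temp\example.pdf")
    If doc Is Nothing Then
        MsgBox "Could not open the PDF"
        Exit Sub
    End If
    MsgBox "Pages: " & doc.GetNumPages()
    ' The caller now owns the reference, so the caller closes the document.
    doc.Close
    Set doc = Nothing
End Sub
```

Because only a reference is passed, there is no deep copy to worry about; the document stays open until `Close` is called or the last reference is released.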
Also, make sure you download the SDK and read the documentation. Snippets on the web are not documentation…
After running the code a second time, the application took a long time to load and the error pointed to the CreateObject("AcroExch.App") line. The error was: "Cannot create ActiveX component." Please advise.
That error is usually reported because the reference to the "Acrobat.tlb" file is missing.
Thank you very much 🙂
Thanks for your great article. I am hung up on this line of code from your post:
If Part1Document.InsertPages(numPages - 1, Part2Document, 0, Part2Document.GetNumPages(), True) = False Then
    MsgBox "Cannot insert pages"
End If
I only want to copy the first two pages of the Part 2 Document, but every syntax variation I try gets me all the pages from Part 2 copied to Part 1.
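In the IAC reference, the fourth argument of `InsertPages` is the number of pages to insert, so passing `Part2Document.GetNumPages()` copies everything. A hedged sketch of copying only the first two pages follows; the procedure name, its parameters, and the late-bound `Object` variables are assumptions for illustration, not the article's code.

```vba
' Appends the first two pages of one PDF to the end of another.
' AppendFirstTwoPages and its parameters are hypothetical names.
Public Sub AppendFirstTwoPages(ByVal Part1Path As String, _
                               ByVal Part2Path As String, _
                               ByVal OutPath As String)
    Const PDSaveFull As Long = 1   ' PDSaveFull flag from the Acrobat type library
    Dim part1 As Object, part2 As Object
    Dim numPages As Long, pagesToCopy As Long

    Set part1 = CreateObject("AcroExch.PDDoc")
    Set part2 = CreateObject("AcroExch.PDDoc")

    If part1.Open(Part1Path) = False Then
        MsgBox "Cannot open " & Part1Path
        Exit Sub
    End If
    If part2.Open(Part2Path) = False Then
        MsgBox "Cannot open " & Part2Path
        part1.Close
        Exit Sub
    End If

    numPages = part1.GetNumPages()
    ' Copy at most two pages, in case the source is shorter.
    pagesToCopy = 2
    If part2.GetNumPages() < pagesToCopy Then pagesToCopy = part2.GetNumPages()

    ' Arguments: insert after page, source doc, start page (0-based),
    ' number of pages, copy bookmarks.
    If part1.InsertPages(numPages - 1, part2, 0, pagesToCopy, True) = False Then
        MsgBox "Cannot insert pages"
    ElseIf part1.Save(PDSaveFull, OutPath) = False Then
        MsgBox "Cannot save " & OutPath
    End If

    part2.Close
    part1.Close
End Sub
```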
Hi, can you help me with this question? I am stuck, any help is highly appreciated, thanks
I am getting the error:
Run-time error '-2147023170 (800706be)':
The remote procedure call failed.
in the line : For i = 0 To pdf_doc.GetNumPages - 1
This is my code:
Public Const pdf_file As String = "C:\Users\... 10\2022 PY PAR.pdf"
Dim aApp As Acrobat.AcroApp
Dim av_doc As CAcroAVDoc
Dim pdf_doc As CAcroPDDoc
Dim sel_text As CAcroPDTextSelect
Dim i As Long, j As Long
Dim pagenumber, pageContent, content
Set aApp = CreateObject("AcroExch.App")
Set av_doc = CreateObject("AcroExch.AVDoc")
If av_doc.Open(pdf_file, vbNull) <> True Then Exit Sub
While av_doc Is Nothing
Set av_doc = aApp.GetActiveDoc
Set pdf_doc = av_doc.GetPDDoc
For i = 0 To pdf_doc.GetNumPages - 1
Set pagenumber = pdf_doc.AcquirePage(i)
Set pageContent = CreateObject("AcroExch.HiliteList")
On Error Resume Next
If pageContent.Add(0, 9000) <> True Then Exit Sub
Set sel_text = pagenumber.CreatePageHilite(pageContent)
On Error GoTo 0
For j = 0 To sel_text.GetNumText - 1
Set sel_text = Nothing
Set pagenumber = Nothing
Set pdf_doc = Nothing
Set av_doc = Nothing
Set aApp = Nothing
Thanks for your answer and help!
Yes, I need to get the text from a PDF and paste it into Excel to manipulate it with the VBA macro. The old VBA, created by someone else, was given to me to troubleshoot and make more efficient because it had become very slow, but on my machine I am not even able to run it: I get the automation error when reaching the code below:
If Not AC_PGTxt Is Nothing Then
For j = 0 To .GetNumText - 1
T_Str = T_Str & .GetText(j)
This will help in 2 big ways
Thanks a lot, Thom Parker
Thanks again Thom,
This is the intended output in excel:
|Text In Page - 1|
|Text In Page - 2|
You'll find everything you need in the IAC reference:
Here's the page on using the IAC OLE interface, which is the Windows version. There are examples on this page for C# and VBA.
You'll find an example of what you need about halfway down under the title "Using the JSObject".
Thank you, Thom, much appreciated
Hi Thom P,
I am studying the documentation provided. Thanks
Is there a way I can extract text line by line? Is there a method such as GetLine or something like that?
The JSO.getPageNthWord does not recognize "-", spaces, or Line Feed or CR or anything to determine the end of line.
PDF files do not contain CR or LF. Often they don't even contain spaces. Each character is like a graphic, with its own position on the page. Acrobat uses guesswork to divide the text into words, based on the distance between characters. You can do your own guesswork to divide it into lines; getPageNthWordQuads is the tool you need. In the general case this is pretty hard: there may be main text, subscripts, superscripts, non-aligned columns and more. You might be lucky with your use case, but it's all guesswork.
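The line guesswork described above can be sketched in plain VBA. This example assumes you have already collected each word and the y coordinate of the top of its quad into two parallel arrays, in the order Acrobat returns the words; how the quads are unpacked from the JSObject is not shown. The function name `GroupWordsIntoLines` and the `tolerance` parameter are hypothetical, and the approach only handles simple single-column text.

```vba
' Groups consecutive words into lines when their top coordinates differ
' by no more than the given tolerance (in PDF points).
' Assumes both arrays are non-empty and share the same bounds.
Public Function GroupWordsIntoLines(words() As String, tops() As Double, _
                                    ByVal tolerance As Double) As Collection
    Dim result As New Collection
    Dim i As Long
    Dim currentLine As String
    Dim lineTop As Double

    For i = LBound(words) To UBound(words)
        If i = LBound(words) Then
            ' First word starts the first line.
            currentLine = words(i)
            lineTop = tops(i)
        ElseIf Abs(tops(i) - lineTop) <= tolerance Then
            ' Same vertical position: still on the current line.
            currentLine = currentLine & " " & words(i)
        Else
            ' Vertical jump: close the current line and start a new one.
            result.Add currentLine
            currentLine = words(i)
            lineTop = tops(i)
        End If
    Next i
    result.Add currentLine
    Set GroupWordsIntoLines = result
End Function

Public Sub DemoGroupWords()
    Dim w(0 To 3) As String, t(0 To 3) As Double
    Dim lines As Collection, entry As Variant
    w(0) = "Hello": t(0) = 700
    w(1) = "world": t(1) = 700.5
    w(2) = "second": t(2) = 686
    w(3) = "line": t(3) = 686
    Set lines = GroupWordsIntoLines(w, t, 2)
    For Each entry In lines
        Debug.Print entry   ' prints "Hello world", then "second line"
    Next entry
End Sub
```

A fixed tolerance is the simplest possible rule; superscripts, subscripts, and multi-column layouts would need something smarter, as noted above.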
Thank you, Thom!
To make sure special and white-space characters (if they exist!) are returned by getPageNthWord, specify the bStrip parameter as false (it defaults to true).
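A minimal sketch of the JSObject approach with bStrip set to false is shown below, writing one page of text per row as in the intended output earlier in the thread. It assumes the code runs in Excel with Acrobat installed; the names `GetPageText` and `PdfTextToSheet` and the use of column A on the active sheet are hypothetical choices, not something from the IAC documentation.

```vba
' Returns the text of one page (0-based index) by joining its words.
Public Function GetPageText(ByVal pdDoc As Object, ByVal pageIndex As Long) As String
    Dim jso As Object
    Dim wordCount As Long, i As Long
    Dim pageText As String

    Set jso = pdDoc.GetJSObject
    wordCount = jso.getPageNumWords(pageIndex)
    For i = 0 To wordCount - 1
        ' Third argument is bStrip; False keeps punctuation and white space.
        pageText = pageText & jso.getPageNthWord(pageIndex, i, False)
    Next i
    GetPageText = pageText
End Function

' Writes the text of each page into column A of the active sheet.
Public Sub PdfTextToSheet(ByVal FilePath As String)
    Dim pdDoc As Object
    Dim p As Long

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If pdDoc.Open(FilePath) = False Then
        MsgBox "Cannot open " & FilePath
        Exit Sub
    End If
    For p = 0 To pdDoc.GetNumPages() - 1
        ActiveSheet.Cells(p + 1, 1).Value = GetPageText(pdDoc, p)
    Next p
    pdDoc.Close
    Set pdDoc = Nothing
End Sub
```

Calling into the JSObject once per word is slow on large documents, so this is best treated as a starting point rather than a tuned solution.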
Got it, testing now, thank you Thom👍