Known Participant
July 20, 2017
Answered

Accessing a PDF file through VBA

  • July 20, 2017
  • 2 replies
  • 151714 views


I was reading a forum thread, https://forums.adobe.com/thread/604177, and started experimenting with the code from it.
But I don't think the function loads at all. I'm not sure what the reason is;
I reckon it is something simple, probably related to the library references.
Can someone help point out why the following code fails to compile?
(The code does not appear to run when executed from the Immediate window, as none of the breakpoints are triggered.)

The libraries I have loaded include

  • Acrobat Distiller
  • Adobe Acrobat 10.0 Type Library
  • Acrobat Scan 1.0 Type Library

The computer on which this code is executed has Acrobat Professional installed.

Public Function GetPDF() '(FilePath As String) As Object
    Dim origPdf As Acrobat.AcroPDDoc
    Dim path1 As String
    MsgBox ("Start")

    path1 = Application.ActiveWorkbook.Path
    path1 = path1 & "/31700100"

    Set origPdf = CreateObject("AcroExch.PDDoc")

    If origPdf.Open(path1) Then
        MsgBox ("weee")
    End If

    origPdf.Close
    Set origPdf = Nothing
End Function
Correct answer Thom Parker


I think you will find it very helpful to write an Acrobat folder-level JavaScript function for extracting the page text, and then call this function from the VBA script.

 

This will help in 2 big ways:

1) Since you are using a JavaScript function to acquire the text, it is more efficient to develop the script in its native environment, where it is easy to debug and maintain.

2) The interface between VBA and Acrobat JavaScript is inefficient. It's slower than running the JS in its native environment, and there is a fundamental incompatibility between complex types. Putting all the JS code in a single location will save you a lot of headaches.

 

  

2 replies

Participating Frequently
February 1, 2023

Hi, can you help me with this question? I am stuck; any help is highly appreciated. Thanks.

I am getting the error:

Run-time error '-2147023170 (800706be)':

Automation error

The remote procedure call failed. 

on the line: For i = 0 To pdf_doc.GetNumPages - 1

This is my code:

Option Explicit
Public Const pdf_file As String = "C:\Users\... 10\2022 PY PAR.pdf"
Sub rad_from_pdf()
    Dim aApp As Acrobat.AcroApp
    Dim av_doc As CAcroAVDoc
    Dim pdf_doc As CAcroPDDoc
    Dim sel_text As CAcroPDTextSelect
    Dim i As Long, j As Long
    Dim pagenumber, pageContent, content
    Set aApp = CreateObject("AcroExch.App")
    Set av_doc = CreateObject("AcroExch.AVDoc")

    If av_doc.Open(pdf_file, vbNull) <> True Then Exit Sub
    While av_doc Is Nothing
        Set av_doc = aApp.GetActiveDoc
    Wend
    Set pdf_doc = av_doc.GetPDDoc
    For i = 0 To pdf_doc.GetNumPages - 1
        Set pagenumber = pdf_doc.AcquirePage(i)
        Set pageContent = CreateObject("AcroExch.HiliteList")

        On Error Resume Next
        If pageContent.Add(0, 9000) <> True Then Exit Sub

        Set sel_text = pagenumber.CreatePageHilite(pageContent)
        On Error GoTo 0

        For j = 0 To sel_text.GetNumText - 1
            Debug.Print sel_text.GetText(j)
        Next
    Next
    av_doc.Close False
    aApp.Exit
    Set sel_text = Nothing
    Set pagenumber = Nothing
    Set pdf_doc = Nothing
    Set av_doc = Nothing
    Set aApp = Nothing
End Sub

 

Brainiac
February 2, 2023

Are you trying to read all the text from a PDF? The JavaScript method getPageNthWord is now considered the best way to do that.
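
For readers following along, here is a minimal sketch of what that suggestion might look like from VBA. It goes through the JavaScript bridge (GetJSObject) and calls getPageNumWords and getPageNthWord on the document. The function name ReadPdfWords and its parameter are hypothetical, it assumes a reference to the Adobe Acrobat type library and a full Acrobat installation, and it has not been run against the poster's file.

```vba
' Sketch only: ReadPdfWords is a hypothetical helper, not code from this thread.
' Assumes a reference to the Adobe Acrobat type library and a full Acrobat install.
Public Function ReadPdfWords(ByVal pdfPath As String) As String
    Dim pdDoc As Acrobat.CAcroPDDoc
    Dim jso As Object                 ' late-bound Acrobat JavaScript Doc object
    Dim pageIdx As Long, wordIdx As Long
    Dim result As String

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(pdfPath) Then Exit Function   ' returns "" if the file cannot be opened

    Set jso = pdDoc.GetJSObject
    For pageIdx = 0 To pdDoc.GetNumPages - 1
        ' VBA evaluates the upper bound once per page, so getPageNumWords is called once.
        For wordIdx = 0 To jso.getPageNumWords(pageIdx) - 1
            ' Third argument True strips punctuation and whitespace from each word.
            result = result & jso.getPageNthWord(pageIdx, wordIdx, True) & " "
        Next wordIdx
        result = result & vbCrLf      ' one line of output per page
    Next pageIdx

    pdDoc.Close
    Set jso = Nothing
    Set pdDoc = Nothing
    ReadPdfWords = result
End Function
```

Each word costs one call across the VBA/JavaScript bridge, so this is likely to be slow on large documents; a call such as Debug.Print ReadPdfWords(pdf_file) is enough to try it on a small file first.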

Participating Frequently
February 2, 2023

Thanks for your answer and help!

Yes, I need to get the text from a PDF and paste it into Excel to manipulate it with the VBA macro. The old VBA code, created for someone else, was given to me to troubleshoot and make more efficient, as it had become very slow. On my machine, however, I cannot even run it: I get the automation error when execution reaches the code below:

If Not AC_PGTxt Is Nothing Then
    With AC_PGTxt
        For j = 0 To .GetNumText - 1
            T_Str = T_Str & .GetText(j)
        Next j
    End With
End If

 

Karl Heinz  Kremer
Brainiac
July 20, 2017

You need a few more lines. Take a look here for a working example: Adobe Acrobat and VBA - An Introduction - KHKonsulting LLC

Add these lines to the beginning of your program and see if that fixes it:

Dim AcroApp As Acrobat.CAcroApp

Set AcroApp = CreateObject("AcroExch.App")

And all you need is a reference to the Adobe Acrobat 10.0 Type Library.
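
Put together with the code from the question, the suggestion above might look like the sketch below. Two things in it are assumptions that are not confirmed anywhere in this thread: that the file is named 31700100.pdf (the question's path has no extension), and that a backslash separator is wanted instead of the forward slash. The variable name acroApplication is also just an illustrative choice.

```vba
' Sketch only, assuming a reference to the Adobe Acrobat 10.0 Type Library.
Public Function GetPDF() As Boolean
    Dim acroApplication As Acrobat.CAcroApp
    Dim origPdf As Acrobat.CAcroPDDoc
    Dim path1 As String

    ' The two lines suggested in the reply: create the application object first.
    Set acroApplication = CreateObject("AcroExch.App")
    Set origPdf = CreateObject("AcroExch.PDDoc")

    ' Assumption: the PDF sits next to the workbook and has a .pdf extension.
    ' ActiveWorkbook.Path is empty if the workbook has never been saved.
    path1 = Application.ActiveWorkbook.Path & "\31700100.pdf"

    If origPdf.Open(path1) Then
        GetPDF = True          ' report success to the caller
        origPdf.Close          ' only close a document that actually opened
    End If

    acroApplication.Exit
    Set origPdf = Nothing
    Set acroApplication = Nothing
End Function
```

Returning a Boolean instead of showing a message box makes it possible to check the result from the Immediate window with ?GetPDF().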

New Participant
June 23, 2023

Thanks for your great article.  I am hung up on this line of code from your post:

 

If Part1Document.InsertPages(numPages - 1, Part2Document, _
        0, Part2Document.GetNumPages(), True) = False Then
    MsgBox "Cannot insert pages"
End If

I only want to copy the first two pages of the Part 2 Document, but every syntax variation I try gets me all the pages from Part 2 copied to Part 1.

Any ideas?
Brainiac
June 23, 2023

You say "syntax variation". Does that mean you don't have the documentation? It's here: https://opensource.adobe.com/dc-acrobat-sdk-docs/library/interapp/IAC_API_OLE_Objects.html#insertpages
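
According to that documentation, the parameters of InsertPages are, in order: the page to insert after, the source document, the first source page (zero-based), the number of pages to insert, and a bookmarks flag. A sketch of copying only the first two pages could therefore look like the following. The procedure name AppendFirstPages and the pagesToCopy parameter are hypothetical, and this does not explain why the earlier attempts copied every page.

```vba
' Sketch only: AppendFirstPages is a hypothetical helper, not code from the article.
' Assumes a reference to the Adobe Acrobat type library and two already opened PDDocs.
Public Sub AppendFirstPages(ByVal Part1Document As Acrobat.CAcroPDDoc, _
                            ByVal Part2Document As Acrobat.CAcroPDDoc, _
                            ByVal pagesToCopy As Long)
    Dim numPages As Long

    ' Never ask for more pages than the source document has.
    If pagesToCopy > Part2Document.GetNumPages() Then
        pagesToCopy = Part2Document.GetNumPages()
    End If

    numPages = Part1Document.GetNumPages()

    ' Insert after the last page of Part 1, starting at source page 0,
    ' and pass the page count (not an end page) as the fourth argument.
    If Part1Document.InsertPages(numPages - 1, Part2Document, _
            0, pagesToCopy, True) = False Then
        MsgBox "Cannot insert pages"
    End If
End Sub
```

Calling AppendFirstPages Part1Document, Part2Document, 2 would request the first two pages; the caller still has to save Part1Document afterwards.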