I was reading a forum thread (https://forums.adobe.com/thread/604177) and started experimenting with it. The libraries I have loaded include
The computer on which this code runs has Acrobat Professional installed.
You need a few more lines. Take a look here for a working example: Adobe Acrobat and VBA - An Introduction - KHKonsulting LLC
Add these lines to the beginning of your program and see if that fixes it:
Dim AcroApp As Acrobat.CAcroApp
Set AcroApp = CreateObject("AcroExch.App")
And, all you need is a reference to the Adobe Acrobat 10.0 Type Library
Ah! Thanks, Karl. That is helpful. I got it to open just fine with the following.
My second question is then: what kind of object is a PDDoc? Is it like a pointer, or is it an actual object that contains all the data in the file? If I pass it around as an object, are there limitations on shallow and deep passes? (Say I want the function to return an AcroPDDoc object and do other things with it.)
Thanks!
Public Function GetPDF(FilePath As String) As Object
    ' Declare without "New": CreateObject below supplies the instances.
    Dim AcroApp As Acrobat.CAcroApp
    Dim OriPdf As Acrobat.CAcroPDDoc
    Set AcroApp = CreateObject("AcroExch.App")
    Set OriPdf = CreateObject("AcroExch.PDDoc")
    If OriPdf.Open(FilePath) Then
        MsgBox "weee"
        ' Object references must be assigned with Set. The document stays
        ' open so the caller can use it; the caller calls .Close when done.
        Set GetPDF = OriPdf
    End If
    Set OriPdf = Nothing
    Set AcroApp = Nothing
End Function
Also, make sure you download the SDK and read the documentation. Snippets on the web are not documentation…
After running the code a second time, the application took a long time to load, and the error pointed to the CreateObject("AcroExch.App") line. The error was: Cannot create ActiveX component. Please advise.
That error is usually reported because the reference to the "Acrobat.tlb" file is missing.
Thank you very much 🙂
Thanks for your great article. I am hung up on this line of code from your post:
If Part1Document.InsertPages(numPages - 1, Part2Document, 0, Part2Document.GetNumPages(), True) = False Then
    MsgBox "Cannot insert pages"
End If
I only want to copy the first two pages of the Part 2 Document, but every syntax variation I try gets me all the pages from Part 2 copied to Part 1.
Any ideas?
You say "syntax variation". Does that mean you don't have the documentation? It's here: https://opensource.adobe.com/dc-acrobat-sdk-docs/library/interapp/IAC_API_OLE_Objects.html#insertpag...
Hi, can you help me with this question? I am stuck, any help is highly appreciated, thanks
I am getting the error:
Run-time error '-2147023170 (800706be)':
Automation error
The remote procedure call failed.
on the line: For i = 0 To pdf_doc.GetNumPages - 1
This is my code:
Option Explicit
Public Const pdf_file As String = "C:\Users\... 10\2022 PY PAR.pdf"
Sub rad_from_pdf()
    Dim aApp As Acrobat.AcroApp
    Dim av_doc As CAcroAVDoc
    Dim pdf_doc As CAcroPDDoc
    Dim sel_text As CAcroPDTextSelect
    Dim i As Long, j As Long
    Dim pagenumber, pageContent, content
    Set aApp = CreateObject("AcroExch.App")
    Set av_doc = CreateObject("AcroExch.AVDoc")
    If av_doc.Open(pdf_file, vbNull) <> True Then Exit Sub
    While av_doc Is Nothing
        Set av_doc = aApp.GetActiveDoc
    Wend
    Set pdf_doc = av_doc.GetPDDoc
    For i = 0 To pdf_doc.GetNumPages - 1
        Set pagenumber = pdf_doc.AcquirePage(i)
        Set pageContent = CreateObject("AcroExch.HiliteList")
        On Error Resume Next
        If pageContent.Add(0, 9000) <> True Then Exit Sub
        Set sel_text = pagenumber.CreatePageHilite(pageContent)
        On Error GoTo 0
        For j = 0 To sel_text.GetNumText - 1
            Debug.Print sel_text.GetText(j)
        Next
    Next
    av_doc.Close False
    aApp.Exit
    Set sel_text = Nothing
    Set pagenumber = Nothing
    Set pdf_doc = Nothing
    Set av_doc = Nothing
    Set aApp = Nothing
End Sub
Are you trying to read all the text from a PDF? The JavaScript getPageNthWord method is now considered the best way to do that.
Thanks for your answer and help!
Yes, I need to get the text from a PDF and paste it into Excel to manipulate it with a VBA macro. The old VBA, written by someone else, was given to me to troubleshoot and make more efficient because it had become very slow, but on my machine I cannot even run it: I get the automation error when execution reaches the code below:
If Not AC_PGTxt Is Nothing Then
    With AC_PGTxt
        For j = 0 To .GetNumText - 1
            T_Str = T_Str & .GetText(j)
        Next j
    End With
End If
I think you will find it very helpful to write an Acrobat folder-level JavaScript function for extracting the page text, and then call this function from the VBA script.
This will help in two big ways:
1) Since you are using a JavaScript function to acquire the text, it is more efficient to develop the script in its native environment, where it is easy to debug and maintain.
2) The interface between VBA and Acrobat JavaScript is inefficient. It is slower than running the JS in its native environment, and there is a fundamental incompatibility between complex types. Putting all the JS code in a single location will save you a lot of headaches.
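As a rough illustration of the folder-level function described above, here is a minimal sketch. It assumes the Acrobat JavaScript Doc methods getPageNumWords and getPageNthWord; the name extractPageText and the stubDoc object are hypothetical, and the stub only stands in for a real Doc so the logic can be tried outside Acrobat. The VBA side of the call is not shown.

```javascript
// Hypothetical folder-level helper: returns the words of one page
// (0-based page index) joined by single spaces.
// Assumes `doc` offers the Acrobat Doc methods getPageNumWords(nPage)
// and getPageNthWord(nPage, nWord, bStrip).
function extractPageText(doc, nPage) {
  var words = [];
  var count = doc.getPageNumWords(nPage);
  for (var i = 0; i < count; i++) {
    // bStrip = true asks for the word without punctuation or white space.
    words.push(doc.getPageNthWord(nPage, i, true));
  }
  return words.join(" ");
}

// Stand-in for an Acrobat Doc object (made-up data, not real Acrobat output).
var stubDoc = {
  pages: [["Text", "In", "Page", "1"], ["ythythy", "kiukiu"]],
  getPageNumWords: function (nPage) { return this.pages[nPage].length; },
  getPageNthWord: function (nPage, nWord, bStrip) { return this.pages[nPage][nWord]; }
};

var page2Text = extractPageText(stubDoc, 1);
```

Inside Acrobat the same function would be given a real Doc object in place of stubDoc, so the word loop runs entirely in JavaScript and only one string has to cross back to VBA.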
Thanks a lot, Thom Parker.
Thanks again, Thom.
Would you mind providing a sample of the code needed, both the JavaScript and how to call it from Excel VBA?
This is the intended output in Excel:
Text In Page - 1 |
wqeqqwe |
ewqeqw |
qweqweqw |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
wqeqweq |
Text In Page - 2 |
ythythy |
kiukiu |
kuikiu |
kiukiu |
kuikui |
kuikui |
ioloi |
iloi |
loilio |
You'll find everything you need in the IAC reference:
Here's the page on using the IAC OLE interface, which is the Windows version. There are examples on this page for C# and VBA.
You'll find an example of what you need about halfway down under the title "Using the JSObject".
Thank you, Thom, much appreciated.
Hi Thom P,
I am studying the documentation provided. Thanks
Is there a way to extract the text line by line? Is there a method such as GetLine or something like that?
JSO.getPageNthWord does not recognize "-", spaces, line feeds, carriage returns, or anything else that would mark the end of a line.
Thank you
PDF files do not contain CR or LF. Often they don't even contain spaces. Each character is like a graphic, with its own position on the page. Acrobat uses guesswork to divide the text into words, based on the distance between characters. You can do your own guesswork to divide it into lines; getPageNthWordQuads is the tool you need. In the general case this is pretty hard: there may be main text, subscripts, superscripts, non-aligned columns, and more. You might be lucky with your use case, but it's all guesswork.
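One way to do that guesswork, sketched under assumptions: treat each word as a { text, quad } pair, where quad is the first 8-number quad that getPageNthWordQuads returns for the word, and put words on the same line when their vertical centres are within a tolerance. The function names, the tolerance value, and the sample coordinates are all made up for illustration, and the sketch assumes y grows up the page, as in default PDF user space. It ignores columns, subscripts, and superscripts.

```javascript
// Vertical centre of a quad, averaged over its four corners so the
// result does not depend on the order in which corners are listed.
function quadCenterY(quad) {
  return (quad[1] + quad[3] + quad[5] + quad[7]) / 4;
}

// Leftmost x coordinate of a quad.
function quadLeftX(quad) {
  return Math.min(quad[0], quad[2], quad[4], quad[6]);
}

// Gathers { text, quad } for every word on a page, assuming the Acrobat
// Doc methods getPageNumWords, getPageNthWord and getPageNthWordQuads.
function collectPageWords(doc, nPage) {
  var words = [];
  var count = doc.getPageNumWords(nPage);
  for (var i = 0; i < count; i++) {
    var quads = doc.getPageNthWordQuads(nPage, i);
    words.push({ text: doc.getPageNthWord(nPage, i, true), quad: quads[0] });
  }
  return words;
}

// Groups words into lines. A word starts a new line when its vertical
// centre is more than `tolerance` points away from the centre of the
// first word of the current line. Returns an array of strings.
function groupWordsIntoLines(words, tolerance) {
  // Highest words first (larger y is higher on the page).
  var byHeight = words.slice().sort(function (a, b) {
    return quadCenterY(b.quad) - quadCenterY(a.quad);
  });
  var lines = [];
  var current = null;
  byHeight.forEach(function (word) {
    var y = quadCenterY(word.quad);
    if (current === null || Math.abs(current.y - y) > tolerance) {
      current = { y: y, words: [] };
      lines.push(current);
    }
    current.words.push(word);
  });
  // Within each line, order the words left to right.
  return lines.map(function (line) {
    return line.words
      .sort(function (a, b) { return quadLeftX(a.quad) - quadLeftX(b.quad); })
      .map(function (w) { return w.text; })
      .join(" ");
  });
}

// Made-up sample: two lines of two words each, listed out of order.
var sampleWords = [
  { text: "world", quad: [60, 710, 95, 710, 60, 700, 95, 700] },
  { text: "Second", quad: [20, 690, 55, 690, 20, 680, 55, 680] },
  { text: "Hello", quad: [20, 711, 55, 711, 20, 701, 55, 701] },
  { text: "line", quad: [60, 690, 80, 690, 60, 680, 80, 680] }
];
var sampleLines = groupWordsIntoLines(sampleWords, 3);
```

The tolerance is the part that needs tuning per document: too small and slightly misaligned words split into separate lines, too large and neighbouring lines merge.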
Thank you, Thom!
To make sure special and white-space characters (if they exist!) are returned by getPageNthWord, set the bStrip parameter to false (it defaults to true).
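A small sketch of what that looks like in a loop, assuming the same Doc methods as before. The name extractRawPageText is hypothetical, and rawStubDoc is a stand-in with invented data that only imitates the documented meaning of bStrip; what Acrobat actually attaches to each word depends on the file.

```javascript
// Builds the page text from unstripped words. With bStrip = false each
// word keeps whatever punctuation and white space came with it, so the
// pieces are concatenated without adding separators.
function extractRawPageText(doc, nPage) {
  var text = "";
  var count = doc.getPageNumWords(nPage);
  for (var i = 0; i < count; i++) {
    text += doc.getPageNthWord(nPage, i, false);
  }
  return text;
}

// Stand-in for an Acrobat Doc (invented data, not real Acrobat output).
var rawStubDoc = {
  raw: ["Hello, ", "world. "],
  getPageNumWords: function (nPage) { return this.raw.length; },
  getPageNthWord: function (nPage, nWord, bStrip) {
    var word = this.raw[nWord];
    // bStrip omitted or true: drop punctuation and white space.
    return bStrip === false ? word : word.replace(/[^A-Za-z0-9]/g, "");
  }
};

var rawText = extractRawPageText(rawStubDoc, 0);
var strippedFirst = rawStubDoc.getPageNthWord(0, 0);
```

Comparing the stripped and unstripped result for the same word is a quick way to see which separators, if any, a particular PDF provides.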
Got it, testing now, thank you Thom👍