Add reference to Adobe Acrobat Type Library in Eclipse Java Project

Report · Jul 18, 2019

I know how to develop a Windows application in Visual Studio to control Adobe Acrobat and PDF documents using OLE Automation.

I am referring to page 21 in this guide:

https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/iac_developer_guide.pdf

I have done that many times in the past and the result was 100% successful.

I need to do the same but using Java and Eclipse.

My ultimate objective is to extract text from a flattened PDF that is an appraisal form. The form's fields and values are part of the flattened page content, and the form follows a strict, fixed layout.

So I want to write a Windows desktop application that opens a flattened PDF, finds a field caption, jumps to the field value, and extracts its text. From my research so far, it seems I have to use the Doc method "getPageNthWord()" through the OLE JSObject in Java.

I was able to use this code sample in the console window to extract the text of the current page:

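// Count the words on the current page.
var len = this.getPageNumWords(this.pageNum);
var txt = "";
// Append each word, separated by a space.
for (var i = 0; i < len; i++) {
    var w = this.getPageNthWord(this.pageNum, i);
    txt += w + " ";
}
// The console prints the value of the last expression.
txt;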
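
Once the page words are available as in the loop above, the "find the caption, then jump to the value" step could be sketched in plain Java as follows. This is only a minimal sketch under assumptions: the class name CaptionLookup, the method valueAfter, and the sample words are all hypothetical, and it assumes the words have already been copied into a Java list in reading order. It does not show how to reach Acrobat from Java.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: finds a caption in a list of page words and returns the words after it. */
final class CaptionLookup {

    /**
     * Returns up to maxWords words that follow the first occurrence of the caption,
     * joined by single spaces, or null when the caption does not occur.
     * Matching ignores case and trailing ':' or '.' characters.
     */
    static String valueAfter(List<String> pageWords, List<String> caption, int maxWords) {
        if (caption.isEmpty()) {
            return null;
        }
        for (int start = 0; start + caption.size() <= pageWords.size(); start++) {
            if (matchesAt(pageWords, caption, start)) {
                int from = start + caption.size();
                int to = Math.min(pageWords.size(), from + Math.max(0, maxWords));
                return String.join(" ", pageWords.subList(from, to));
            }
        }
        return null;
    }

    /** True when every caption word equals the page word at the same offset from start. */
    private static boolean matchesAt(List<String> pageWords, List<String> caption, int start) {
        for (int i = 0; i < caption.size(); i++) {
            if (!normalize(pageWords.get(start + i)).equals(normalize(caption.get(i)))) {
                return false;
            }
        }
        return true;
    }

    /** Lower-cases a word and strips trailing colons and periods. */
    private static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT).replaceAll("[:.]+$", "");
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the words of one page.
        List<String> words = Arrays.asList(
                "Borrower", "Name:", "Jane", "Doe",
                "Property", "Address:", "12", "Main", "St");
        System.out.println(valueAfter(words, Arrays.asList("Property", "Address"), 3));
    }
}
```

Because the form layout is fixed, the number of words to read after each caption would have to be configured per field. A fixed word count is the simplest choice here, not necessarily the most robust one.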

Questions:

- How can I add a reference to the Acrobat library in a Java project in Eclipse?

- Is there any method other than "getPageNthWord()" that I can use to scrape the text from a PDF? I was expecting to find a method that extracts a paragraph or the complete text of a given page.

Any help would be greatly appreciated.

Tarek

Report · Jul 19, 2019

The Adobe PDF Library has a Java interface:

https://dev.datalogics.com/adobe-pdf-library/

Report · Jul 19, 2019

From Bernd’s reply it may not be clear, but the Adobe PDF Library is a separate product with a separate price tag. You can license it via DataLogics:

https://www.datalogics.com/products/pdf/pdflibrary/

Report · Jul 19, 2019

Bernd Alheit wrote
The Adobe PDF Library has a Java interface:
https://dev.datalogics.com/adobe-pdf-library/

Thanks a lot. That has all the information I need, except for Java, which I probably do not need to consider anyway.

Tarek

Report · Jul 20, 2019

There are several Java libraries for processing PDF files. If you only need the entire page contents that shouldn't be too difficult.
If you need to access specific words in specific locations it becomes much (much) more complicated, though.

I have developed tools that can do it using PDFBox (a free, open-source Java PDF library), so if you're interested in purchasing something like that, feel free to contact me privately (via try6767 at gmail.com).

If you just need the full page contents I'm happy to direct you to an example of how to do it using PDFBox.

Report · Jul 22, 2019

try67 wrote
There are several Java libraries for processing PDF files. If you only need the entire page contents that shouldn't be too difficult.
If you need to access specific words in specific locations it becomes much (much) more complicated, though.
I have developed tools that can do it using PDFBox (a free, open-source Java PDF library), so if you're interested in purchasing something like that, feel free to contact me privately (via try6767 at gmail.com).
If you just need the full page contents I'm happy to direct you to an example of how to do it using PDFBox.

If you have a flattened PDF that represents an application form, will the method you mentioned (the advanced tools) help find the fields on the form and get each field's data?

Remember that a field can be a checkbox, a radio button, a drop-down list, or a multi-selection list.

I am assuming that with the tool you mentioned, we need to configure the scraping process to indicate which parts of the form have fields and what type each field is.

Can you provide some more details?

See the example of a form that we need to scrape.

Report · Jul 22, 2019

No, it won't work with anything but text, if the fields have been flattened.

Report · Jul 22, 2019

If you want you can send me a sample file, though, and I'll see what I can extract from it, but I'm not very hopeful, based on what you shared...

Report · Jul 22, 2019

Thanks anyway. I will discuss and come back if needed.

Report · Jul 19, 2019

All text extraction in Adobe interfaces starts with words. Paragraphs only exist in our perception, so you need to use guesswork and fuzzy logic.

If you want to use JSObject, I recommend you use VB. Converting this to another platform will take a lot of your time.
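
To illustrate the kind of guesswork meant above, here is a minimal Java sketch that rebuilds paragraphs from individual words by looking at vertical gaps. Everything in it is an assumption for illustration: the ParagraphGuesser and Word names are hypothetical, the coordinates are made up, and it presumes each word's position has already been obtained somehow (in Acrobat JavaScript, possibly from the Doc method getPageNthWordQuads) and that words arrive in reading order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: guesses paragraph breaks from the vertical gap between words. */
final class ParagraphGuesser {

    /** A word with the x and y position of its box (assumed to be in points). */
    static final class Word {
        final String text;
        final double x;
        final double y;

        Word(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Joins words into paragraphs. A new paragraph starts whenever the vertical
     * distance between two consecutive words exceeds paragraphGap; smaller jumps
     * are treated as ordinary line wraps inside the same paragraph.
     */
    static List<String> paragraphs(List<Word> words, double paragraphGap) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Word previous = null;
        for (Word w : words) {
            if (previous != null) {
                if (Math.abs(w.y - previous.y) > paragraphGap) {
                    result.add(current.toString()); // close the finished paragraph
                    current.setLength(0);
                } else {
                    current.append(' ');            // same line or a normal line wrap
                }
            }
            current.append(w.text);
            previous = w;
        }
        if (current.length() > 0) {
            result.add(current.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up positions: 12 pt line spacing, 28 pt gap before the second paragraph.
        List<Word> words = Arrays.asList(
                new Word("The", 72, 700), new Word("roof", 95, 700),
                new Word("is", 72, 688), new Word("new.", 85, 688),
                new Word("No", 72, 660), new Word("repairs", 90, 660),
                new Word("needed.", 130, 648));
        for (String p : paragraphs(words, 18.0)) {
            System.out.println(p);
        }
    }
}
```

The threshold of 18 is arbitrary and would need tuning per document. Multi-column layouts or tables would defeat a single vertical-gap rule, which is why this remains guesswork.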

Report · Jul 19, 2019

What is your solution or recommendation?

Please provide details.

Tarek

Report · Jul 19, 2019

Suggestion: forget Java. Use VB.
