Add reference to Adobe Acrobat Type Library in Eclipse Java Project

Engaged, Jul 18, 2019

I know how to develop a Windows application in Visual Studio to control Adobe Acrobat and PDF documents using OLE Automation.

I am referring to page 21 in this guide:

https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/iac_developer_guide.pdf

I have done that many times in the past and the result was 100% successful.

I need to do the same but using Java and Eclipse.

My ultimate objective is to be able to extract text from a flattened PDF which is an appraisal form. So the Form has Fields and Values in a flattened PDF, and it follows a strict and fixed layout.

So, I want to write a Windows desktop application that opens a flattened PDF, finds a field caption, jumps to the field value, and extracts its text. From my research so far, I believe I have to call the Doc method "getPageNthWord()" through the OLE JSObject from Java.

I was able to use this code sample in the console window to extract the text of the current page:

var len = this.getPageNumWords(this.pageNum);
var txt = "";
for (var i = 0; i < len; i++) {
    var w = this.getPageNthWord(this.pageNum, i);
    txt += w + " ";
}
txt;
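As a rough sketch, the console loop above could be wrapped into functions that return the text of any page, or of every page in the document. `getPageNumWords`, `getPageNthWord`, and `numPages` are Doc members from the Acrobat JavaScript API; the function names `pageText` and `documentText` are made up for this illustration, and `doc` is assumed to be a Doc object (pass `this` from the console).

```javascript
// Join all words of one page into a single space-separated string.
// `doc` is assumed to be an Acrobat Doc object; `pageNum` is 0-based.
function pageText(doc, pageNum) {
  var words = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    words.push(doc.getPageNthWord(pageNum, i));
  }
  return words.join(" ");
}

// Return an array with one text string per page of the document.
function documentText(doc) {
  var pages = [];
  for (var p = 0; p < doc.numPages; p++) {
    pages.push(pageText(doc, p));
  }
  return pages;
}
```

In the console, `pageText(this, this.pageNum)` should give the same text as the original loop, without the trailing space.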

Questions:

- How can I add a reference to the Acrobat library in a Java project in Eclipse?

- Is there any method other than "getPageNthWord()" that I can use to extract the text from a PDF? I was expecting to find a method that extracts a paragraph or the complete text of a given page.

Any help would be greatly appreciated.

Tarek

TOPICS
Acrobat SDK and JavaScript

1 Correct Answer

Most Valuable Participant, Jul 19, 2019
Suggestion: forget Java. Use VB.

Adobe Community Professional, Jul 19, 2019

The Adobe PDF Library has a Java interface:

https://dev.datalogics.com/adobe-pdf-library/

Adobe Community Professional, Jul 19, 2019

It may not be clear from Bernd's reply, but the Adobe PDF Library is a separate product with a separate price tag. You can license it via Datalogics:

https://www.datalogics.com/products/pdf/pdflibrary/

Engaged, Jul 19, 2019

Bernd Alheit wrote:

The Adobe PDF Library has a Java interface:

https://dev.datalogics.com/adobe-pdf-library/

Thanks a lot. That has all the information I need, except for Java, which I probably don't need to consider anyway.

Tarek

Most Valuable Participant, Jul 20, 2019

There are several Java libraries for processing PDF files. If you only need the entire page contents that shouldn't be too difficult.
If you need to access specific words in specific locations it becomes much (much) more complicated, though.

I have developed tools that can do it using PDFBox (a free, open-source Java PDF library), so if you're interested in purchasing something like that, feel free to contact me privately (via try6767 at gmail.com).

If you just need the full page contents I'm happy to direct you to an example of how to do it using PDFBox.
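To give a feel for what "specific words in specific locations" involves, here is a minimal sketch in Acrobat JavaScript (not PDFBox, and not the tools mentioned above) that keeps only the words whose centre falls inside a rectangle on the page. It relies on the Doc method `getPageNthWordQuads`, and assumes the returned quads are lists of alternating x/y coordinates in PDF user space. The helper names `wordBox` and `wordsInRegion` and the `region` object shape are hypothetical, and the rectangle for each field of a fixed-layout form would have to be measured by hand.

```javascript
// Bounding box of one word, computed from its quads.
// Assumption: the quads hold alternating x, y values; accepts either a
// list of 8-number quads or one flat list of numbers.
function wordBox(quads) {
  var nums = [].concat.apply([], quads);
  var box = { minX: Infinity, minY: Infinity, maxX: -Infinity, maxY: -Infinity };
  for (var i = 0; i + 1 < nums.length; i += 2) {
    box.minX = Math.min(box.minX, nums[i]);
    box.maxX = Math.max(box.maxX, nums[i]);
    box.minY = Math.min(box.minY, nums[i + 1]);
    box.maxY = Math.max(box.maxY, nums[i + 1]);
  }
  return box;
}

// Return the words on `pageNum` whose centre lies inside `region`,
// where region = { left, bottom, right, top } in points (hypothetical shape).
function wordsInRegion(doc, pageNum, region) {
  var found = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    var box = wordBox(doc.getPageNthWordQuads(pageNum, i));
    var cx = (box.minX + box.maxX) / 2;
    var cy = (box.minY + box.maxY) / 2;
    if (cx >= region.left && cx <= region.right &&
        cy >= region.bottom && cy <= region.top) {
      found.push(doc.getPageNthWord(pageNum, i));
    }
  }
  return found.join(" ");
}
```

Even in this simple form the approach only returns text; it says nothing about whether a flattened checkbox or radio button was ticked.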

Engaged, Jul 22, 2019

try67 wrote:

There are several Java libraries for processing PDF files. If you only need the entire page contents that shouldn't be too difficult.
If you need to access specific words in specific locations it becomes much (much) more complicated, though.

I have developed tools that can do it using PDFBox (a free, open-source Java PDF library), so if you're interested in purchasing something like that, feel free to contact me privately (via try6767 at gmail.com).

If you just need the full page contents I'm happy to direct you to an example of how to do it using PDFBox.

If you have a flattened PDF that represents an application form, will the method you mentioned (the advanced tools) help find the fields on the form and get each field's data?

Remember that a field can be a checkbox, a radio button, a drop-down list, or a multi-selection list.

I am assuming that with the tool you mentioned, we need to configure the scraping process to indicate which parts of the form contain fields, and what type each field is.

Can you provide some more details?

See the example of a form that we need to scrape.

Sample Form Image.png

Most Valuable Participant, Jul 22, 2019

No, it won't work with anything but text, if the fields have been flattened.

Most Valuable Participant, Jul 22, 2019

If you want you can send me a sample file, though, and I'll see what I can extract from it, but I'm not very hopeful, based on what you shared...

Engaged, Jul 22, 2019

Thanks anyway. I will discuss and come back if needed.

Most Valuable Participant, Jul 19, 2019

All text extraction in Adobe interfaces starts with words. Paragraphs only exist in our perception, so you need to use guesswork and fuzzy logic.

If you want to use JSObject, I recommend you use VB. Converting this to another platform will use a lot of your time.
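To illustrate the kind of guesswork involved, here is a minimal sketch in Acrobat JavaScript that rebuilds lines of text from individual words by comparing their vertical positions. It uses the Doc methods `getPageNumWords`, `getPageNthWord`, and `getPageNthWordQuads`, and assumes the quads hold alternating x/y coordinates with y increasing up the page. The function name `pageLines` and the `tolerance` parameter are hypothetical, and the tolerance value is exactly the kind of fuzzy threshold that has to be tuned per document.

```javascript
// Group the words of a page into lines by comparing vertical centres.
// `tolerance` (in points) is a guess that needs tuning per form.
function pageLines(doc, pageNum, tolerance) {
  var lines = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    // Flatten the quads; assumed to be alternating x, y values.
    var nums = [].concat.apply([], doc.getPageNthWordQuads(pageNum, i));
    var ySum = 0, yCount = 0, minX = Infinity;
    for (var k = 0; k + 1 < nums.length; k += 2) {
      minX = Math.min(minX, nums[k]);
      ySum += nums[k + 1];
      yCount++;
    }
    if (yCount === 0) continue; // no position data for this word
    var word = { text: doc.getPageNthWord(pageNum, i), x: minX, y: ySum / yCount };
    // Find an existing line at roughly the same height, or start a new one.
    var line = null;
    for (var j = 0; j < lines.length; j++) {
      if (Math.abs(lines[j].y - word.y) <= tolerance) { line = lines[j]; break; }
    }
    if (!line) { line = { y: word.y, words: [] }; lines.push(line); }
    line.words.push(word);
  }
  // Top of the page first (larger y), then left to right within a line.
  lines.sort(function (a, b) { return b.y - a.y; });
  return lines.map(function (l) {
    l.words.sort(function (a, b) { return a.x - b.x; });
    return l.words.map(function (w) { return w.text; }).join(" ");
  });
}
```

Grouping lines into paragraphs would need a further heuristic on top of this, such as the gap between consecutive lines.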

Engaged, Jul 19, 2019

What is your solution or recommendation?

Please provide details.

Tarek

Most Valuable Participant, Jul 19, 2019

Suggestion: forget Java. Use VB.
