
Add reference to Adobe Acrobat Type Library in Eclipse Java Project

Engaged, Jul 18, 2019

I know how to develop a Windows application in Visual Studio to control Adobe Acrobat and PDF documents using OLE Automation.

I am referring to page 21 in this guide:

https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/iac_developer_guide.pdf

I have done that many times in the past and the result was 100% successful.

I need to do the same but using Java and Eclipse.

My ultimate objective is to extract text from a flattened PDF that is an appraisal form. The form's field captions and values are part of the flattened page content, and the form follows a strict, fixed layout.

So I want to write a Windows desktop application that opens a flattened PDF, finds a field caption, jumps to the field value, and extracts its text. From my research so far, it looks like I have to use the Doc method "getPageNthWord()" through the OLE JSObject in Java.

I was able to use this code sample in the console window to extract the text of the current page:

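// Number of words on the current page
var len = this.getPageNumWords(this.pageNum);
var txt = "";
// Append each word, separated by a space
for (var i = 0; i < len; i++) {
    var w = this.getPageNthWord(this.pageNum, i);
    txt += w + " ";
}
// The last expression is echoed by the console
txt;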
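
The "find the field caption, then jump to the field value" step described above could be sketched as a plain function over the word list. This is only a minimal sketch under assumptions: the helper names (`normalize`, `indexOfSequence`, `valueAfterCaption`) and the sample captions are hypothetical, and it assumes the value sits between its caption and the next caption in reading order, which depends on the form's layout.

```javascript
// Hypothetical helpers, not part of the Acrobat API.
// words:       array of page words in reading order
// caption:     words of the field caption, e.g. ["Borrower", "Name"]
// nextCaption: words of the caption that follows the value

function normalize(w) {
    // Compare case-insensitively and ignore leading/trailing punctuation.
    return String(w).toLowerCase().replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, "");
}

function indexOfSequence(words, seq, start) {
    // Return the index where the word sequence `seq` begins, or -1.
    for (var i = start; i + seq.length <= words.length; i++) {
        var match = true;
        for (var j = 0; j < seq.length; j++) {
            if (normalize(words[i + j]) !== normalize(seq[j])) {
                match = false;
                break;
            }
        }
        if (match) return i;
    }
    return -1;
}

function valueAfterCaption(words, caption, nextCaption) {
    var start = indexOfSequence(words, caption, 0);
    if (start < 0) return null;           // caption not found on this page
    var from = start + caption.length;
    var end = indexOfSequence(words, nextCaption, from);
    if (end < 0) end = words.length;      // no following caption: take the rest
    return words.slice(from, end).join(" ");
}

// Inside Acrobat, the word list could be collected with the loop above:
//   var words = [];
//   var n = this.getPageNumWords(this.pageNum);
//   for (var i = 0; i < n; i++) words.push(this.getPageNthWord(this.pageNum, i));
//   valueAfterCaption(words, ["Borrower", "Name"], ["Property", "Address"]);
// The caption names here are made up for illustration.
```

Keeping the lookup separate from the Acrobat calls means it can be tried on a plain array first, before wiring it to JSObject.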

Questions:

- How can I add a reference to the Acrobat type library in a Java project in Eclipse?

- Is there any method other than "getPageNthWord()" that I can use to scrape the text from a PDF? I was expecting to find a method that extracts a paragraph or the complete text of a given page.

Any help would be greatly appreciated.

Tarek

TOPICS
Acrobat SDK and JavaScript

1 Correct answer

LEGEND, Jul 19, 2019

Suggestion: forget Java. Use VB.

Community Expert, Jul 19, 2019

The Adobe PDF Library has a Java interface:

https://dev.datalogics.com/adobe-pdf-library/

Community Expert, Jul 19, 2019

From Bernd’s reply it may not be clear, but the Adobe PDF Library is a separate product with a separate price tag. You can license it via Datalogics:

https://www.datalogics.com/products/pdf/pdflibrary/

Engaged, Jul 19, 2019

Bernd Alheit wrote

The Adobe PDF Library has a Java interface:

https://dev.datalogics.com/adobe-pdf-library/

Thanks a lot. That is all the information I need, except for Java, which I probably don't need to consider anyway.

Tarek

Community Expert, Jul 20, 2019

There are several Java libraries for processing PDF files. If you only need the entire page contents that shouldn't be too difficult.
If you need to access specific words in specific locations it becomes much (much) more complicated, though.

I have developed tools that can do it using PDFBox (a free, open-source Java PDF library), so if you're interested in purchasing something like that, feel free to contact me privately (via try6767 at gmail.com).

If you just need the full page contents I'm happy to direct you to an example of how to do it using PDFBox.

Engaged, Jul 22, 2019

try67  wrote

There are several Java libraries for processing PDF files. If you only need the entire page contents that shouldn't be too difficult.
If you need to access specific words in specific locations it becomes much (much) more complicated, though.

I have developed tools that can do it using PDFBox (a free, open-source Java PDF library), so if you're interested in purchasing something like that, feel free to contact me privately (via try6767 at gmail.com).

If you just need the full page contents I'm happy to direct you to an example of how to do it using PDFBox.

If you have a flattened PDF that represents an application form, will the method you mentioned (the advanced tools) help find the fields on the form and get each field's data?

Remember that a field can be a checkbox, radio button, drop-down list, or multi-selection list.

I am assuming that with the tool you mentioned, we would need to configure the scraping process to indicate which parts of the form contain fields and what type each field is.

Can you provide some more details?

See the example of a form that we need to scrape.

Sample Form Image.png

Community Expert, Jul 22, 2019

No, it won't work with anything but text, if the fields have been flattened.

Community Expert, Jul 22, 2019

If you want you can send me a sample file, though, and I'll see what I can extract from it, but I'm not very hopeful, based on what you shared...

Engaged, Jul 22, 2019

Thanks anyway. I will discuss and come back if needed.

LEGEND, Jul 19, 2019

All text extraction in Adobe interfaces starts with words. Paragraphs only exist in our perception, so you need to use guesswork and fuzzy logic to reconstruct them.
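
As a rough illustration of that kind of guesswork, words could be grouped into lines by comparing their vertical positions. This is only a sketch under assumptions: the `{ text, x, y }` word shape, the function name `groupWordsIntoLines`, and the tolerance value are all hypothetical. In Acrobat the positions might be derived from `getPageNthWordQuads()`, but the exact layout of its return value should be checked against the Acrobat JavaScript API reference.

```javascript
// Hypothetical helper: groups positioned words into text lines.
// words:      array of { text, x, y } objects (assumed shape, PDF units,
//             with y growing upward as in PDF page coordinates)
// yTolerance: maximum vertical distance for two words to share a line
function groupWordsIntoLines(words, yTolerance) {
    // Work on a copy, ordered top-to-bottom.
    var sorted = words.slice().sort(function (a, b) {
        return b.y - a.y;
    });

    var lines = [];
    var current = null;
    for (var i = 0; i < sorted.length; i++) {
        var w = sorted[i];
        // Start a new line when the word is too far below the line's first word.
        if (current === null || Math.abs(current.y - w.y) > yTolerance) {
            current = { y: w.y, words: [] };
            lines.push(current);
        }
        current.words.push(w);
    }

    // Within each line, order words left-to-right and join them.
    return lines.map(function (line) {
        line.words.sort(function (a, b) {
            return a.x - b.x;
        });
        return line.words.map(function (w) {
            return w.text;
        }).join(" ");
    });
}
```

Grouping lines into paragraphs would need a further heuristic (for example, gaps between line positions), and the right tolerance depends on the font size used in the form.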

If you want to use JSObject, I recommend you use VB. Converting this to another platform will take a lot of your time.

Engaged, Jul 19, 2019

What is your solution or recommendation?

Please provide details.

Tarek

LEGEND, Jul 19, 2019

Suggestion: forget Java. Use VB.
