Extracting content from PDF file - Issue with symbols in output

March 13, 2018
1 reply
978 views

Hi,

I used the library provided by Datalogics (Acrobat's) in my Java program to extract the content from a PDF file, but the extracted content contains symbols and unreadable characters. The same program works fine when I extract the content of another PDF file.

Can you please let me know how to read the exact content from the PDF file without getting these symbols and unreadable characters?
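Before changing libraries, it can help to measure how garbled the extracted text actually is. The sketch below is a hypothetical helper (not part of the Datalogics library or any PDF API) that assumes a common symptom of fonts without a usable Unicode mapping: the extractor emits Private Use Area code points, control characters, or the replacement character U+FFFD. It only catches that class of garbage; text that comes out as plausible but wrong Latin letters will not be flagged.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of any PDF library): estimates what share of
 * an extracted string looks like garbage rather than readable text.
 */
public class GarbledTextCheck {

    /** True if the code point is a typical symptom of a missing Unicode mapping. */
    static boolean isSuspicious(int cp) {
        if (cp == 0xFFFD) return true; // Unicode replacement character
        if (Character.getType(cp) == Character.PRIVATE_USE) return true; // PUA glyph codes
        // Control characters other than ordinary line breaks and tabs
        return Character.isISOControl(cp) && cp != '\n' && cp != '\r' && cp != '\t';
    }

    /** Fraction (0.0 to 1.0) of non-whitespace code points that look suspicious. */
    static double suspiciousRatio(String text) {
        int total = 0;
        int bad = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) continue; // ignore spacing entirely
            total++;
            if (isSuspicious(cp)) bad++;
        }
        return total == 0 ? 0.0 : (double) bad / total;
    }

    public static void main(String[] args) {
        String readable = "Invoice total: 42.00 EUR";
        String garbled = "\u0003\u0004\uE001\uE002 \uFFFD\uFFFD";
        System.out.println(String.format(Locale.ROOT, "%.2f", suspiciousRatio(readable)));
        System.out.println(String.format(Locale.ROOT, "%.2f", suspiciousRatio(garbled)));
    }
}
```

Running `main` prints `0.00` for the readable sample and `1.00` for the garbled one. A threshold on this ratio is an arbitrary choice you would tune against your own files.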

Thanks,

Mahesh

Acrobat SDK and JavaScript

This topic has been closed for replies.

Correct answer Test Screen Name

I replied to your duplicate post.

Duplicate post: Acrobat XI Pro: -> Need SDK/API for programmatic process.

Response edited by Forum Moderator.

lrosenth

Adobe Employee

Since you licensed the library from Datalogics, you should contact them for support; it’s included in your contract.


Anonymous

Hi Team,

Thank you for your reply.

Could you please go through the points below and help me understand how I can proceed?

1) We have Adobe Acrobat Pro, which we use to convert PDF files of any kind (encrypted, protected) into plain, readable PDF files.

2) We then use these readable PDF files as input to our Java programs, which extract the content from them.

3) When our Java program extracts the content from these encrypted PDF files, the output text consists of symbols and unrecognized characters. I hope that if I use your SDK/library/JAR to extract the content, I'll get readable text. The same thing happens when I copy content from the PDF and paste it into a text file: the pasted content contains symbols and unrecognized characters.

4) We'd be grateful if you could tell us which SDK/library/JAR Adobe Acrobat Pro uses to convert these encrypted PDF files into plain PDF files, so that we can use the same library in our Java programs to do the conversion and then extract the content as text.

5) I can see a JavaScript SDK here "https://www.adobe.com/devnet/acrobat/sdk/eula.html", but I could not find one for Java. I'm also not sure whether it would meet my requirement.
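Point 3 notes that copying and pasting from the PDF also yields symbols, which suggests the fonts lack a usable Unicode mapping; removing encryption would not fix that. Still, confirming that the file produced by Acrobat Pro is actually unencrypted rules one cause out. The sketch below is a hypothetical, byte-level heuristic, not a PDF parser: it relies on the fact that an encrypted PDF's trailer (or cross-reference stream) dictionary carries an `/Encrypt` entry that is itself stored unencrypted. A file saved by incremental update may still contain an older `/Encrypt` entry, so a positive result is only a hint.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper: crude heuristic for whether a PDF is still encrypted,
 * based on scanning the raw bytes for an /Encrypt key.
 */
public class PdfEncryptionCheck {

    static boolean looksEncrypted(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data scans safely.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/Encrypt");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfEncryptionCheck file.pdf");
            System.exit(2);
        }
        byte[] bytes = Files.readAllBytes(Path.of(args[0]));
        String verdict = looksEncrypted(bytes) ? "/Encrypt entry found" : "no /Encrypt entry found";
        System.out.println(args[0] + ": " + verdict);
    }
}
```

For example, `java PdfEncryptionCheck converted.pdf` (the file name is illustrative) reports whether the converted output still carries an `/Encrypt` entry. Reading the whole file into memory keeps the sketch short; very large files would call for a streaming scan.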

I'd appreciate your assistance as soon as possible, as this is very urgent. Thanks in advance.

Thanks,

Mahesh

