March 13, 2018
Answered

Extracting content from PDF file - Issue with symbols in output

  • 1 reply
  • 974 views

Hi,

I used the library provided by Datalogics (Acrobat's) in my Java program to extract the content from a PDF file, but after extraction the content contains symbols and unreadable characters. The same program works fine when I extract the content of another PDF file.

Can you please let me know how to read the exact content from the PDF file while avoiding the symbols and unreadable characters?

Thanks,

Mahesh

This topic has been closed for replies.

lrosenth
Adobe Employee
March 13, 2018

Since you licensed the library from Datalogics, you should contact them for support – it’s included in your contract.

March 18, 2018

Hi Team,

Thank you for your reply.

Could you please go through the points below and help me understand how I can proceed?

1) We have the Adobe Acrobat Pro tool, which we use to convert PDF files of any kind (encrypted, protected) into plain, readable PDF files.

2) We then use these readable PDF files as inputs to our Java programs and extract the content from them.

3) When I use our Java program to extract the content from these kinds of encrypted PDF files, the output text is full of symbols and unreadable characters. So I hope that if I use your SDK/library/JAR while extracting the content from the PDF files, I'll get the content as readable text. Even when I copy the content from the PDF and paste it into a text file, the same thing happens: the pasted content contains symbols and unreadable characters.

4) We'd be happy if you could tell us which SDK/library/JAR Adobe Acrobat Pro uses to convert these encrypted PDF files into plain PDF files, so that we can use the same library in our Java programs to convert the encrypted PDF files into plain PDF files and then extract the content as text.

5) I can see a JavaScript SDK at https://www.adobe.com/devnet/acrobat/sdk/eula.html, but I could not find one for Java. Also, I’m not sure whether it would meet my requirement.

I would appreciate your assistance as soon as possible, as this is very urgent. Thanks in advance.

Thanks,

Mahesh
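
Point 3 above mentions that copying text out of the PDF by hand produces the same symbols. That often suggests the cause is in how the file's fonts map glyphs to characters rather than in any particular extraction library, though that is only a guess without seeing the file. A rough way to flag affected files before processing them further is to measure how much of the extracted text consists of characters that rarely appear in real content. The sketch below uses only the Java standard library; the class name `ExtractionCheck`, its methods, and the 5% threshold are all hypothetical choices for illustration, and it diagnoses the problem without fixing it.

```java
// Hypothetical helper: flags extracted text that is mostly "junk" code points.
final class ExtractionCheck {

    private ExtractionCheck() {}

    /** True for code points that rarely occur in correctly extracted text. */
    static boolean isSuspicious(int cp) {
        if (cp == 0xFFFD) {
            return true; // Unicode replacement character
        }
        int type = Character.getType(cp);
        if (type == Character.PRIVATE_USE
                || type == Character.UNASSIGNED
                || type == Character.SURROGATE) {
            return true; // private-use glyph codes, unassigned values, lone surrogates
        }
        // Control characters, except those Java treats as whitespace (tab, LF, CR, ...).
        return type == Character.CONTROL && !Character.isWhitespace(cp);
    }

    /** Fraction of code points in the text that look suspicious (0.0 for empty input). */
    static double suspiciousRatio(String text) {
        if (text == null || text.isEmpty()) {
            return 0.0;
        }
        long total = text.codePoints().count();
        long bad = text.codePoints().filter(ExtractionCheck::isSuspicious).count();
        return (double) bad / total;
    }

    /** True when the suspicious fraction exceeds the caller's threshold. */
    static boolean looksGarbled(String text, double threshold) {
        return suspiciousRatio(text) > threshold;
    }

    public static void main(String[] args) {
        String clean = "Invoice total: 42.00";
        String garbled = "\uE001\uE002\uE003 \uFFFD\uFFFD"; // 5 of 6 code points are suspicious
        System.out.println(looksGarbled(clean, 0.05));   // prints false
        System.out.println(looksGarbled(garbled, 0.05)); // prints true
    }
}
```

A check like this only catches private-use, unassigned, and replacement characters. Text that comes out as ordinary but wrong letters will pass, so it is best treated as a first filter for deciding which files to raise with the library vendor.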

Test Screen Name
Correct answer
Legend
March 18, 2018

I replied to your duplicate post.

Duplicate post: Acrobat XI Pro: -> Need SDK/API for programmatic process.

Response edited by Forum Moderator.