Participating Frequently
November 2, 2021
Question

PDF Extract API Language Support


Hi all,

I have an application that I use to convert PDF files in various languages. Sometimes some of the languages don't get converted properly, like Hebrew and a few others. I'd like to know if there's a specified list of supported languages, so that I can know for sure before sending the PDF.

Thank you.


2 replies

Participant
January 17, 2022

Hello, I was trying to convert a Korean document to English. Without the API (the export option in Adobe Reader) it works fine, but with the API I'm getting garbage (unrecognized) text. Could someone help? Please see my attachment.

Ben Vanderberg
Community Manager
January 17, 2022
Adobe PDF Extract API is currently optimized for English. PDF Extract API is a little different from Export PDF: Export PDF simply converts the content into Word, PowerPoint, Excel, etc., whereas PDF Extract API passes the PDF through Adobe Sensei AI services to understand its structure (what are headers, paragraphs, etc.).

If you are looking for the equivalent of the Export PDF function in Adobe Acrobat DC, then you would use the Export PDF API: https://opensource.adobe.com/pdftools-sdk-docs/release/latest/howtos.html#export-pdf

Ben
Participant
January 18, 2022

Thank you for the quick response, Ben. That's really helpful. Just to make it clearer: is a Python SDK available for the Export PDF API? I couldn't find one.

 

Thanks,

Raymond Camden
Community Manager
November 2, 2021

Are you asking what the application supports? That's not ours (Adobe) - so I don't know if we can help here. Did I misunderstand your question?

Participating Frequently
November 2, 2021

Sorry, yeah, I don't think I phrased the question correctly.

What I would like to know is: are there any specific languages that the PDF Extract API doesn't support?

For instance, when I send a PDF in Hebrew or Bengali to the PDF Extract API, the response I get has incorrect characters.

Attached below is one of the responses I got from the PDF Extract API for a PDF in Armenian.

Thank you for the prompt response.

Raymond Camden
Community Manager
November 2, 2021

Thanks for clarifying. According to our docs:

 

Language: The API is currently optimized for English language content. Files containing content in other Latin languages should return good results, but may have issues with non-English punctuation.

 

So I think it's a bit up in the air in terms of what to expect. That being said, would you be able to share any PDFs we can use to help test?
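
Since the docs quoted above only promise good results for English and other Latin-script languages, one way to "know beforehand" is to check which script a document's text mostly uses before sending it. Below is a minimal sketch of that idea, not anything from Adobe's SDK. It assumes you already have a sample of the PDF's text as a Python string from whatever tool you use; the function names and the 0.9 threshold are hypothetical choices made for illustration.

```python
import unicodedata


def latin_letter_ratio(text: str) -> float:
    """Return the share of alphabetic characters that are Latin-script letters.

    A character counts as Latin if its Unicode name contains "LATIN",
    which covers accented letters such as 'é' or 'ñ' as well as plain ASCII.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # No letters at all (e.g. a scanned PDF with no text layer).
        return 0.0
    latin = sum(1 for ch in letters if "LATIN" in unicodedata.name(ch, ""))
    return latin / len(letters)


def likely_latin_script(text: str, threshold: float = 0.9) -> bool:
    """Hypothetical pre-check: True if the text is mostly Latin-script.

    The threshold is an assumption for this sketch, not a documented limit.
    """
    return latin_letter_ratio(text) >= threshold


if __name__ == "__main__":
    for sample in ["Hello world", "Café, señor!", "שלום", "안녕하세요 PDF"]:
        print(repr(sample), likely_latin_script(sample))
```

A check like this only tells you the script, not whether the Extract API will actually handle a given file, so it is best treated as an early warning for documents in Hebrew, Bengali, Armenian, Korean and similar scripts rather than as a supported-language list.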