PDF Extract API Language Support

November 2, 2021
2 replies.
4296 views

Hi all,

I have an application that I use to convert PDF files in various languages. Sometimes some languages, such as Hebrew and a few others, don't get converted properly. I'd like to know whether there is a specified list of supported languages, so that I can know for sure before sending the PDF.

Thank you.

This topic has been closed for replies.

Dilber226970221gze

Participant

Hello, I was trying to convert a Korean document to English. Without the API (using the export option in Adobe Reader) it works fine, but with the API I get garbage (unrecognized) text. Could someone help? Please see my attachment.

fileoutpart0.xlsx

Ben Vanderberg

Community Manager

Adobe PDF Extract API is currently optimized for English. PDF Extract API is a little different from export: Export PDF simply converts the content into Word, PowerPoint, Excel, etc., whereas PDF Extract API takes the PDF and passes it through Adobe Sensei AI services to provide an understanding of the structure (which parts are headers, paragraphs, etc.).

If you are looking for the equivalent of the Export PDF function in Adobe Acrobat DC, then you would use the Export PDF API: https://opensource.adobe.com/pdftools-sdk-docs/release/latest/howtos.html#export-pdf

Ben

Dilber226970221gze

Participant

Thank you for the quick response, Ben. That's really helpful. Just to make it clearer, is a Python SDK available for the Export PDF API? I couldn't find one.

Thanks,

Raymond Camden

Community Manager

Are you asking what the application supports? That's not ours (Adobe) - so I don't know if we can help here. Did I misunderstand your question?

Amila Ruwanpathirana (Author)

Participating Frequently

Sorry, yes, I don't think I phrased the question correctly.

What I would like to know is: are there any specific languages that the PDF Extract API doesn't support?

For instance, when I send a PDF in Hebrew or Bengali to the PDF Extract API, the response I get has incorrect characters.

Attached below is one of the responses I got from the PDF Extract API, for a PDF in Armenian.

Thank you for the prompt response.

d35d5137-5c3b-4e0c-b703-5d73a0867d1f.zip

Raymond Camden

Community Manager

Thanks for clarifying. According to our docs:

Language: The API is currently optimized for English language content. Files containing content in other Latin languages should return good results, but may have issues with non-English punctuation.

So I think it's a bit up in the air in terms of what to expect. That being said, would you be able to share any PDFs we can use to help test?
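Since the docs only promise good results for English and other Latin-script content, one possible client-side workaround is to check a text sample from the PDF before sending it. The sketch below is only a heuristic and not part of any Adobe SDK: the function names `latin_letter_ratio` and `likely_well_supported` and the 0.9 threshold are assumptions made for illustration, and it assumes you already have a text sample from the PDF obtained by some other means.

```python
import unicodedata


def latin_letter_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are Latin-script letters.

    A character counts as Latin when its Unicode name starts with "LATIN".
    Digits, punctuation and whitespace are ignored.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    latin = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("LATIN")
    )
    return latin / len(letters)


def likely_well_supported(text: str, threshold: float = 0.9) -> bool:
    """Hypothetical pre-check: True when the sample is mostly Latin script.

    The threshold is an arbitrary assumption, not a documented limit.
    """
    return latin_letter_ratio(text) >= threshold
```

A sample that fails this check (for example Hebrew, Bengali, Armenian or Korean text) could be routed to a different tool or flagged for manual review instead of being sent to the PDF Extract API.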
