PDF Extract API Language Support

Community Beginner ,
Nov 02, 2021

Hi all,

I have an application that I use to convert PDF files in various languages. Some languages, such as Hebrew and a few others, sometimes don't get converted properly. Is there a specified list of supported languages, so that I can know for sure before sending the PDF?

Thank you.

TOPICS
PDF Extract API , PDF Services API

Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting. Learn more
Adobe Employee ,
Nov 02, 2021

Are you asking what the application supports? That's not ours (Adobe's), so I don't know if we can help here. Did I misunderstand your question?

Community Beginner ,
Nov 02, 2021

Sorry, I don't think I phrased the question correctly.

What I would like to know is: are there any specific languages that the PDF Extract API doesn't support?

For instance, when I send a PDF in Hebrew or Bengali to the PDF Extract API, the response I get contains incorrect characters.

Attached below is one of the responses I got from the PDF Extract API for a PDF in Armenian.

Thank you for the prompt response.

Adobe Employee ,
Nov 02, 2021

Thanks for clarifying. According to our docs:

Language: The API is currently optimized for English language content. Files containing content in other Latin languages should return good results, but may have issues with non-English punctuation.

So I think it's a bit up in the air in terms of what to expect. That being said, would you be able to share any PDFs we can use to help test?
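
Since the docs quoted above only distinguish English and other Latin-script content, one rough way to check a document before sending it is to look at which scripts its text uses. The sketch below is only a heuristic, not an official list of supported languages: it assumes you already have a text sample from the document by some other means, and the function names `script_counts` and `mostly_latin` are made up for illustration.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count letters per script, using the first word of each Unicode name.

    Heuristic only: 'LATIN SMALL LETTER A' -> 'LATIN',
    'HEBREW LETTER SHIN' -> 'HEBREW', 'HANGUL SYLLABLE HAN' -> 'HANGUL'.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    return counts


def mostly_latin(text: str, threshold: float = 0.9) -> bool:
    """True if at least `threshold` of the letters are Latin-script."""
    counts = script_counts(text)
    total = sum(counts.values())
    return total > 0 and counts["LATIN"] / total >= threshold


if __name__ == "__main__":
    for sample in ("Hello, world", "שלום עולם", "한국어"):
        print(repr(sample), dict(script_counts(sample)), mostly_latin(sample))
```

The 0.9 threshold is an arbitrary assumption; a document that fails the check is not necessarily unsupported, it is just outside what the quoted documentation describes.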

Community Beginner ,
Nov 02, 2021

Thanks for the reply,

Yes, I'd be happy to provide the PDF file for your development, but since it's not a public document, is there a way I can send it to you without posting it publicly?

Adobe Employee ,
Nov 02, 2021

Please email them to jedimaster@adobe.com, with a short note referencing this thread, as my memory is roughly on par with a cat's.

Community Beginner ,
Nov 02, 2021

I have mailed you the files. Please keep me posted if there's any development in this regard.

New Here ,
Jan 17, 2022

Hello, I was trying to convert a Korean document to English. Without the API (using the export option in Adobe Reader) it works fine, but when using the API I get garbage (unrecognized) text. Could someone help? Please see my attachment.

Adobe Employee ,
Jan 17, 2022

Adobe PDF Extract API is currently optimized for English. PDF Extract API is a little different from export: Export PDF simply converts the content into Word, PowerPoint, Excel, etc., whereas PDF Extract API takes the PDF and passes it through Adobe Sensei AI services to provide an understanding of the structure (what are headers, paragraphs, etc.).
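
To make the "garbage text" problem easier to spot in Extract's structured output, here is a minimal sketch that scans elements for characters that usually indicate extraction damage (the U+FFFD replacement character, private-use, unassigned, and control characters). It assumes the output has been loaded into a dict with an `elements` list whose items carry `Path` and `Text` keys; treat that shape as an assumption and check it against your own output. The helper names and the 0.2 threshold are hypothetical.

```python
import unicodedata

# General categories that rarely appear in real prose:
# Co = private use, Cn = unassigned, Cc = control.
SUSPICIOUS_CATEGORIES = {"Co", "Cn", "Cc"}


def suspicious_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like extraction damage."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def flag_garbled_elements(structured: dict, threshold: float = 0.2) -> list:
    """Return (Path, Text) pairs whose text exceeds the threshold.

    Assumes structured["elements"] is a list of dicts with "Path" and "Text".
    """
    flagged = []
    for element in structured.get("elements", []):
        text = element.get("Text", "")
        if text and suspicious_ratio(text) > threshold:
            flagged.append((element.get("Path", ""), text))
    return flagged


# Hand-made sample data, not real API output.
sample = {
    "elements": [
        {"Path": "//Document/H1", "Text": "Quarterly report"},
        {"Path": "//Document/P", "Text": "\ufffd\ufffd\ufffd \ue000\ue001 ok"},
    ]
}

if __name__ == "__main__":
    for path, text in flag_garbled_elements(sample):
        print(path, repr(text))
```

This only catches visibly broken characters; text that is mapped to the wrong but valid letters would pass the check.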

If you are looking for the equivalent of the Export PDF function in Adobe Acrobat DC, then you would use the Export PDF API: https://opensource.adobe.com/pdftools-sdk-docs/release/latest/howtos.html#export-pdf

Ben

New Here ,
Jan 17, 2022

Thank you for the quick response, Ben. That's really helpful. Just to make it clearer: is a Python SDK available for the Export API? I couldn't find one.

Thanks,

Adobe Employee ,
Jan 18, 2022

Yes, and currently, it's _only_ available for Extract, but you could use the REST API in Python to call other parts of our stuff. More info here: https://opensource.adobe.com/pdftools-sdk-docs/extract/latest/quickstarts.html#python

Adobe Employee ,
Jan 18, 2022

Sorry, I misread Export as Extract. Please disregard. What I said is still right, though: the Python SDK only supports Extract, but you could hit the REST APIs.
