How can I prevent the Adobe PDF to DOCX API from using OCR on some parts of the document (logo, map)?

April 3, 2023

Hello,

I am using the "PDF Services Java SDK" (https://github.com/adobe/pdfservices-java-sdk-samples) to create a DOCX from a PDF document.

It mostly works, but some elements are automatically converted to text (OCR) when they shouldn't be.

For instance, we have a logo in the top left corner:

And sometimes, Adobe Services API tries to use OCR on it, which results in garbage characters:

The same goes for images of maps, which contain text (words, numbers) that the Adobe API garbles:

In the API, I don't see any options to prevent OCR on some elements or even to prevent OCR altogether.

The only option is to pass a com.adobe.pdfservices.operation.pdfops.options.exportpdf.ExportPDFOptions object, on which you can only set the preferred language for OCR:

https://opensource.adobe.com/pdfservices-java-sdk-samples/apidocs/latest/index.html?com/adobe/pdfservices/operation/pdfops/options/exportpdf/ExportPDFOptions.html
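For reference, here is a minimal sketch of the export call I mean. It is modelled on the SDK samples and assumes the 3.x line of the SDK (the com.adobe.pdfservices.operation.pdfops packages) is on the classpath. The class name, the helper buildDocxExport, the credentials file name and the input/output paths are placeholders of mine, not anything from the SDK. As far as I can tell, the OCR locale is the only setting ExportPDFOptions exposes.

```java
import com.adobe.pdfservices.operation.ExecutionContext;
import com.adobe.pdfservices.operation.auth.Credentials;
import com.adobe.pdfservices.operation.io.FileRef;
import com.adobe.pdfservices.operation.pdfops.ExportPDFOperation;
import com.adobe.pdfservices.operation.pdfops.options.exportpdf.ExportOCRLocale;
import com.adobe.pdfservices.operation.pdfops.options.exportpdf.ExportPDFOptions;
import com.adobe.pdfservices.operation.pdfops.options.exportpdf.ExportPDFTargetFormat;

public class ExportWithOcrLocale {

    // Hypothetical helper: builds a PDF-to-DOCX export operation whose only
    // option is the OCR locale. No network call happens here.
    static ExportPDFOperation buildDocxExport(ExportOCRLocale locale) {
        ExportPDFOperation operation = ExportPDFOperation.createNew(ExportPDFTargetFormat.DOCX);
        operation.setOptions(new ExportPDFOptions(locale));
        return operation;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder credentials file; replace with your own.
        Credentials credentials = Credentials.serviceAccountCredentialsBuilder()
                .fromFile("pdfservices-api-credentials.json")
                .build();
        ExecutionContext context = ExecutionContext.create(credentials);

        ExportPDFOperation operation = buildDocxExport(ExportOCRLocale.EN_US);
        // Placeholder input path.
        operation.setInput(FileRef.createFromLocalFile("input.pdf"));

        // Runs the conversion remotely and saves the resulting DOCX locally.
        FileRef result = operation.execute(context);
        result.saveAs("output/input.docx");
    }
}
```

Nothing in this call lets me mark a region (the logo or a map image) as "do not OCR", which is exactly what I am missing.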

Is there any way to prevent this behaviour?

Thanks,

Fabien
