April 3, 2023
Question

How can I prevent the Adobe PDF to DOCX API from using OCR on some parts of the document (logo, map)?

Hello,

I am using the "PDF Services Java SDK" (https://github.com/adobe/pdfservices-java-sdk-samples) to create a DOCX from a PDF document.

It mostly works, but some elements are automatically converted to text (OCR) when they shouldn't be.

For instance, we have a logo in the top left corner:

 

 

Sometimes the Adobe PDF Services API tries to use OCR on it, which results in garbage characters:

 

The same goes for images of maps, which contain text (words, numbers) that the Adobe API garbles:

 

In the API, I don't see any option to prevent OCR on specific elements, or even to disable OCR altogether.

The only option is to pass a com.adobe.pdfservices.operation.pdfops.options.exportpdf.ExportPDFOptions object, on which you can only set the preferred language for OCR:

https://opensource.adobe.com/pdfservices-java-sdk-samples/apidocs/latest/index.html?com/adobe/pdfservices/operation/pdfops/options/exportpdf/ExportPDFOptions.html

 

Is there any way to prevent this behaviour?

 

Thanks,

Fabien

 

 
