
How can I prevent the Adobe PDF to DOCX API from using OCR on some parts of the document (logo, map)?

Community Beginner, Apr 03, 2023


Hello,

I am using the "PDF Services Java SDK" (https://github.com/adobe/pdfservices-java-sdk-samples) to create a DOCX from a PDF document.

It mostly works, but some elements are automatically converted to text (via OCR) when they shouldn't be.

For instance, we have a logo in the top left corner:

fnicollet_0-1680517336982.png

 

 

Sometimes the Adobe PDF Services API runs OCR on it, which produces garbage characters:

fnicollet_1-1680517377304.png

 

The same goes for images of maps, which contain text (labels, numbers) that the Adobe API garbles:

fnicollet_2-1680517434138.png

 

I don't see any option in the API to prevent OCR on specific elements, or even to disable OCR altogether.

The only option is to pass a com.adobe.pdfservices.operation.pdfops.options.exportpdf.ExportPDFOptions object, on which you can only set the preferred language for OCR:

https://opensource.adobe.com/pdfservices-java-sdk-samples/apidocs/latest/index.html?com/adobe/pdfser...

 

Is there any way to prevent this behaviour?

 

Thanks,

Fabien

 

 
