February 18, 2022
Question

Adobe Extract API: Some text is extracted as images.


I am developing a system that uses the Adobe Extract API to extract sentences from PDF files in JSON format.

When I run the Adobe Extract API on the PDF file, some parts of the document that contain text are extracted as images.


In the image below, the red part was extracted as text (expected result),
but the blue part was extracted as an image (unexpected result).


This is the code I used:
https://github.com/adobe/pdftools-extract-node-sdk-samples/blob/main/src/extractpdf/extract-text-table-info-with-tables-structure-from-pdf.js


This is the PDF file. The page shown in the image above is page 59:

JP-N-KP-EPI-2000063_イーケプラ経口剤IF_rev18.pdf (ucbjapan.com)


The PDF is in Japanese. It is not a scanned paper document; it was created digitally.
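A quick way to see exactly which parts of page 59 came back as images is to scan the structuredData.json file that the Extract API returns. The following is a minimal, hypothetical Node.js diagnostic (matching the language of the linked sample), not part of the Adobe SDK. It assumes each entry in the JSON's `elements` array has a `Path` (figures ending in `/Figure` or `/Figure[n]`), a zero-based `Page` (so page 59 would be index 58), optional `Text`, and, only when figure renditions are requested, `filePaths`. Please check these field names against your own output.

```javascript
const fs = require("fs");

// Hypothetical helper: read and parse the structuredData.json produced by the
// Extract API (after unzipping the result archive).
function loadStructuredData(jsonPath) {
  return JSON.parse(fs.readFileSync(jsonPath, "utf8"));
}

// Assumed schema: figure elements have a Path ending in "/Figure" or "/Figure[n]".
const FIGURE_PATH = /\/Figure(\[\d+\])?$/;

// Return the elements on one (assumed zero-based) page that were emitted as figures.
function findFigureElements(structuredData, pageIndex) {
  const elements = Array.isArray(structuredData.elements) ? structuredData.elements : [];
  return elements.filter(
    (el) => el.Page === pageIndex && typeof el.Path === "string" && FIGURE_PATH.test(el.Path)
  );
}

// Compare how much of a page came back as text versus as figures.
function summarizePage(structuredData, pageIndex) {
  const elements = Array.isArray(structuredData.elements) ? structuredData.elements : [];
  const textCount = elements.filter(
    (el) => el.Page === pageIndex && typeof el.Text === "string" && el.Text.trim() !== ""
  ).length;
  const figures = findFigureElements(structuredData, pageIndex);
  return {
    textCount,
    figureCount: figures.length,
    // filePaths is assumed to exist only when figure renditions were requested.
    figureFiles: figures.flatMap((f) => f.filePaths || []),
  };
}
```

For example, `summarizePage(loadStructuredData("output/structuredData.json"), 58)` (the path is hypothetical) would report how many text elements and figure elements the API produced for page 59, which makes it easy to confirm that the blue area is being returned as a Figure rather than as text.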

Is there any information about this behavior, or a solution for it?

This topic has been closed for replies.

1 reply

February 18, 2022

Sorry, I may have posted this in the wrong forum.


I have reposted it in the correct forum:

Adobe Extract API: Some text is extracted as image... - Adobe Support Community - 12760741


I hope this post can be deleted.