Participant
October 12, 2021
Question

Unable to extract text from drawing PDF with PDF Extract API


Hello,

The highlighted portions in the above section of the PDF are vector content (selectable text), but I am unable to extract any text data from this PDF.

 

Attached: the drawing PDF and the JSON result.

 

Thanks,

Adam.


Joel Geraci
Community Expert
October 12, 2021

The entire page is being seen as a graphic, so no text is being read. Do I have your permission to send this to our Engineering team as a sample file to train the AI?

Adam_LifAuthor
Participant
October 12, 2021

Sure, feel free to use this file.

 

Also, I have the same problem with any raster PDF files (scanned PDFs). I tried running them through the OCR API service first and then through the Extract API service, but still no text is being read.

 

Is there any workaround to optimize or convert raster PDF files into searchable ones, so that the Extract API service can recognize the text in the underlying layer?

 

Thanks,

Adam.
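
For anyone hitting the same symptom, one way to confirm that a page came back as a graphic is to inspect the JSON result directly. The sketch below is only an illustration: it assumes the result has a top-level "elements" list whose entries carry a "Path" string and, for text-bearing elements, a "Text" string. The function name summarize_extract_result and the sample data are hypothetical and are not taken from the attached files.

```python
import json


def summarize_extract_result(structured_data: dict) -> dict:
    """Count text-bearing and figure elements in an Extract-style JSON result.

    Assumes (not confirmed in this thread) a top-level "elements" list whose
    items have a "Path" string and an optional "Text" string.
    """
    elements = structured_data.get("elements", [])
    # Elements that carry non-empty text.
    with_text = [e for e in elements if str(e.get("Text", "")).strip()]
    # Elements whose structural path marks them as a figure.
    figures = [e for e in elements if "Figure" in str(e.get("Path", ""))]
    return {
        "total": len(elements),
        "with_text": len(with_text),
        "figures": len(figures),
        "chars": sum(len(str(e["Text"])) for e in with_text),
    }


# Hypothetical result for a page that was treated as one graphic.
sample = json.loads(
    '{"elements": [{"Path": "//Document/Figure", "Page": 0}]}'
)
summary = summarize_extract_result(sample)
print(summary)
```

If "with_text" is 0 while "figures" is greater than 0, the page was most likely read as an image rather than as text, which matches the behaviour described in the reply above.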