
Unable to extract text from drawing PDF with PDF Extract API

New Here, Oct 12, 2021


Hello,

[Screenshot: default9ngizh1vietx_0-1634033728066.png]

The highlighted portions in the section of the PDF shown above are vector text (selectable), but I am unable to extract any text data from this PDF.

 

Attached: the drawing PDF and the JSON result.

 

Thanks,

Adam.

TOPICS
PDF Extract API

Community Expert, Oct 12, 2021


The entire page is being seen as a graphic, so no text is being read. Do I have your permission to send this to our Engineering team as a sample file to train the AI?
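
One way to confirm this diagnosis from the JSON result is to count how many elements actually carry text. The following is a minimal sketch, not an official tool: it assumes the result is a JSON object with an `elements` list whose entries may have `Path` and `Text` keys, and the function name `summarize_extract_result` and the sample data are hypothetical.

```python
from collections import Counter


def summarize_extract_result(result: dict) -> dict:
    """Summarize how much text an Extract-style JSON result contains.

    Assumption: `result` has an "elements" list, and each element may
    carry a "Path" (structure path) and a "Text" string.
    """
    elements = result.get("elements", [])

    # Keep only elements whose "Text" is a non-empty string.
    text_elements = [
        e for e in elements
        if isinstance(e.get("Text"), str) and e["Text"].strip()
    ]

    # Group elements by the last path segment,
    # e.g. "//Document/Figure" -> "Figure".
    kinds = Counter(
        (e.get("Path", "").rsplit("/", 1)[-1] or "(none)") for e in elements
    )

    return {
        "total_elements": len(elements),
        "text_elements": len(text_elements),
        "kinds": dict(kinds),
        "sample_text": [e["Text"].strip() for e in text_elements[:5]],
    }


if __name__ == "__main__":
    # Hypothetical result in which the whole page came back as one figure.
    sample = {"elements": [{"Path": "//Document/Figure", "Page": 0}]}
    print(summarize_extract_result(sample))
```

If `text_elements` is 0 and the only kinds reported are figures, the page was most likely treated as a single graphic. To run this on a real result, load the file with `json.load` and pass the resulting dict to the function.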

New Here, Oct 12, 2021


Sure, feel free to use this file.

 

Also, I have the same problem with raster PDF files (scanned PDFs). I tried running them through the OCR API service first and then through the Extract API service, but still no text is being read.

 

Is there any workaround to optimize or convert raster PDF files into searchable PDFs, so that the Extract API service can recognize the text in the lower layer?

 

Thanks,

Adam.
