
Extract PDF API returns BAD_PDF_UNSUPPORTED_FONT

Community Beginner, Apr 15, 2024

I've created a PDF using Adobe InDesign and was attempting to extract the text from the PDF using the API.  However, I'm getting the following error:

  • Known exception encountered while executing operation ServiceApiError: BAD_PDF - Unable to extract content.: The input file contains font data that is corrupted or not supported

Then further down:

  • BAD_PDF_UNSUPPORTED_FONT
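
For anyone who wants to detect this specific failure in code, here is a minimal sketch. It assumes the error code string appears somewhere on the caught error object; the field names `errorCode` and `code` are guesses, not confirmed SDK properties, and `isUnsupportedFontError` is a hypothetical helper name.

```javascript
// Error code quoted in the post above.
const UNSUPPORTED_FONT = "BAD_PDF_UNSUPPORTED_FONT";

// Returns true when a caught error looks like the unsupported-font failure.
// Assumption: the code may appear in `errorCode`, `code`, or `message`,
// depending on how the SDK surfaces it; these field names are guesses.
function isUnsupportedFontError(err) {
  if (err === null || err === undefined) return false;
  const candidates = [err.errorCode, err.code, err.message, String(err)];
  return candidates.some(
    (value) => typeof value === "string" && value.includes(UNSUPPORTED_FONT)
  );
}
```

Inside a `catch (err)` block around the extract call, `isUnsupportedFontError(err)` could then decide whether to log and skip the file or to try a different route.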

When I use the Export PDF API to turn the PDF file into a Microsoft Word .docx file, then use Word to print to PDF, and run the Extract PDF API on the resulting PDF file, I don't encounter the same problem.
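
The workaround above could be wrapped as a fallback so the extra hoops only run when they are needed. This is a rough sketch, not SDK code: `extractText` and `reRenderPdf` are hypothetical functions you would supply yourself (the first wrapping the Extract PDF call, the second wrapping whatever re-creates the PDF, which in the post is a manual Export-to-Word and print-to-PDF step). It also assumes the error code shows up in the error's `message` or `errorCode`.

```javascript
// Try a normal extraction first; only on the unsupported-font error,
// re-create the PDF and retry once. Any other error is rethrown unchanged.
// `extractText(path)` and `reRenderPdf(path)` are caller-supplied async
// functions (hypothetical names, not part of any SDK).
async function extractWithFallback(pdfPath, extractText, reRenderPdf) {
  try {
    return await extractText(pdfPath);
  } catch (err) {
    // Assumption: the code appears in `message` or `errorCode`.
    const details = `${err && err.message} ${err && err.errorCode}`;
    if (!details.includes("BAD_PDF_UNSUPPORTED_FONT")) {
      throw err;
    }
    const cleanedPath = await reRenderPdf(pdfPath);
    return extractText(cleanedPath);
  }
}
```

Passing the two steps in as functions keeps the retry logic separate from however the PDF actually gets regenerated, so the manual step can be swapped for an automated one later.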

Does anyone know of a way to make the Extract API more forgiving, i.e. allowing me to get the desired result without jumping through additional hoops? Or why the original PDF has been generated in a way that the Extract API doesn't like the font metadata?

TOPICS
Node.js SDK , PDF Extract API , PDF Services API


Correct answer: Community Beginner, Apr 15, 2024

An example PDF is attached.

Community Expert, Apr 15, 2024

Can you share the PDF in question?

Community Beginner, Apr 15, 2024

Hi Joel, I've added a copy of the knitting pattern called "flax sweater" as a separate reply to my original post.

Community Beginner, Apr 15, 2024

An example PDF is attached.

Community Expert, Apr 17, 2024

There are, in fact, a ton of font errors in this PDF. Exporting to Word uses a different tool, which is why the file can be read that way.

Community Beginner, Apr 22, 2024

Thanks for taking a look, Joel. There are a couple of things that don't seem right to me about this:

  • Why would InDesign allow the creation of a PDF file with so many errors in it?
  • And why does the Extract API care about the errors?
    • Its primary purpose is to extract text from the PDF, so refusing to handle a PDF that looks valid to an end user doesn't feel like a robust enough system to be generally useful.

Adobe Employee, Apr 23, 2024

I wish I had answers for you. Fonts are really complicated.
