
Extract PDF API returns BAD_PDF_UNSUPPORTED_FONT

Community Beginner, Apr 15, 2024

I created a PDF in Adobe InDesign and was attempting to extract its text using the Extract PDF API. However, I'm getting the following error:

  • Known exception encountered while executing operation ServiceApiError: BAD_PDF - Unable to extract content.: The input file contains font data that is corrupted or not supported

Then further down:

  • BAD_PDF_UNSUPPORTED_FONT

When I use the Export PDF API to convert the PDF into a Microsoft Word .docx file, then use Word to print it back to PDF, and run the Extract PDF API on that modified PDF, I don't encounter the same problem.

Does anyone know of a way to make the Extract API more forgiving, i.e. to get the desired result without jumping through additional hoops? Or does anyone know why the original PDF was generated with font metadata that the Extract API doesn't like?
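
For anyone wanting to automate the workaround described above, here is a minimal TypeScript sketch of a retry-with-fallback wrapper. It does not make the Extract API itself more forgiving. It assumes (not confirmed by this thread or the SDK docs) that the `BAD_PDF_UNSUPPORTED_FONT` code appears in the thrown error's name or message. `extract` and `normalize` are hypothetical callbacks you would wire to your own Extract PDF call and to a clean-up step such as the Export-to-Word-and-reprint round trip.

```typescript
// Hypothetical callback types: wire these to your own PDF Services SDK calls.
type ExtractFn = (pdf: Uint8Array) => Promise<string>;
type NormalizeFn = (pdf: Uint8Array) => Promise<Uint8Array>;

// Error code reported in this thread when the font data is rejected.
const UNSUPPORTED_FONT = "BAD_PDF_UNSUPPORTED_FONT";

// Assumption: the code is visible in the error's name/message (or string form).
function isUnsupportedFontError(err: unknown): boolean {
  const text = err instanceof Error ? `${err.name} ${err.message}` : String(err);
  return text.includes(UNSUPPORTED_FONT);
}

// Try Extract once; only on the unsupported-font error, normalize the PDF
// (e.g. the Export-to-Word-and-reprint workaround) and retry exactly once.
async function extractWithFallback(
  pdf: Uint8Array,
  extract: ExtractFn,
  normalize: NormalizeFn,
): Promise<{ result: string; normalized: boolean }> {
  try {
    return { result: await extract(pdf), normalized: false };
  } catch (err) {
    if (!isUnsupportedFontError(err)) throw err; // other failures propagate unchanged
    const cleaned = await normalize(pdf);
    return { result: await extract(cleaned), normalized: true };
  }
}
```

Retrying only once, and only on that specific code, keeps unrelated failures (credentials, quotas, other `BAD_PDF` variants) visible instead of silently rerouting them through the slower conversion path.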

TOPICS
Node.js SDK , PDF Extract API , PDF Services API
Community guidelines
Be kind and respectful, give credit to the original source of content, and search for duplicates before posting.

Community Expert, Apr 15, 2024

Can you share the PDF in question?

Community Beginner, Apr 15, 2024

Hi Joel, I've added a copy of the knitting pattern called "flax sweater" as a separate reply to my original post.

Community Beginner, Apr 15, 2024

An example PDF is attached.

Community Expert, Apr 17, 2024

There are, in fact, a ton of font errors in this PDF. Exporting to Word uses a different tool, which is why the file can be read that way.

Community Beginner, Apr 22, 2024

Thanks for taking a look, Joel. There are a couple of things that don't seem right to me about this:

  • Why would InDesign allow the creation of a PDF file with so many errors in it?
  • And why does the Extract API care about the errors?
    • Its primary purpose is to extract text from the PDF, so refusing to handle a PDF that looks valid to an end user doesn't feel robust enough to be generally useful.
Adobe Employee, Apr 23, 2024

I wish I had answers for you. Fonts are really complicated.
