Participating Frequently
March 25, 2023
Question

Entire documents are nothing but images

  • 2 replies
  • 3201 views

Hi,

I was trying to extract content from the document attached to this question. When I run it through the Adobe APIs, I just get a collection of images, one for each page.

I think this has to do with how the document is parsed (probably as an SVG), and I don't know how to solve it!

Can somebody help me?

Thanks,

Giovanni

    2 replies

    Participant
    March 22, 2025

    Hi Giovanni,

    It sounds like the document is being processed as an image-based PDF rather than a text-based one. This often happens when the original document was scanned or created in a way that embeds text as part of images.

    You might need to use OCR (Optical Character Recognition) to extract the text properly. Adobe APIs have OCR capabilities, or you can try alternative tools specialized in document processing.

    Hope this helps!

    Raymond Camden
    Community Manager
    March 29, 2023

    To be clear, are you using the Extract API? It doesn't return just an image. It returns JSON and, optionally, the images included in the PDF (as well as other output).

    Participating Frequently
    March 29, 2023

    Yes, I am using the Extract API.

    I only get the image data from each page. Can you send me the JSON you are talking about?

    Thanks in advance,
    Giovanni

    Raymond Camden
    Community Manager
    March 29, 2023

    What you are describing is impossible. 🙂 Our Extract API returns a zip file. The zip file _always_ contains structuredData.json. It *optionally* includes images and tables.
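
To check what the zip described above actually contains, a minimal Python sketch like the following may help. It uses only the standard library. The function name `summarize_extract_zip` is hypothetical, and the assumption that text sits in a top-level `elements` list whose items may carry a `Text` key is not confirmed anywhere in this thread, so verify it against your own `structuredData.json`.

```python
import io
import json
import zipfile


def summarize_extract_zip(zip_source):
    """Return (text_snippets, other_files) for an Extract result zip.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()
        if "structuredData.json" not in names:
            raise ValueError("structuredData.json not found in the zip")
        with archive.open("structuredData.json") as fh:
            data = json.load(fh)

    # Assumption: text is stored under "elements" -> "Text".
    texts = [el["Text"] for el in data.get("elements", []) if "Text" in el]
    # Everything else in the zip (for example images or tables).
    others = [name for name in names if name != "structuredData.json"]
    return texts, others


if __name__ == "__main__":
    # Build a tiny stand-in zip in memory; real results come from the API.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("structuredData.json", json.dumps({"elements": []}))
        z.writestr("figures/page1.png", b"placeholder")
    buf.seek(0)
    texts, others = summarize_extract_zip(buf)
    print(len(texts), "text elements;", len(others), "other files")
```

If the list of text snippets comes back empty while the zip still holds one image per page, the PDF is probably image-based, which would match the OCR suggestion in the first reply.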