Participant
May 13, 2022
Question

Extract PDF API is not extracting text from headers


I extracted the attached PDF file and got JSON output and Excel tables.

The tables are extracted perfectly. (Sensitive numbers in the tables have been removed from the attached PDF.)

However, it seems to ignore the header (see the attached picture) completely, including the part below:

"Purchase Order
#PO2017
2/2/2022"

Other, simpler tools have no problem extracting this text from the PDF file.

I think this issue is related to the thread below, "not extracting text from footer":

https://community.adobe.com/t5/document-services-apis-discussions/using-the-extract-not-getting-some-text-that-is-in-the-margin-of-the-page/m-p/12195824

The only difference is that mine is a header (a fairly long one).

Can anyone help with this?

This topic has been closed for replies.

3 replies

Legend
April 17, 2023

The Extract PDF API has nothing to do with Acrobat or Acrobat Reader (despite any name suggesting otherwise). That means it is outside our knowledge, and the chances of getting help here are very small.

Please try the "Acrobat Services API" forum: https://community.adobe.com/t5/acrobat-services-api/ct-p/ct-Document-Cloud-SDK?page=1&sort=latest_replies&lang=all&tabid=all

Participant
April 17, 2023

Same here. It seems like something that's actively ignored during parsing. A simple flag in the options to get the headers and footers would be great. This is a really big issue with what seems like an easy solution, and even though the community seems to have been asking for it for a while, it still isn't available.

Participant
June 27, 2022

Hello,

Were you able to solve the issue? I am having a similar issue and cannot get information from the header. If you had no luck with this API, have you explored any other tool that can solve this problem for us? Thanks