erich64645996
Participant
July 23, 2021
Question

Using the Extract API, not getting some text that is in the margin of the page.


Just for reference, at the bottom of page 31 there is some text, "Reference ID: 3610837". It is not in the JSON output from the extraction API. I have attached the original PDF as well as the JSON output of your tool.
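For anyone wanting to confirm the same thing on their own files, here is a minimal sketch of how the check could be done. It assumes the extraction output is a JSON object with a top-level `elements` list whose entries carry `Text`, `Page`, and `Path` fields; the helper name `find_text` and the sample data are hypothetical and only stand in for a real output file.

```python
import json


def find_text(extract_output, needle):
    """Return every element whose Text field contains `needle`.

    Assumes the output has a top-level "elements" list and that each
    element may have "Text", "Page", and "Path" keys (an assumption
    about the output layout, not a confirmed schema).
    """
    hits = []
    for element in extract_output.get("elements", []):
        text = element.get("Text", "")
        if needle in text:
            hits.append({
                "page": element.get("Page"),
                "path": element.get("Path"),
                "text": text,
            })
    return hits


# Hypothetical sample standing in for the real output, which could be
# loaded with json.load(open("structuredData.json", encoding="utf-8")).
sample = {
    "elements": [
        {"Page": 30, "Path": "//Document/P[412]", "Text": "Body text near the bottom of the page."},
    ]
}

# An empty list means the string was not extracted at all.
print(find_text(sample, "Reference ID: 3610837"))
```

If the list comes back empty for text that is clearly visible in the PDF, that points to the text being dropped during extraction rather than being filed under an unexpected element.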


    3 replies

    Participant
    April 17, 2023

    Same issue here. A lot of the good stuff is in the headers and footers. It's 2023 now; any idea when extracting them will be possible?

    Joel Geraci
    Community Expert
    July 23, 2021

    I think the Extract API is interpreting that area as a footer and ignoring it. Unfortunately, there is no setting to force it to include that content.

    erich64645996
    Participant
    July 25, 2021

    It's not a showstopper for us, but if that is confirmed as the reason the footer text is not being extracted, is there a chance that a future sprint or improvement cycle could add a config option to broaden the text search to the entire page?

    Joel Geraci
    Community Expert
    July 26, 2021

    Great minds... I've already submitted that as a feature request. It'll be important for documents that have been Bates numbered too.

    erich64645996
    Participant
    July 23, 2021

    I attached a Greenshot image capture of the text I'm referring to.