June 3, 2025

Simpler way to extract text from a PDF?

Hi, I am trying to extract all the text from a PDF. I was able to use ExtractPDFJob, but the result is far more than I need: a large file full of coordinates, bounds, etc., when all I need is the text itself and, ideally, the page number.

So something like:

[ {"page": "0", "text": "Now is the time to live to the fullest"}, {"page": "1", ...} ]

Is anything like this possible? I could parse the output I get now, but I'm worried about the file size, since this processing runs in an AWS Lambda.
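If parsing the current output turns out to be the way to go, here is a minimal sketch in Python. It assumes the job result was saved as a ZIP containing structuredData.json (as the SDK samples do), and that each item in that file's top-level `elements` list carries a 0-based `Page` number and, for text elements, a `Text` field. The function names are hypothetical and not part of the PDF Services SDK.

```python
import json
import zipfile
from collections import defaultdict
from typing import Any


def pages_from_structured_data(data: dict[str, Any]) -> list[dict[str, str]]:
    """Collapse Extract elements into one {"page", "text"} entry per page.

    Assumes the structuredData.json layout: a top-level "elements" list
    whose items may carry "Page" (0-based int) and "Text" (str).
    """
    by_page: dict[int, list[str]] = defaultdict(list)
    for element in data.get("elements", []):
        text = element.get("Text")
        page = element.get("Page")
        # Figures, table containers, etc. have no "Text" of their own; skip them.
        if text is None or page is None:
            continue
        by_page[page].append(text.strip())
    # Sort numerically by page, then emit the page as a string to match
    # the format asked for in the question.
    return [
        {"page": str(page), "text": " ".join(t for t in parts if t)}
        for page, parts in sorted(by_page.items())
    ]


def pages_from_extract_zip(zip_path: str) -> list[dict[str, str]]:
    """Read structuredData.json from the saved result ZIP and simplify it."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open("structuredData.json") as fh:
            data = json.load(fh)  # json.load accepts a binary file handle
    return pages_from_structured_data(data)


if __name__ == "__main__":
    # Tiny hand-made sample, not real Extract output.
    sample = {
        "elements": [
            {"Page": 0, "Text": "Now is the time "},
            {"Page": 0, "Path": "//Document/Figure"},
            {"Page": 1, "Text": "to live to the fullest"},
        ]
    }
    print(json.dumps(pages_from_structured_data(sample)))
```

The compact list is small, but `json.load` still reads the whole structuredData.json into memory, so the Lambda's memory setting has to cover the full file. A streaming parser such as ijson would avoid that, at the cost of a third-party dependency.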