Scrape PDF based on text criteria
I have a set of about 900 PDFs. They are structured like court documents, with the title of each document (e.g. "Motion in Limine to Exclude Evidence, Testimony and Reference ...") in a table on page 1.
I need a way to find instances of "Motion in Limine" or "MIL" on page 1 of every PDF in a given directory. For each match, I want the next 50 words of the title, so I can see what the motion was about and who filed it. I also need the date the document was "e-served", which is always date-stamped at the top of the PDF. All of this should be written out to a clean delimited file.
Any suggestions for how I go about this?
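
One possible starting point, sketched in Python under several assumptions: the PDFs contain extractable text (scanned images would need OCR first), the third-party pypdf library is installed, and the e-served stamp looks roughly like "E-Served: Mar 3 2020" or "E-Served 03/03/2020". The function names, the CSV column names, and both regular expressions are illustrative guesses and will likely need tuning against the real documents.

```python
import csv
import re
from pathlib import Path

# "Motion in Limine" in any case (whitespace may include line breaks),
# or the upper-case abbreviation "MIL" as a standalone word.
MOTION_RE = re.compile(r"(?i:\bmotion\s+in\s+limine\b)|\bMIL\b")

# ASSUMPTION: the stamp reads "E-Served" followed within ~40 characters by a
# date such as 03/03/2020 or "Mar 3, 2020". Adjust to the actual stamp format.
DATE_RE = re.compile(
    r"e-?served.{0,40}?"
    r"(\d{1,2}/\d{1,2}/\d{2,4}|[A-Za-z]{3,9}\.?\s+\d{1,2},?\s+\d{4})",
    re.IGNORECASE | re.DOTALL,
)


def extract_row(filename, page_text, n_words=50):
    """Return [file, match, next n_words, e-served date], or None if no match."""
    motion = MOTION_RE.search(page_text)
    if motion is None:
        return None
    matched = " ".join(motion.group(0).split())  # collapse line breaks
    following = " ".join(page_text[motion.end():].split()[:n_words])
    served = DATE_RE.search(page_text)
    return [filename, matched, following, served.group(1) if served else ""]


def scan_directory(pdf_dir, out_csv):
    """Read page 1 of every *.pdf in pdf_dir and write matching rows to out_csv."""
    # Imported here so the parsing helper above can be tested without pypdf.
    from pypdf import PdfReader

    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)  # comma-delimited; fields are quoted as needed
        writer.writerow(["file", "match", "next_words", "e_served"])
        for path in sorted(Path(pdf_dir).glob("*.pdf")):
            reader = PdfReader(str(path))
            if len(reader.pages) == 0:
                continue
            text = reader.pages[0].extract_text() or ""
            row = extract_row(path.name, text)
            if row is not None:
                writer.writerow(row)
```

Calling scan_directory("path/to/pdfs", "motions.csv") would then produce one row per PDF whose first page mentions the motion. Text pulled from a table may come out in an unexpected reading order, so it is worth checking the output for a handful of files by hand before trusting all 900.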
