Participant
June 17, 2024
Question

Scrape PDF based on text criteria


I have a document set of about 900 PDFs. They are structured like court documents, with the title of each document (e.g. "Motion in Limine to Exclude Evidence, Testimony and Reference ....") in a table on page 1.

I need a way to find instances of "Motion in Limine" or "MIL" on page 1 of any of the PDFs in this directory. Next, return the next 50 words of the title, so I can see what each motion in limine was about and who filed it. Lastly, give me the date the document was "e-served", which is always date-stamped at the top of the PDF. Spit all of this out into a clean delimited file.
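For anyone attempting this outside Acrobat, here is a minimal Python sketch of the steps described above: find the phrase on page 1, keep the words that follow it, pull the e-served date, and write a delimited file. It is an illustration under assumptions, not a tested tool. It relies on the third-party pypdf library (`PdfReader(...).pages[0].extract_text()`) for text extraction, and the regular expressions for "Motion in Limine"/"MIL" and for the e-served stamp (assumed to look like `E-Served: Jun 17 2024` or `E-SERVED 6/17/2024`) are guesses that would need adjusting to the real documents. The function names `parse_first_page`, `first_page_text` and `scrape_folder` are hypothetical. Note that it keeps the 50 words following the match, which may run past the end of the title.

```python
import csv
import re
import sys
from pathlib import Path

# ASSUMPTION: the title says "Motion in Limine" (any case, possibly split across
# lines) or uses the uppercase abbreviation "MIL" as a standalone word.
MIL_PATTERN = re.compile(r"(?i:motion\s+in\s+limine)|\bMIL\b")

# ASSUMPTION: the stamp reads like "E-Served: Jun 17 2024" or "E-SERVED 6/17/2024".
# The leading \b keeps words such as "reserved" from matching.
ESERVED_PATTERN = re.compile(
    r"\be-?served\W*(\d{1,2}/\d{1,2}/\d{2,4}|[a-z]{3,9}\.?\s+\d{1,2},?\s+\d{4})",
    re.IGNORECASE,
)


def parse_first_page(text, n_words=50):
    """Return (title_snippet, eserved_date) for one page-1 text, or None if no MIL."""
    match = MIL_PATTERN.search(text)
    if match is None:
        return None
    # Collapse line breaks inside the matched phrase, then keep the next n_words words.
    phrase = " ".join(match.group(0).split())
    following = text[match.end():].split()[:n_words]
    snippet = " ".join([phrase] + following)
    date_match = ESERVED_PATTERN.search(text)
    eserved = date_match.group(1) if date_match else ""
    return snippet, eserved


def first_page_text(pdf_path):
    """Extract page-1 text with pypdf (third-party; imported here so the parser works without it)."""
    from pypdf import PdfReader

    return PdfReader(str(pdf_path)).pages[0].extract_text() or ""


def scrape_folder(folder, out_path):
    """Write one tab-delimited row per PDF whose first page mentions a motion in limine."""
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["file", "eserved", "title_snippet"])
        for pdf in sorted(Path(folder).glob("*.pdf")):
            try:
                text = first_page_text(pdf)
            except Exception as exc:  # e.g. encrypted or damaged files
                print(f"skipped {pdf.name}: {exc}", file=sys.stderr)
                continue
            result = parse_first_page(text)
            if result is not None:
                snippet, eserved = result
                writer.writerow([pdf.name, eserved, snippet])


if __name__ == "__main__" and len(sys.argv) == 3:
    scrape_folder(sys.argv[1], sys.argv[2])
```

Usage would be something like `python scrape_mil.py /path/to/pdfs motions.tsv` (hypothetical file names). The output is tab-separated because titles routinely contain commas. Two caveats: `extract_text()` only returns text if the PDFs have a text layer (scanned pages would need OCR first), and `glob("*.pdf")` is case-sensitive on most non-Windows systems, so files ending in `.PDF` would be skipped.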

Any suggestions for how I go about this?

2 replies

Legend
August 2, 2024

Hi @lee anno71488591 

Have you tried the multi-document feature of Acrobat's GenAI? See https://www.adobe.com/acrobat/generative-ai-pdf.html

~Tariq

try67
Community Expert
June 18, 2024

This is possible, but it will require quite a complex, custom-made script, probably in combination with an Action (to locate those texts initially, since scanning that many files will probably be too much for a script to handle on its own).

I've developed similar tools for my clients in the past and would be happy to create one for you as well (for a fee, of course). Feel free to contact me privately by clicking my user-name and then on "Send a Message" to discuss it further.