Is there a way to extract pages from a PDF that contain specific keywords using a script?
I found a script that works with Adobe Acrobat Pro, but I would like to automate the process with a batch script.
https://community.adobe.com/t5/acrobat-discussions/extracting-pages-based-on-matching-strings-in-acrobat-pro-dc-java-script/m-p/11708654#M291647
I have a customer PDF file that contains invoices. Some invoices contain the phrase "Bank Draft" and some do not. The problem is that the pages with this phrase are shuffled in among pages that do not contain it.
So far, the best method I can come up with is to create bookmarks in the PDF for all pages that contain the phrase "Bank Draft", extract the bookmarked pages into a new PDF file, and remove those pages from the original, leaving 2 separate PDF files: one containing all the Bank Draft invoices and one containing none of them. The original PDF may contain around 3,500 pages, of which around 500 might contain the phrase "Bank Draft".
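To make the goal concrete, here is a minimal Python sketch of the split described above, skipping the bookmark step and sorting pages directly by their text. It assumes the third-party pypdf library is installed and that the invoices contain extractable text (scanned images would need OCR first). The function names and file paths are hypothetical.

```python
import re
import sys


def split_page_indexes(page_texts, keyword="Bank Draft"):
    """Return (matching, non_matching) lists of 0-based page indexes."""
    # Allow any whitespace between the words so a phrase that wraps
    # across a line break in the extracted text still matches.
    pattern = re.compile(r"\s+".join(map(re.escape, keyword.split())), re.IGNORECASE)
    matching, non_matching = [], []
    for index, text in enumerate(page_texts):
        (matching if pattern.search(text or "") else non_matching).append(index)
    return matching, non_matching


def split_pdf(in_path, match_path, rest_path, keyword="Bank Draft"):
    """Write keyword pages to match_path and all other pages to rest_path."""
    # Assumption: pypdf is available (pip install pypdf).
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(in_path)
    texts = [page.extract_text() for page in reader.pages]
    matching, non_matching = split_page_indexes(texts, keyword)
    for indexes, out_path in ((matching, match_path), (non_matching, rest_path)):
        writer = PdfWriter()
        for i in indexes:
            writer.add_page(reader.pages[i])
        with open(out_path, "wb") as handle:
            writer.write(handle)
    return len(matching), len(non_matching)


# Hypothetical usage: python split_invoices.py in.pdf bank_draft.pdf other.pdf
if __name__ == "__main__" and len(sys.argv) == 4:
    print(split_pdf(sys.argv[1], sys.argv[2], sys.argv[3]))
```

Something like this could be called from a batch file after the cpdf step. Multi-page invoices where the phrase appears on only one page would need extra grouping logic that this sketch does not attempt.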
I think pdftk might work, but I am not sure exactly how to implement it. I want to do this from the command line so I can integrate it into my other batch processing programs for automation purposes. I have already figured out a way to use cpdf to remove the color logos from the PDFs, which will be done before I try to extract the Bank Draft pages.
I was hoping somebody could recommend a small PDF command line utility and help with the proper command, similar to what I did with cpdf: I used cpdf -draft in.pdf -o out.pdf to remove images from the PDF. In particular, does anyone know whether pdftk would work for what I am trying to do?
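For reference, pdftk's cat operation accepts explicit page numbers and ranges (for example, pdftk in.pdf cat 1-3 7 output out.pdf), but as far as I can tell it does not search page text itself, so the page numbers would have to come from a separate text-search step. A small sketch of building such a command from a list of 1-based page numbers, with hypothetical helper names:

```python
def to_ranges(pages):
    """Compress 1-based page numbers into ranges, e.g. [1, 2, 3, 7] -> ['1-3', '7']."""
    ranges = []
    start = prev = None
    for page in sorted(set(pages)):
        if prev is not None and page == prev + 1:
            prev = page  # still inside a consecutive run
            continue
        if start is not None:
            ranges.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = page  # begin a new run
    if start is not None:
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
    return ranges


def pdftk_cat_command(in_path, pages, out_path):
    """Build the argument list for: pdftk IN cat RANGES... output OUT."""
    if not pages:
        raise ValueError("no pages selected")
    return ["pdftk", in_path, "cat", *to_ranges(pages), "output", out_path]
```

The resulting list could be passed to subprocess.run(cmd, check=True), or joined with spaces and pasted into a batch file. Running it once with the matching pages and once with the remaining pages would give the two output files.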
Thank you.
