Is there a way to extract the pages of a PDF that contain specific keywords using a script?

Question

I found a script that works in Adobe Acrobat Pro, but I would like to automate the process with a batch script.

https://community.adobe.com/t5/acrobat-discussions/extracting-pages-based-on-matching-strings-in-acrobat-pro-dc-java-script/m-p/11708654#M291647

I have a customer PDF file that contains invoices. Some invoices contain the phrase "Bank Draft" and some do not. The problem is that the invoices with this phrase are shuffled in among pages that do not contain it.

So far, the best method I can come up with is to bookmark every page that contains the phrase "Bank Draft", extract all the bookmarked pages into a new PDF file, and remove those pages from the original PDF. That would leave two separate PDF files: one containing all the Bank Draft invoices and another containing none of them. The original PDF may have around 3,500 pages, of which around 500 might contain the phrase "Bank Draft".
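A rough sketch of the keyword-search half of that plan, done without bookmarks. It assumes poppler's pdftotext command-line tool (not mentioned in this thread) is installed and on PATH, and relies on pdftotext ending each page's text with a form-feed character so the output can be split back into pages. The script name and arguments are illustrative only.

```python
import subprocess
import sys


def read_pdf_text(pdf_path: str) -> str:
    """Return the text of the whole PDF via pdftotext (assumed to be on PATH).

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stdout


def partition_pages(text: str, keyword: str) -> tuple[list[int], list[int]]:
    """Split 1-based page numbers into (pages containing keyword, all other pages).

    pdftotext ends every page with a form feed, so splitting on it yields one
    chunk per page plus a trailing empty chunk, which is dropped here.
    """
    pages = text.split("\f")
    if pages and pages[-1] == "":
        pages.pop()
    needle = " ".join(keyword.split()).casefold()
    matching: list[int] = []
    other: list[int] = []
    for number, page_text in enumerate(pages, start=1):
        # Collapse whitespace and ignore case so "Bank\nDraft" or "BANK DRAFT" still match.
        haystack = " ".join(page_text.split()).casefold()
        (matching if needle in haystack else other).append(number)
    return matching, other


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print('usage: python find_pages.py invoices.pdf "Bank Draft"')
    else:
        found, rest = partition_pages(read_pdf_text(sys.argv[1]), sys.argv[2])
        print("matching:", " ".join(map(str, found)))
        print("other:", " ".join(map(str, rest)))
```

The two page lists can then be handed to whichever tool does the actual page extraction. Text extraction only works if the invoices contain real text rather than scanned images.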

I think pdftk might work, but I'm not exactly sure how to implement it. I want to be able to do this from the command line so I can integrate it into my other batch processing programs for automation. I already figured out how to use cpdf to remove the color logos from the PDFs; that step will run before I try to extract the Bank Draft pages.

I guess I was hoping somebody could recommend a small PDF command-line utility and help with the proper command, similar to what I did with cpdf, where I used cpdf -draft in.pdf -o out.pdf to remove images from the PDFs. Does anyone know whether pdftk would work for what I am trying to do?
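For the pdftk part, one possible approach (a sketch, not a tested recipe for this particular file) uses pdftk's cat operation, which copies selected page ranges into a new file, e.g. pdftk in.pdf cat 1-3 5 output out.pdf. The helper below turns a list of page numbers into such a command. The file names invoices.pdf, bank_draft.pdf and no_bank_draft.pdf and the page lists in the usage part are hypothetical placeholders.

```python
def to_ranges(pages: list[int]) -> list[str]:
    """Compress 1-based page numbers into range tokens, e.g. [1, 2, 3, 5] -> ["1-3", "5"]."""
    tokens: list[str] = []
    start = prev = None
    for page in sorted(set(pages)):
        if start is None:
            start = prev = page
        elif page == prev + 1:
            prev = page  # still inside a consecutive run
        else:
            tokens.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = page
    if start is not None:
        tokens.append(str(start) if start == prev else f"{start}-{prev}")
    return tokens


def pdftk_cat_command(src: str, pages: list[int], dest: str) -> list[str]:
    """Build the argument list for 'pdftk SRC cat RANGES... output DEST'."""
    if not pages:
        # An empty page list would leave cat with no ranges; refuse rather than
        # risk producing an unintended copy of the input.
        raise ValueError("no pages to extract")
    return ["pdftk", src, "cat", *to_ranges(pages), "output", dest]


if __name__ == "__main__":
    # Hypothetical inputs: in practice these come from a keyword search over the PDF.
    bank_draft = [2, 3, 4, 9]
    total_pages = 10
    wanted = set(bank_draft)
    others = [p for p in range(1, total_pages + 1) if p not in wanted]
    for page_list, out in ((bank_draft, "bank_draft.pdf"), (others, "no_bank_draft.pdf")):
        cmd = pdftk_cat_command("invoices.pdf", page_list, out)
        # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
        print(" ".join(cmd))
```

Compressing consecutive pages into ranges keeps the command line short even when several hundred of the roughly 3,500 pages match, which matters on Windows, where command lines have a length limit.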

Thank You

try67 · Answer

This can be done in various ways. Searching a large file with a script is not very efficient, though, so it's better to rely on Acrobat's built-in search function if possible. In Acrobat Pro you can use the Advanced Search command to locate all instances of a search term and then export those results to a CSV file.

A (paid-for) tool I've created can then take that CSV file, use its data to locate all the pages where matches were found, and extract them to a new file in a matter of seconds. You can find it here: https://www.try67.com/tool/acrobat-print-or-extract-pages-from-csv-search-results
