Split large PDF on repeated text pattern, and save each new PDF with a custom filename
I have Acrobat Pro DC.
I have a problem in my current organisation, which uses a very old-fashioned HR system for recruitment. Our HR system compiles one massive report of all the job applications for a recent post: the PDF is 1700+ pages long, containing distinct sections (of variable length) for over 200 applicants.
I want to split this into one PDF per applicant, with the filename of each document being the applicant's name.
For each new application, a consistently formatted divider page exists as follows:
Applicant : Smith, John
Vacancy ID : 15535
The text 'Vacancy ID' only exists on these divider pages, so it can be used to identify where to split the document.
The applicant's name, which occurs on the previous line, starts at character 10 and is of variable length. In fact, it can be acquired with getPageNthWord(page,3) and getPageNthWord(page,4).
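To make the naming step concrete, here is a minimal sketch of how the two words might be turned into a filename. The helper name applicantFileName and the "unknown_applicant" fallback are my own inventions, not part of Acrobat. It assumes the name is exactly two words, which would break for multi-word surnames.

```javascript
// Hypothetical helper (not an Acrobat API): build a filename from the two
// words returned by getPageNthWord(page, 3) and getPageNthWord(page, 4).
function applicantFileName(surname, firstName) {
  var raw = String(surname || "") + "_" + String(firstName || "");
  var clean = raw
    .replace(/[\\\/:*?"<>|]/g, "")      // drop characters Windows forbids in filenames
    .replace(/\s+/g, " ")               // collapse runs of whitespace
    .replace(/^[\s_]+|[\s_]+$/g, "");   // trim stray spaces and underscores at the ends
  // Fall back to a placeholder if nothing usable is left (assumed behaviour).
  return (clean || "unknown_applicant") + ".pdf";
}
```

For the divider page above, applicantFileName("Smith", "John") would give "Smith_John.pdf".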
How easy would it be to create some JavaScript to run in an action which would do the following:
- Identify text "Vacancy ID"
- Split the document at that point, saving the pages from the current page up to the page before the next instance of "Vacancy ID" (typically 5 pages, though not always)
- Extract applicant name from previous line
- Save an individual PDF for each applicant, using the applicant's name
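The steps above could be sketched roughly as follows. This is an untested outline rather than a working action: it relies on the Doc methods getPageNumWords, getPageNthWord and extractPages, assumes the word indices 3 and 4 mentioned earlier hold on every divider page, and uses function names (findDividerPages, planRanges, splitByApplicant) that I made up for illustration.

```javascript
// Sketch only: `doc` is expected to behave like an Acrobat Doc object.

// Return the 0-based numbers of pages containing "Vacancy" followed by "ID".
function findDividerPages(doc) {
  var pages = [];
  for (var p = 0; p < doc.numPages; p++) {
    var n = doc.getPageNumWords(p);
    for (var i = 0; i < n - 1; i++) {
      if (doc.getPageNthWord(p, i, true) === "Vacancy" &&
          doc.getPageNthWord(p, i + 1, true) === "ID") {
        pages.push(p);
        break; // one hit per page is enough
      }
    }
  }
  return pages;
}

// Turn divider pages into inclusive {start, end} ranges; each section runs
// up to the page before the next divider, or to the end of the document.
function planRanges(dividers, numPages) {
  var ranges = [];
  for (var k = 0; k < dividers.length; k++) {
    var next = (k + 1 < dividers.length) ? dividers[k + 1] : numPages;
    ranges.push({ start: dividers[k], end: next - 1 });
  }
  return ranges;
}

// Extract one PDF per applicant into outDir (assumed to end with "/").
// Returns the number of files requested. Pages before the first divider
// are ignored, and duplicate names would overwrite each other.
function splitByApplicant(doc, outDir) {
  var ranges = planRanges(findDividerPages(doc), doc.numPages);
  for (var k = 0; k < ranges.length; k++) {
    var r = ranges[k];
    // Word indices 3 and 4 are taken from the description above (assumption).
    var name = doc.getPageNthWord(r.start, 3, true) + "_" +
               doc.getPageNthWord(r.start, 4, true);
    doc.extractPages({ nStart: r.start, nEnd: r.end, cPath: outDir + name + ".pdf" });
  }
  return ranges.length;
}
```

In an Action Wizard "Execute JavaScript" step, I believe this would be called as splitByApplicant(this, "/c/temp/applicants/"), where the folder is only an example path. As far as I know, extractPages with a cPath is only permitted in batch or console contexts.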
Can this be done, or has it been done already? Thanks
