splitting pdf at various pages and page ranges

Report · Mar 28, 2020

Hello - I need to split a 7500-page PDF into multiple files based on single pages and ranges of pages, e.g., 1829-1842, 1843-1846, 1935, 1998, 2014, 4492, 5605, etc...

Is there a way to do this without manually selecting all of the pages? I tried the "split" tool; however, it is limited to splitting by number of pages, file size, or bookmarks (the document doesn't have bookmarks). I also tried the "extract" tool; however, it either pulls the pages out as a separate file for each page or as one PDF containing all of those pages.

I feel like there should be a way to do this all at once, as it would seem like something that would come up often.

Thanks!

Report · Mar 28, 2020

May be possible with a script.

Report · Mar 28, 2020

You can use the doc.extractPages method: https://help.adobe.com/en_US/acrobat/acrobat_dc_sdk/2015/HTMLHelp/#t=Acro12_MasterBook%2FJS_API_Acro...

You should be able to find a number of sample scripts by doing a search.

Report · Mar 28, 2020

Can you help me find one, please? I have been trying to follow the tutorial, but I can't even get the console to run anything...

Report · Mar 28, 2020

And here's a link to a good tutorial: https://acrobatusers.com/tutorials/print/extracting-pages-pdf-acrobat-javascript/

Report · Mar 28, 2020

Hi George, I'd completely forgotten about this article. Thanks for posting it 🙂

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

Report · Mar 28, 2020

thanks!

Report · Mar 28, 2020

This is so frustrating. I am trying to follow the tutorial and nothing works. I can't figure out how to run anything in the console: the arrow is grey and the green circle does nothing...

Report · Mar 28, 2020

The doc.extractPages JavaScript method still works, but a lot of Acrobat's user interface is different than it was 12 years ago, so some of what's mentioned in the tutorial won't be the same. Maybe Thom will respond with more details since this has caught his attention.

Report · Mar 28, 2020

I figured out at least how to get the console to work: I have to press Ctrl+Enter.

Report · Mar 28, 2020

He's got one for that too: https://acrobatusers.com/tutorials/javascript_console/

Report · Mar 28, 2020

edit - nevermind -

Report · Mar 28, 2020

The console is for more than checking exceptions and testing code.

Here's a video tutorial on how to use it:

https://www.pdfscripting.com/public/Free_Videos.cfm#JSIntro

Thom Parker - Software Developer at PDFScripting

Report · Mar 28, 2020

So, now that you have the console worked out and can test some code, you need to develop a strategy for extracting random pages into a single file. Unfortunately, the extractPages function is not going to be much help. Instead, you need to use the "doc.insertPages()" function. That's because "extractPages" creates the file for the extracted pages. Using this function more than once with the same file name will just overwrite the file.

What you can do is use it once for the first extraction, then insert pages into the new PDF for the subsequent pages. You'll also need to develop a method for listing the pages that can be parsed and applied to the extraction/insertion functions.
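
A minimal sketch of that strategy, not a tested production script. The helper names `parsePageList` and `combinePages` and the output path are made up for illustration; `extractPages`, `insertPages`, `saveAs`, `numPages` and `path` are the Acrobat JavaScript Doc members. It assumes the source PDF is saved on disk and that the code runs from the console, where these calls are permitted.

```javascript
// Parse a 1-based page list such as "1829-1842, 1935" into zero-based
// ranges, because Acrobat JavaScript numbers pages starting at 0.
function parsePageList(text) {
  var parts = text.split(",");
  var ranges = [];
  for (var i = 0; i < parts.length; i++) {
    var m = /^\s*(\d+)\s*(?:-\s*(\d+))?\s*$/.exec(parts[i]);
    if (!m) {
      throw new Error("Cannot parse page entry: " + parts[i]);
    }
    var first = parseInt(m[1], 10);
    var last = m[2] ? parseInt(m[2], 10) : first;
    if (first < 1 || last < first) {
      throw new Error("Invalid page range: " + parts[i]);
    }
    ranges.push({ nStart: first - 1, nEnd: last - 1 });
  }
  return ranges;
}

// Hypothetical helper: gather every listed range into ONE new PDF.
// The first range creates the new document, later ranges are appended.
function combinePages(doc, pageList, outPath) {
  var ranges = parsePageList(pageList);
  // Without cPath, extractPages returns the newly created Doc object.
  var out = doc.extractPages({ nStart: ranges[0].nStart, nEnd: ranges[0].nEnd });
  for (var i = 1; i < ranges.length; i++) {
    out.insertPages({
      nPage: out.numPages - 1, // insert after the current last page
      cPath: doc.path,         // read the pages from the source file
      nStart: ranges[i].nStart,
      nEnd: ranges[i].nEnd
    });
  }
  out.saveAs(outPath);
  return out;
}
```

In the console this might be called as `combinePages(this, "1829-1842, 1935, 1998", "/c/temp/combined.pdf");`, where the path is only an example.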

Thom Parker - Software Developer at PDFScripting

Report · Mar 28, 2020

Well, I do want them to be separate files actually. I ended up doing it like this, and it worked, although I'm not sure it was any faster than the built-in "organize pages" tool:

// Page numbers are zero-based: index 1828 is the 1829th page of the PDF.
this.extractPages(1828, 1841, "Settlement Agreement.pdf");
this.extractPages(1842, 1846, "Order Approving Settlement.pdf");
this.extractPages(1934, 1934, "MR1.pdf");
this.extractPages(1997, 1997, "LS 18.pdf");
this.extractPages(2013, 2013, "LS 203.pdf");
this.extractPages(4491, 4491, "LS 207.pdf");
this.extractPages(5604, 5604, "LS 202.pdf");
this.extractPages(2039, 2039, "Injury Related to Explosion.pdf");
this.extractPages(2290, 2290, "Employment Agreement.pdf");
this.extractPages(3006, 3010, "Informal Hearing Memo.pdf");
this.extractPages(3109, 3127, "Dep. Dr. Collins.pdf");
this.extractPages(3198, 3201, "3199-3202.pdf");
this.extractPages(3203, 3206, "MR2.pdf");
this.extractPages(3925, 3925, "MR3.pdf");
this.extractPages(3961, 3962, "MR4.pdf");
this.extractPages(3980, 3981, "MR5.pdf");
this.extractPages(3990, 3992, "MR6.pdf");
this.extractPages(4739, 4739, "Timesheet.pdf");
this.extractPages(5034, 5044, "MR7.pdf");
this.extractPages(5323, 5327, "MR8.pdf");
this.extractPages(5349, 5351, "MR9.pdf");
this.extractPages(5399, 5480, "Dep. Jones.pdf");
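
The same idea can be written as a table plus a loop, which keeps the page numbers as a reader sees them (1-based) and does the zero-based conversion in one place. This is only a sketch: `extractJobs` and the `jobs` layout are hypothetical names, and the two entries are copied from the first and third calls above.

```javascript
// Each job uses 1-based physical page numbers (1 = first page of the file).
var jobs = [
  { first: 1829, last: 1842, name: "Settlement Agreement.pdf" },
  { first: 1935, last: 1935, name: "MR1.pdf" }
];

// Hypothetical helper: validate every job first, then extract each one
// into its own file. Returns the list of file names that were written.
function extractJobs(doc, jobList) {
  var i, job;
  for (i = 0; i < jobList.length; i++) {
    job = jobList[i];
    if (job.first < 1 || job.last < job.first || job.last > doc.numPages) {
      throw new Error("Bad page range for " + job.name);
    }
  }
  var written = [];
  for (i = 0; i < jobList.length; i++) {
    job = jobList[i];
    doc.extractPages({
      nStart: job.first - 1, // convert to Acrobat's zero-based index
      nEnd: job.last - 1,
      cPath: job.name
    });
    written.push(job.name);
  }
  return written;
}
```

In the console, `extractJobs(this, jobs);` would run the whole list. Checking all ranges before extracting anything means a typo in one entry stops the run before any file is written.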

Report · Mar 28, 2020

Excellent!! The console is perfect for this kind of one-off scripting. However, if this is something that needs to be done frequently, it might be worth figuring out how to automate it, i.e. searching the PDF to find the page ranges and the names for the extracted files.

Thom Parker - Software Developer at PDFScripting

Report · Mar 28, 2020

Well, a human has to go through it each time to get those page ranges and come up with what they want to name each file, so basically I just told her to extract and save them as she goes next time. Thanks for the help and feedback!

Report · Mar 28, 2020

You can have the person add sticky notes to the pages to extract, or even bookmarks, and then a script can pick those up and extract the marked pages using the name of the bookmark or the text of the sticky note.
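
A rough sketch of the sticky-note variant, assuming each note sits on a single page to extract and its text is the desired file name. `extractNotedPages` is a hypothetical name; `getAnnots`, the annotation properties `type`, `page` and `contents`, and `extractPages` are the Acrobat JavaScript calls it relies on.

```javascript
// Hypothetical helper: extract every page that carries a sticky note,
// naming each file after the note text.
function extractNotedPages(doc) {
  var annots = doc.getAnnots() || []; // getAnnots returns null when there are none
  var written = [];
  for (var i = 0; i < annots.length; i++) {
    var a = annots[i];
    if (a.type !== "Text") {
      continue; // "Text" is the annotation type of a sticky note
    }
    // Replace characters that are not allowed in file names, then trim.
    var name = String(a.contents || "")
      .replace(/[\\\/:*?"<>|]/g, "_")
      .replace(/^\s+|\s+$/g, "");
    if (!name) {
      continue; // skip notes without text
    }
    doc.extractPages({ nStart: a.page, nEnd: a.page, cPath: name + ".pdf" });
    written.push(name + ".pdf");
  }
  return written;
}
```

Run from the console as `extractNotedPages(this);`. Two notes with the same text would write to the same file name, so the note texts need to be unique.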

Report · Mar 28, 2020

As mentioned, this can be done using a script. For example, I've developed a (paid-for) tool for Acrobat that does just that.

You can find it here: http://try67.blogspot.com/2011/04/acrobat-extract-non-sequential-pages.html

Report · Mar 28, 2020

Here's a (paid-for) tool that extracts all pages that contain specific words into a single PDF.

https://www.pdfscripting.com/public/Search-and-Extract-Pages-Description.cfm

Thom Parker - Software Developer at PDFScripting

Report · Mar 28, 2020

We're now on the second page of posts 😉

Did you see Try67's reply? He has a good idea: have the person go through the PDF and add sticky notes at the split points.

I have a better idea. If the split points are also the places with the titles, then use the rectangle markup to encircle the title text. This text will then be copied into the comment box for the rectangle. This can be picked up by a script that extracts the pages into the appropriately named PDF.

This strategy still takes some manual work, but should save a lot of time.
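
One way the split-point idea could be sketched, under the assumption that each rectangle marks the first page of a section and that a section runs until the page before the next rectangle (the last one runs to the end of the file). `rangesFromSplitPoints` and `splitAtRectangles` are hypothetical names, and the rectangle text is assumed to be usable as a file name as-is.

```javascript
// Turn split-point marks ({ page, name }, zero-based pages) into page ranges.
function rangesFromSplitPoints(marks, numPages) {
  var sorted = marks.slice().sort(function (a, b) { return a.page - b.page; });
  var ranges = [];
  for (var i = 0; i < sorted.length; i++) {
    var end = i + 1 < sorted.length ? sorted[i + 1].page - 1 : numPages - 1;
    if (end < sorted[i].page) {
      throw new Error("Two split points on page index " + sorted[i].page);
    }
    ranges.push({ nStart: sorted[i].page, nEnd: end, name: sorted[i].name });
  }
  return ranges;
}

// Collect rectangle annotations ("Square" type) and extract one file per section.
function splitAtRectangles(doc) {
  var annots = doc.getAnnots() || []; // null when the document has no annotations
  var marks = [];
  for (var i = 0; i < annots.length; i++) {
    if (annots[i].type === "Square" && annots[i].contents) {
      marks.push({ page: annots[i].page, name: annots[i].contents });
    }
  }
  var ranges = rangesFromSplitPoints(marks, doc.numPages);
  for (var j = 0; j < ranges.length; j++) {
    doc.extractPages({
      nStart: ranges[j].nStart,
      nEnd: ranges[j].nEnd,
      cPath: ranges[j].name + ".pdf"
    });
  }
  return ranges;
}
```

Pages before the first rectangle are left out on purpose, since nothing names them.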

Thom Parker - Software Developer at PDFScripting

Report · Mar 29, 2020

Hi, yes, thank you for the responsiveness and willingness to help! However, the titles are not actually on the pages. There is a lot more I think we could be doing with Adobe and I plan on exploring that in the future.