Participant
January 12, 2018
Question

Extracting TOC details / splitting a PDF according to its TOC


I have some PDF files and would like to split each of them into separate PDFs according to the table of contents (TOC) given in the file. Using JavaScript, I would like to create an Action that reads the TOC from whichever pages it occupies and then splits the file according to the TOC.

I am new to JavaScript in Acrobat. Any help would be appreciated!

Thanks,


4 replies

Thom Parker
Community Expert
January 15, 2018

I recently wrote a script that parses the TOC out of a PDF and builds a set of matching bookmarks.  In the past I've also written a plug-in, part of whose functionality was to find and read a TOC.  There were several issues with this process.

  1. The format/layout of the TOC varies wildly across documents. If you want to do this with JS on your docs, they need to have a very consistent format. 
  2. The TOC starting and ending page numbers need to be known up front. Either the location needs to be consistent, or the user will need to enter this data.
  3. TOC page numbers do not necessarily match real page numbers, and there are often lettered sections such as A1 or ii. If this is the case with your PDFs, then the script will also need to search all the pages for the related anchor numbers.
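For the third point, one possible shortcut is sketched below. It assumes the PDF defines page labels (the "ii" or "A1" shown in Acrobat's page box) and that those labels match the numbers printed in the TOC, which is not guaranteed. If they don't match, the page text itself has to be searched as described above. `buildLabelIndex` and `pageForLabel` are hypothetical helper names; `numPages` and `getPageLabel` are Doc members from the Acrobat JavaScript API.

```javascript
// Build a lookup from page label ("ii", "A1", "12") to 0-based page index.
// ASSUMPTION: the document's page labels match the numbers printed in the TOC.
function buildLabelIndex(doc) {
  var index = {};
  for (var p = 0; p < doc.numPages; p++) {
    var label = String(doc.getPageLabel(p));
    // Keep the first page that carries a given label.
    if (!Object.prototype.hasOwnProperty.call(index, label)) {
      index[label] = p;
    }
  }
  return index;
}

// Return the 0-based page index for a TOC label, or -1 if no page has it.
function pageForLabel(index, label) {
  return Object.prototype.hasOwnProperty.call(index, label) ? index[label] : -1;
}
```

In Acrobat this would be called as `buildLabelIndex(this)` from the console or an Action step.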

Message me if you would like some consulting/development on this topic.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
Participant
January 16, 2018

Hello Thom,

The file format is consistent, and the start page numbers are written in numeric form to the right of each title (at the right-hand side of the page). The length of the TOC varies, though. I have developed a Python script that can split the document; however, I need to do it in Acrobat JavaScript so that it can be added to Acrobat's Actions.
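For a layout like this one, the parsing and splitting steps might look roughly like the sketch below. It assumes each TOC line has already been recovered as text of the form "Title ..... 12", and that the printed page numbers equal the physical page numbers. `parseTocLine`, `tocToRanges` and `splitByToc` are hypothetical helper names; `extractPages` is the Doc method from the Acrobat JavaScript API, and passing `cPath` requires a privileged context such as an Action or the console.

```javascript
// Parse one TOC line of the assumed form "Title ..... 12".
// Returns null when the line has no title or no trailing page number.
function parseTocLine(line) {
  var m = /^(.*?)[\s.]*(\d+)\s*$/.exec(line);
  if (!m || m[1].trim() === "") return null;
  return { title: m[1].trim(), startPage: parseInt(m[2], 10) };
}

// Turn 1-based start pages into 0-based {nStart, nEnd} ranges.
// Each section runs up to the page before the next section starts.
function tocToRanges(entries, numPages) {
  var sorted = entries.slice().sort(function (a, b) {
    return a.startPage - b.startPage;
  });
  return sorted.map(function (e, i) {
    var next = i + 1 < sorted.length ? sorted[i + 1].startPage : numPages + 1;
    return {
      title: e.title,
      nStart: e.startPage - 1,
      // Two sections starting on the same page each get that single page.
      nEnd: Math.max(e.startPage, next - 1) - 1
    };
  });
}

// Write one PDF per range; folder is a device-independent path ending in "/".
function splitByToc(doc, ranges, folder) {
  return ranges.map(function (r, i) {
    var safe = r.title.replace(/[^A-Za-z0-9 _-]+/g, "_");
    var cPath = folder + (i + 1) + "_" + safe + ".pdf";
    doc.extractPages({ nStart: r.nStart, nEnd: r.nEnd, cPath: cPath });
    return cPath;
  });
}
```

Inside an Action step this could be driven with `splitByToc(this, tocToRanges(entries, this.numPages), "/c/out/")`, where `entries` comes from running `parseTocLine` over the TOC lines and the output folder is only an example.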

Thanks,

Bhoopendra S

Thom Parker
Community Expert
January 16, 2018

Well then, all you need to parse the TOC and detect the page numbers are the "this.getPageNthWord" and "this.getPageNthWordQuads" functions.  These give you each word and the word's location on the page. Be warned, words are not necessarily returned in the order they appear on the page.  Usually they are, but not always.  I always sort the words into lines, and then order the lines.
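A minimal sketch of that sort-into-lines step, not the actual script: it assumes unrotated pages in default PDF coordinates (y increases upward) and a tolerance of about 3 points for deciding that two words share a line. `collectWords` and `groupIntoLines` are hypothetical helper names; `getPageNumWords`, `getPageNthWord` and `getPageNthWordQuads` are Doc methods from the Acrobat JavaScript API.

```javascript
// Collect every word on a page together with its left edge and vertical centre.
function collectWords(doc, pageNum) {
  var words = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    var quads = doc.getPageNthWordQuads(pageNum, i);
    if (!quads || !quads.length) continue;
    var q = quads[0]; // eight numbers: four (x, y) corner points
    var ys = [q[1], q[3], q[5], q[7]];
    words.push({
      text: doc.getPageNthWord(pageNum, i, true), // true strips punctuation
      x: Math.min(q[0], q[2], q[4], q[6]),
      y: (Math.min.apply(null, ys) + Math.max.apply(null, ys)) / 2
    });
  }
  return words;
}

// Group words into lines (top of page first), then order each line left to right.
// ASSUMPTION: words within `tolerance` points vertically belong to the same line.
function groupIntoLines(words, tolerance) {
  var tol = tolerance === undefined ? 3 : tolerance;
  var sorted = words.slice().sort(function (a, b) { return b.y - a.y; });
  var lines = [];
  sorted.forEach(function (w) {
    var last = lines[lines.length - 1];
    if (last && Math.abs(last.y - w.y) <= tol) {
      last.words.push(w);
    } else {
      lines.push({ y: w.y, words: [w] });
    }
  });
  return lines.map(function (line) {
    return line.words
      .sort(function (a, b) { return a.x - b.x; })
      .map(function (w) { return w.text; })
      .join(" ");
  });
}
```

Calling `groupIntoLines(collectWords(this, 2))` would give the text lines of the third page, each ending with its page number if the TOC puts numbers at the right.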

Here's the SDK reference for the functions:

Acrobat DC SDK Documentation

JR Boulay
Community Expert
January 13, 2018

Sorry, I misunderstood the question.

PDF acrobat, InDesigner and Photoshopographer
JR Boulay
Community Expert
January 13, 2018

You should use the Acrobat Pro "Split" feature in an Action (Action Wizard) or in a Custom Command.

Bernd Alheit
Community Expert
January 13, 2018

This is possible when there are bookmarks in the file.
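If a script needs the split points from those bookmarks, a rough sketch follows. It relies on a workaround rather than a dedicated API: each top-level bookmark is executed and the page it lands on is read back. This assumes every bookmark is a plain go-to-page link within the same document and that the view updates immediately, which may not hold for every file. `topLevelBookmarkPages` is a hypothetical helper name; `bookmarkRoot`, `children`, `name`, `execute` and `pageNum` come from the Acrobat JavaScript API.

```javascript
// List the top-level bookmarks with the 0-based page each one jumps to.
// ASSUMPTION: every bookmark action is a go-to-page within this document.
function topLevelBookmarkPages(doc) {
  var kids = doc.bookmarkRoot.children || []; // children is null when there are none
  var saved = doc.pageNum;
  var result = kids.map(function (bm) {
    bm.execute(); // follow the bookmark
    return { name: bm.name, page: doc.pageNum };
  });
  doc.pageNum = saved; // restore the original view
  return result;
}
```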

try67
Community Expert
January 12, 2018

There is no "out of the box" solution for this. It will have to be custom-developed to match the structure of the TOC in your files.

I've developed similar scripts in the past and would be happy to take a look at a sample file and let you know if I think it's doable or not, and if so for how much. You can contact me privately (try6767 at gmail.com) to discuss it further.

Participant
January 12, 2018

try67: Thanks, I will contact you at the email address you gave.