JS to Extract Multiple Pages and Name as Page Label
Good morning! We have a group of employees who often take a PDF that contains multiple pages (in the range of 200-500 pages) and extract the pages into separate PDFs to load into a system. These large PDFs often have a page label for each and every page, but when the pages are extracted, the labels are lost and the individual files are named differently. What we're aiming for is a JavaScript that they can run to extract all pages from a multi-page PDF and, in the process, name each individual PDF file after its page label.
I believe the two functions we're dealing with are Doc.getPageLabel() and Doc.extractPages(), but we're unsure how to tie them into a script that does what we need. Unfortunately, none of us have any experience with JS. I appreciate any and everyone's help! Thank you!
From what context do you want to use this script? From a menu item? An Action? The JS Console?
Basically you have it right. The only thing missing is a loop that iterates over all the pages, extracting each one using its label.
The basic code would be something like this:
for (var p = 0; p < this.numPages; p++) {
    this.extractPages(p, p, this.path.replace(this.documentFileName, this.getPageLabel(p) + ".pdf"));
}
Whatever would best suit this objective. I was assuming that it would need to be an 'Add-On Tool', but if there is a better way of approaching it, we are definitely open to advice. Thank you!
That's possible, but it would require a more complex script. If you have Acrobat Pro you can just create a new Action, put that code into it and then run it on your files directly. That's probably the easiest way of using it.
This totally worked for me, exact same situation. Thanks a ton!
How do you run it, and where do you run it?
From the JS Console: Press Ctrl+J, paste the code into the console, select it with the mouse and press Ctrl+Enter to run it.
what is wrong with me?
You must select (with the mouse or keyboard) the full code before executing it.
Thank you! It worked! Is it possible to do it without the number?
This is a part of the page label, most likely. Try replacing this part of the code:
this.getPageLabel(p)
With:
this.getPageLabel(p).replace(/^\[\d+\]\s/, "")
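As a minimal sketch of what that replacement does, the helper below wraps the same regular expression in a function. The name stripLabelPrefix is hypothetical, and the sketch assumes the labels look like "[12] A-101", which is only inferred from the regular expression above.

```javascript
// Hypothetical helper: removes a leading "[n] " prefix from a page label.
// Assumes labels look like "[12] A-101"; labels without the prefix are returned unchanged.
function stripLabelPrefix(label) {
    return String(label).replace(/^\[\d+\]\s/, "");
}
```

In the loop, this.getPageLabel(p) would then become stripLabelPrefix(this.getPageLabel(p)).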
Thank you! It worked.
Wow, that is what I was looking for.
Can you write it this way, please:
filename.pdf to become filename___pagelabel.pdf
Thank you so much.
Sure. Use this:
this.extractPages(p, p, this.path.replace(".pdf", "___" + this.getPageLabel(p) + ".pdf"));
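A slightly more defensive variant, offered only as a sketch: the hypothetical helper buildSuffixedPath below changes just the trailing ".pdf" extension (case-insensitively), so a ".pdf" that happens to appear earlier in the folder path is left alone. It assumes the path ends in ".pdf" or ".PDF".

```javascript
// Hypothetical helper: turns ".../filename.pdf" into ".../filename___<label>.pdf".
// Only the trailing extension is replaced; the function form of the replacement
// keeps any "$" characters in the label from being treated as special patterns.
function buildSuffixedPath(path, label) {
    return String(path).replace(/\.pdf$/i, function () {
        return "___" + label + ".pdf";
    });
}
```

In the loop, the third argument to extractPages would then be buildSuffixedPath(this.path, this.getPageLabel(p)).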
Thank you so much!!
I need some clarification on the steps to extract the pages and keep the page labels. Here is what I am doing; is this correct?
1) Open the PDF
2) Select "Organize Pages"
3) Highlight the pages I want to extract
4) Use Ctrl+J to open the JavaScript console
5) Input the script (see below)
for (var p=0; p<this.numPages; p++)
{
this.extractPages(p, p, this.path.replace(this.documentFileName, this.getPageLabel(p) + ".pdf"));
}
6) Highlight the script and use Ctrl+Enter to run it.
I get the following error:
RaiseError: The file may be read-only, or another user may have it open. Please save the document with a different name or in a different folder.
Doc.extractPages:3:Console undefined:Exec
===> The file may be read-only, or another user may have it open. Please save the document with a different name or in a different folder.
undefined
Below is a screenshot of the whole page. I have tried this on multiple PDFs and get the same issue. What am I doing wrong?
Yes, that's it. It should work... Make sure the file is not located in a special folder, though, like the root drive folder (C:\), or something like C:\Windows, or a network folder. Also make sure you have full read/write permissions to that folder.
If it still doesn't work, go to Menu - Preferences - Security (Enhanced) and make sure that everything there is disabled.
The security was enabled. Disabling it worked. Thank you very much!!
I save my files in a folder on the desktop; the path is C:\Users\*user*\Desktop\Plan Conversions. No matter what I do, I continue to get the error in the screenshot below. Strangely, though, some pages from the PDF are extracted with their appropriate names, but not the pages I've chosen to extract; it seems to be random. It would make my work life a lot easier if I could get this to work correctly, so any insight would be helpful. Thank you for any additional information you can provide.
You need to make sure that:
- The new file-name is valid, i.e. it doesn't contain any characters that can't be used in a file-name, such as:
/ \ * :
For some reason, a script also can't save a file-name with a comma in it.
You must remove all of these characters from the string before you use it as the new file-name.
- The new file-name is not the same as the one of the original file, if you're trying to save the extracted pages in the same folder.
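One way to handle the first point is to clean the page label before using it as a file name. The sketch below is only an illustration: sanitizeFileName is a hypothetical helper, and replacing each problem character with an underscore is an assumption, not something prescribed above. Besides the characters listed and the comma, it also covers the other characters Windows rejects in file names (? " < > |).

```javascript
// Hypothetical helper: replaces characters that cannot be used in a file name
// (/ \ * : ? " < > |) and the comma with an underscore.
// The underscore is an arbitrary choice; an empty string would simply drop them.
function sanitizeFileName(label) {
    return String(label).replace(/[\/\\*:?"<>|,]/g, "_");
}
```

In the loop, this.getPageLabel(p) would then become sanitizeFileName(this.getPageLabel(p)). Note that two different labels can end up identical after cleaning, in which case the later file may collide with the earlier one.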
I saved this particular file as henderson.pdf, so no strange characters. The pages being extracted are architectural plans, all saved in one file, with the individual pages named something like A-101 Floor Plan. Could the dash in the page name be causing the problem somehow? I wouldn't think so, because this script worked for me once in a similar scenario, but I have not been able to use it again and I have not made any changes to my workflow since.
The issue is not with the original file name, but the one you're trying to use to save the pages, that is, the page label. If you could share the file I would be able to provide further help.
Please see below, and thank you for your assistance.
https://acrobat.adobe.com/id/urn:aaid:sc:US:b65f8c60-b5cb-4b7b-ba11-f7d96f5362b9
Pages 76-78 contain a forward-slash in the page name, which is causing it to fail. There might be other issues, but that's the first one I've encountered so I stopped there. I recommend you add this line before the extractPages command, so you could see where it got stuck in the loop:
console.println(p);
Remember the page numbers are 0-based, so if it shows "75" it means page 76...
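Building on that idea, here is a rough sketch of a loop that keeps going after a failure and reports every page that could not be saved, instead of stopping at the first one. The function name extractPagesByLabel and its log parameter are hypothetical; the sketch assumes the error raised by extractPages can be caught with try/catch, and it uses the same positional extractPages(start, end, path) call as the code earlier in this thread.

```javascript
// Hypothetical helper: extracts every page to "<label>.pdf" next to the original file.
// doc is the document object (pass `this` in the console); log is a function that prints a message.
// Returns the 1-based page numbers that failed, so they can be checked by hand.
function extractPagesByLabel(doc, log) {
    var failed = [];
    for (var p = 0; p < doc.numPages; p++) {
        var label = doc.getPageLabel(p);
        // Function form of the replacement avoids special handling of "$" in labels.
        var target = doc.path.replace(doc.documentFileName, function () {
            return label + ".pdf";
        });
        try {
            doc.extractPages(p, p, target);
        } catch (e) {
            failed.push(p + 1);
            log("Page " + (p + 1) + " (label \"" + label + "\") failed: " + e);
        }
    }
    return failed;
}
```

In the JS Console it could be called as extractPagesByLabel(this, function (msg) { console.println(msg); }); the returned list then shows which pages need their labels fixed.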