JS to Extract Multiple Pages and Name as Page Label
Good morning! We have a group of employees who often take a PDF that contains multiple pages (in the range of 200-500 pages) and extract them into separate PDFs to load into a system. Oftentimes, the large PDFs they begin with hold page labels for each and every page, but when extracting, they lose these page labels and the individual files are named differently. What we're aiming for is a JavaScript that they can run to extract all pages from a multi-page PDF and, in the process, name each individual PDF file after its page label.
I believe the two functions that we're dealing with are Doc.getPageLabel() and Doc.extractPages(), but we're unsure how to tie them into a script that will do what we need. Unfortunately, none of us has any experience with JS. I appreciate any and everyone's help! Thank you!
Different file, same problem. It doesn't even try to convert the pages I have selected; it just tries to convert the entire file and fails after a random number of pages have been converted. I open the PDF, go into the "Organize Pages" section, select the pages I want, press Ctrl+J, and enter the script:
for (var p=0; p<this.numPages; p++)
{
    this.extractPages(p, p, this.path.replace(this.documentFileName, this.getPageLabel(p) + ".pdf"));
}
And it randomly spits out pages that I didn't select and errors out. I am on a local PC, with the file saved in a folder on my desktop. Security stuff is disabled in Adobe and I have full RW permission to the folder.
RaiseError: The file may be read-only, or another user may have it open. Please save the document with a different name or in a different folder.
Doc.extractPages:3:Console undefined:Exec
===> The file may be read-only, or another user may have it open. Please save the document with a different name or in a different folder.
undefined
See my advice from earlier.
I believe I have tried all of your suggestions. I'm coming to the realization that blueprints probably have something inherent in them that prevents this from working.
Did you add the command to print out the page number, so you could see which one is causing the error?
It gave me this:
TypeError: Invalid argument type.
Doc.getPageLabel:5:Console undefined:Exec
===> Parameter nPage.
undefined
If I understand that correctly page 6 of my file is causing the error? The name of that page is A101 - Overall Phasing Plan
No strange characters. In fact, I checked all of the names of the pages previously, no forward slashes, no odd characters. They're all in the format of "Plan Page Type - Name" aka "A101 - Overall Phasing Plan"
Can you share the file?
Yes, please find it here: https://acrobat.adobe.com/id/urn:aaid:sc:US:a300da13-6d14-4bd7-8976-aece2e3e50a9
If you had done what I suggested before, the problem would have been apparent... Here's that code:
for (var p=0; p<this.numPages; p++) {
    console.println("Extracting page " + (p+1));
    var pageName = this.getPageLabel(p);
    console.println("Page name: " + pageName);
    this.extractPages(p, p, this.path.replace(this.documentFileName, pageName + ".pdf"));
}
I'm sorry - I don't understand code or what any of it means. I'm just a sales guy trying to break up project plans I'm sent by my customers. I thought I entered it correctly and it gave me some result, but I don't know what I'm looking at.
This allows you to track where the error took place via the output in the Console (which is where you're running the code from). Namely, here:
Extracting page 46
Page name: A311 - Door/Window Details
This page's label contains a slash, which can't be used as a part of a file's name.
Since you're not a programmer you should heed the advice you're given here by those who are.
Alternatively, I'm happy to create a tool for you that will do all of this with a single click, including removing the characters that are not valid in a file name, for a small fee.
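For anyone else hitting the same error, here is a minimal sketch of cleaning a page label before it is used as a file name. The helper names `sanitizeLabel` and `extractAllByLabel`, and the choice of an underscore as the replacement character, are assumptions for illustration and not part of the Acrobat API. The document is passed in as a parameter, so from the Console it would be called as `extractAllByLabel(this)`.

```javascript
// Hypothetical helper: replace the characters Windows does not allow in
// file names (\ / : * ? " < > |) with an underscore, then trim whitespace.
function sanitizeLabel(label) {
    return String(label)
        .replace(/[\\\/:*?"<>|]/g, "_")
        .replace(/^\s+|\s+$/g, "");
}

// Hypothetical wrapper around the same loop as above, using the cleaned label.
// Returns the list of target paths so the result can be inspected afterwards.
function extractAllByLabel(doc) {
    var targets = [];
    for (var p = 0; p < doc.numPages; p++) {
        var pageName = sanitizeLabel(doc.getPageLabel(p));
        var target = doc.path.replace(doc.documentFileName, pageName + ".pdf");
        doc.extractPages(p, p, target);
        targets.push(target);
    }
    return targets;
}
```

Note that two labels which differ only in the replaced characters would end up with the same file name, so the later page would overwrite the earlier one.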
It's tough to heed the advice when I'm not exactly sure what to even look for - but point taken. I give my customers the same talk! I would be interested in purchasing a tool that would do that for me though, are we allowed to discuss costs here or does that need to be taken somewhere else?
That's more of a private discussion. Send me a PM, please.
Hi there! Sorry to bother you again, your code works perfectly. If possible, I have another request: would it be possible to keep the original page label within the files that have been split? For example, a PDF file containing pages 1-10, once it's split, if I open page 5, for instance, the page label is not 5, but it's 1. Is there a way to keep the original label, so in this case, the 5? Thank you so much.
Not directly. You will need to open each file and re-apply the page-label scheme to do it.
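A rough sketch of what re-applying the label could look like for plain decimal labels such as "5". It assumes that `extractPages`, when called without a path, returns the newly created document (which is the case when running from the Console or a batch sequence), and that `setPageLabels` takes a `[style, prefix, start]` array. The function name `extractWithDecimalLabel` is hypothetical, and labels with prefixes, roman numerals, or free text are deliberately not handled.

```javascript
// Hypothetical helper: extract page p (0-based) into its own file and give
// the single extracted page the same decimal label it had in the source.
// Returns false without extracting if the label is not a plain number.
function extractWithDecimalLabel(doc, p, targetPath) {
    var label = doc.getPageLabel(p);
    if (!/^\d+$/.test(label)) {
        return false; // assumption: only labels like "5" are supported here
    }
    var newDoc = doc.extractPages(p, p);  // no path given: new document is returned
    // "D" = decimal style, empty prefix, numbering starts at the original label
    newDoc.setPageLabels(0, ["D", "", parseInt(label, 10)]);
    newDoc.saveAs(targetPath);            // needs a privileged context, e.g. the Console
    newDoc.closeDoc(true);                // close without prompting to save again
    return true;
}
```

From the Console this might be called as `extractWithDecimalLabel(this, 4, "/C/Temp/page5.pdf")`, where the target path is only an example and the folder must already exist.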
Thanks!
@try67 Please allow me to step in here and ask for assistance. It is not my thread, but my question is more or less about the same thing. I have a few hundred relatively small PDF files, each with some 20 named pages. I need to extract two pages, named Group1 and Group2.
After extracting, I would like to have these pages saved using the original file name plus the page label.
D:\SourceFiles
    e.g. SourceFile123.pdf
         SourceFileABC.pdf
D:\ExtractedPages
    SourceFile123-group1.pdf
    SourceFile123-group2.pdf
    SourceFileABC-group1.pdf
    SourceFileABC-group2.pdf
As these pages usually are at the 3rd and 4th page I first tried using
this.extractPages(1, 4, "/D/FirstPages/ from "+this.documentFileName+ this.getPageLabel);
(but nothing ends up in D:\FirstPages)
On some other site I found a script that would extract pages based on a specific word.
However, the pages are named with crypticnames.tmp.pdf (i.e. they are not saved using the source file name)
All in all, I can't get it to work.
Any suggestions?
I don't quite follow...
- Do you want to save them in D:\ExtractedPages or D:\FirstPages? Either way, you have to make sure that folder exists before running your code. It won't be created by the script.
- Where is the Group1/Group2 info coming from?
- Are you running this code in an Action? From the Console? Something else?
- Your code contains multiple errors, and I don't understand which page label you're trying to use...
Thanks for replying. The folder does not matter. When I started trying to get this done, I based my attempt on your reply of June 18, 2020 in
Solved: Re: batch processing acrobat extract first page wi... - Adobe Community - 11218438
(That is where the 'first pages' folder came from. The folder does exist on my drive.)
I tried it in the Action Wizard.
The files are to be saved using their source file name and page label.
Hope that is all clear.
The PDFs have about 20 pages each, and there are two pages, group1 and group2 (usually pages 3 and 4 of the PDF),
and I want these 2 pages extracted.
Thanks.
Page labels and bookmarks are not the same.
Oops...! That I did not know to be honest.
Ahum.
It does show I am not an expert...
(was convinced they were the same)
No, quite different things. But if the bookmarks always have the same name and point to the same pages, then why do you need to use them anyway? Or is that not the case? If not, this will require a more complex script, especially since they are not top-level bookmarks.
Okay, thanks. Bad luck, so be it.
In that case I need to manually go thru close to 400 pdfs - select all pages (52) - deselect page 4 and 5 - delete - save
Thanks again.
Why manually? This can totally be automated...
I would not know how, sorry. Have given up searching.
In case doing it manually: open (a number of) file(s) - show thumbnails of the pages - ctrl-a
deselect page 4+5, delete, save, next file...
You can extract the two pages with extractPages.
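As a sketch of that idea, under a few assumptions: the two pages are always the 3rd and 4th pages of each file (0-based indexes 2 and 3), the target folder already exists, and the code runs in a privileged context such as an Action or the Console. The helper names `buildTargetPath` and `extractGroupPages` and the `-group1`/`-group2` suffixes are illustrative only, not Acrobat functions.

```javascript
// Hypothetical helper: "SourceFile123.pdf" + "-group1" -> folder + "SourceFile123-group1.pdf"
function buildTargetPath(folder, fileName, suffix) {
    var base = fileName.replace(/\.pdf$/i, ""); // drop the extension
    return folder + base + suffix + ".pdf";
}

// Hypothetical helper: extract the 3rd and 4th pages of doc into the folder,
// named after the source file. Pages that do not exist are skipped.
function extractGroupPages(doc, folder) {
    var jobs = [
        { page: 2, suffix: "-group1" },  // assumption: Group1 is the 3rd page
        { page: 3, suffix: "-group2" }   // assumption: Group2 is the 4th page
    ];
    var written = [];
    for (var i = 0; i < jobs.length; i++) {
        if (jobs[i].page < doc.numPages) {
            var target = buildTargetPath(folder, doc.documentFileName, jobs[i].suffix);
            doc.extractPages(jobs[i].page, jobs[i].page, target);
            written.push(target);
        }
    }
    return written;
}
```

In an Action's "Execute JavaScript" step this could be run on each file as `extractGroupPages(this, "/D/ExtractedPages/");`, using Acrobat's device-independent form of D:\ExtractedPages. If the pages are not always in the same position, the indexes would have to be looked up per file instead of being hard-coded.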

