Participating Frequently
April 29, 2008
Question

How can I get a total page count of a PDF before placing every page in the PDF?

Before the [long] spiel, I'm using JavaScript in InDesign CS3.

I'd like to create a script that places a multiPage PDF in any number of different impositions. (saddle stitch, 4up signatures, 16up signatures etc.)

I've easily created a script that iterates through a PDF and places it into a new document as (4,1|2,3), (8,5|6,7) etc, which works for printing in duplex, folding each page in half, and gluing the resulting spines to make a simple thick book (for PDFs with more than, say, 64 pages).

However, the next step is to re-write the script to create a saddle stitch document (16,1|2,15), (14,3|4,13) ... (10,7|8,9). For this I need to know how many pages there are in the PDF before I start placing the PDF pages, and then making the new document [int((PDFpages+3)/4)] pages long.
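
A minimal sketch of the saddle-stitch ordering described above, assuming the page count is already known. saddleStitchSheets is a hypothetical helper name, not part of the InDesign DOM. Slots that fall past the end of the PDF come back as null so they can be left blank.

```javascript
// Hypothetical helper: returns one entry per sheet, each holding the
// page numbers for the front pair and the back pair of that sheet.
function saddleStitchSheets(pageCount) {
    var sheetCount = Math.floor((pageCount + 3) / 4); // int((PDFpages+3)/4)
    var padded = sheetCount * 4;                      // page count rounded up to a multiple of 4
    var sheets = [];

    // Pages beyond the end of the PDF become null (blank slots).
    function pageOrBlank(n) {
        return n <= pageCount ? n : null;
    }

    for (var i = 0; i < sheetCount; i++) {
        var lo = 2 * i + 1;      // 1, 3, 5, ...
        var hi = padded - 2 * i; // padded, padded - 2, ...
        sheets.push({
            front: [pageOrBlank(hi), pageOrBlank(lo)],
            back: [pageOrBlank(lo + 1), pageOrBlank(hi - 1)]
        });
    }
    return sheets;
}
```

For a 16-page PDF this gives front/back pairs (16,1|2,15), (14,3|4,13) and so on up to (10,7|8,9).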

Is there a simple way to get the count of PDFpages without going through a loop and placing the next page from the PDF until the placed page number equals the first page number?

This way seems wasteful:

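// Assumes sheetFront (the page to place onto) and srcPDF (the PDF path) are set earlier in the script.
var totPDFPages = 1;
var doneCounting = false;
app.pdfPlacePreferences.pageNumber = totPDFPages;
var myPDFPage = sheetFront.place(File(srcPDF), [0,0])[0];
var pdfFirstPage = myPDFPage.pdfAttributes.pageNumber;
myPDFPage.remove(); // the first placement is only needed for its page number
while (!doneCounting) {
    totPDFPages += 1;
    app.pdfPlacePreferences.pageNumber = totPDFPages;
    myPDFPage = sheetFront.place(File(srcPDF), [0,0])[0];
    if (myPDFPage.pdfAttributes.pageNumber == pdfFirstPage) {
        // Back at the first page, so the previous page was the last one
        totPDFPages -= 1;
        doneCounting = true;
    }
    myPDFPage.remove();
}
alert("PDF has " + totPDFPages + " pages!");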

NB. Javascript above *hasn't* been run, but should look similar once debugged.

The only thing I've thought of to avoid the sheer duplication of placing the PDF twice (once for the count, and once for the imposition) is to create an array of impoPages[counter]=myPDFPage, and then shuffle the pages referenced by the array to the correct sheet and position.
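
One way the "shuffle to the correct sheet and position" step could look, as a sketch only: given a PDF page number and the total count, work out which saddle-stitch sheet, side and slot it belongs to. saddleStitchPosition and the field names sheet, side and slot are hypothetical. The layout assumed is the (16,1|2,15) pattern from earlier in this post.

```javascript
// Hypothetical helper: maps a 1-based PDF page number to its place
// in a saddle-stitch layout of the form (16,1|2,15), (14,3|4,13), ...
function saddleStitchPosition(pageNumber, pageCount) {
    var sheetCount = Math.floor((pageCount + 3) / 4);
    var padded = sheetCount * 4;
    var firstHalf = pageNumber <= padded / 2;
    // Mirror pages from the second half onto the first half (16 -> 1, 15 -> 2, ...)
    var q = firstHalf ? pageNumber : padded - pageNumber + 1;
    var odd = (q % 2 === 1);
    return {
        sheet: Math.floor((q - 1) / 2),           // 0-based sheet index
        side: odd ? "front" : "back",
        slot: (firstHalf === odd) ? "right" : "left"
    };
}
```

With 16 pages, page 1 lands on sheet 0, front, right, and page 16 on sheet 0, front, left, so pages could be moved into place after a single counting pass.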

It'd be much easier to be able to assign pageCount = File(srcPDF).pageCount !!!

Thanks for any help/tips or even a simple "What are you smoking, man!?"

Cheers,
Jezz

17 replies

Known Participant
July 12, 2008
Hi Olav,

You are right about a) and b).

What we all want, as far as I've understood, is to get the
PDF's total number of pages WITHOUT having to place each of them
in order to count.

Nice coding!
Known Participant
July 11, 2008
Hi Ervin,

Take a look at the PlaceMultipagePDF.vbs script that comes with InDesign CS3--it solves this problem without attempting to read the PDF, and may already do most of what you want.

The trouble with opening the PDF and using RegExp to try to get the page count is that you are not guaranteed a.) that you can open the PDF, and b.) that there will be only one instance of the page count in the PDF. The internal structure of PDFs can be quite complicated--especially if multiple PDFs have been merged.

Thanks,

Ole
Known Participant
July 11, 2008
Hi again!

Well, that seems too slow. If you have too many pages, it doesn't pay.

While trying to find something for VB, instead of getting and paying for a PDF page counter DLL, I searched the internet and came up with this code.

'===========================================
' Let the user pick the PDF
OpenFileDialog1.ShowDialog()

Dim FileName As String
FileName = OpenFileDialog1.FileName
Dim result As Integer

' Read the whole PDF as text
Dim fileReader As System.IO.StreamReader
fileReader = My.Computer.FileSystem.OpenTextFileReader(FileName)
Dim pdfText As String
pdfText = fileReader.ReadToEnd
fileReader.Close()

' Count "/Type /Page" entries; [^s] skips the "/Type /Pages" tree nodes
Dim regx As New System.Text.RegularExpressions.Regex("/Type\s*/Page[^s]")
Dim matches As System.Text.RegularExpressions.MatchCollection
matches = regx.Matches(pdfText)
result = matches.Count

MsgBox(result.ToString)
'============================================

This might be useful for someone who uses VB.
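
For anyone scripting in JavaScript instead, a rough equivalent of the same pattern might look like this. It is only a sketch: countPageObjects is a hypothetical name, and it assumes the PDF has already been read into a string. It will undercount if the page objects are stored in compressed object streams (possible from PDF 1.5 onwards), because they then do not appear as plain text.

```javascript
// Counts "/Type /Page" entries in PDF source text.
// [^s] keeps "/Type /Pages" (page tree nodes) out of the count.
function countPageObjects(pdfText) {
    var matches = pdfText.match(/\/Type\s*\/Page[^s]/g);
    return matches ? matches.length : 0;
}
```

A result of 0 should be treated as "unknown" rather than as an empty PDF.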

Nice coding!
Known Participant
May 31, 2008
Hi all!

I am writing a script in VB that takes a PDF file and places the
pages in the right position and so on.

Thank you all for your scripts and ideas.

I have a question.

What if the PDF contains pages that do not all have the same width or height? It has happened to me with some PDF files.

So even though it is slow, I'm placing all the PDF pages (and counting them) and at the same time checking that the width and the height
are the same for every one of them.
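
The size check itself could be sketched like this in JavaScript, assuming the width and height of each placed page have already been collected into an array (for example from the bounds of each placed page). firstMismatchedPage is a hypothetical name, and the tolerance is there because measured sizes are rarely exactly equal.

```javascript
// sizes: array of [width, height] pairs, one per PDF page, all in the same unit.
// Returns the 1-based number of the first page whose size differs from page 1
// by more than the tolerance, or -1 if all pages match.
function firstMismatchedPage(sizes, tolerance) {
    if (sizes.length === 0) { return -1; }
    var w = sizes[0][0];
    var h = sizes[0][1];
    for (var i = 1; i < sizes.length; i++) {
        if (Math.abs(sizes[i][0] - w) > tolerance ||
            Math.abs(sizes[i][1] - h) > tolerance) {
            return i + 1;
        }
    }
    return -1;
}
```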

I'm scripting for InDesign CS2

Nice Coding!
Known Participant
May 2, 2008
This will get the page count for almost all PDFs.

jxswm
///////////////////////////


function getPDFPageCount(f){
    if(f.alias){f = f.resolve();}
    if(f == null){return -1;}
    if(f.hidden){f.hidden = false;}
    if(f.readonly){f.readonly = false;}
    f = new File(f.fsName);
    f.encoding = "Binary";
    if(!f.open("r","TEXT","R*ch")){return -1;}
    f.seek(0, 0); var str = f.read(); f.close();
    if(!str){return -1;}
    //f = new File(Folder.temp+"/123.TXT");
    //writeFile(f, str.toSource()); f.execute();
    var ix, _ix, lim, ps;

    ix = str.indexOf("/N ");
    if(ix == -1){
        var src = str.toSource();
        _ix = src.indexOf("<< /Type /Pages /Kids [");
        if(_ix == -1){
            ps = src.match(/<<\/Count (\d+)\/Type\/Pages\/Kids\[/);
            if(ps == null){
                ps = src.match(/obj <<\\n\/Type \/Pages\\n\/Count (\d+)\\n\/Kids \[/);
                if(ps == null){
                    ps = src.match(/obj\\n<<\\n\/Type \/Pages\\n\/Kids \[.+\]\\n\/Count (\d+)\\n\//);
                    if(ps == null){return -1;}
                    lim = parseInt(ps[1]);
                    if(isNaN(lim)){return -1;}
                    return lim;
                }
                lim = parseInt(ps[1]);
                if(isNaN(lim)){return -1;}
                return lim;
            }
            lim = parseInt(ps[1]);
            if(isNaN(lim)){return -1;}
            return lim;
        }
        ix = src.indexOf("] /Count ", _ix);
        if(ix == -1){return -1;}
        _ix = src.indexOf(">>", ix);
        if(_ix == -1){return -1;}
        lim = parseInt(src.substring(ix+9, _ix));
        if(isNaN(lim)){return -1;}
        return lim;
    }
    _ix = str.indexOf("/T", ix);
    if(_ix == -1){
        ps = str.match(/<<\/Count (\d+)\/Type\/Pages\/Kids\[/);
        if(ps == null){return -1;}
        lim = parseInt(ps[1]);
        if(isNaN(lim)){return -1;}
        return lim;
    }
    lim = parseInt(str.substring(ix+3, _ix));
    if(isNaN(lim)){return -1;}
    return lim;
}
lfcorullon13651490
Legend
June 23, 2018

Awesome and fast. Thanks for sharing!!!

Trevor:
Legend
June 27, 2018

For Mac the spotlight method will not work if the files are on a remote server / drive.

See https://macscripter.net/viewtopic.php?id=32381 for some methods.

For Windows, IF YOU HAVE ACROBAT INSTALLED (quite common nowadays with CC), you can use this.

// For Windows only
function pagesInPDF(path) {
    if ($.os[0] !== "W") {return 'Jerk';} // For Windows only!
    var vbs;
    vbs = [
        'Set doc = CreateObject("AcroExch.PDDoc")',
        'path = "' + new File(path).fsName + '"',
        'doc.open path',
        'returnValue  = doc.GetNumPages',
        'doc.Close()'
    ].join('\n');
    // Thanks to Harbs for returnValue https://forums.adobe.com/message/8152525#8152525
    return app.doScript(vbs, ScriptLanguage.VISUAL_BASIC);
    // -1 means doc couldn't be opened
}

// Change to correct path
// pagesInPDF("C:\\Users\\Trevor\\Creative Cloud Files\\PDF\\dest.pdf")

Peter Kahrel
Community Expert
May 1, 2008
Well spotted! I had expected to come across PDF files that had their page count encoded in a different way, but I never have so far. I'll keep your solution in mind.

Thanks,

Peter
_Jezz_Author
Participating Frequently
May 1, 2008
This catches reading past the end of the file:

function getPDFPageCount(f) {
    // Returns the page count found in File f; aborts the script if none is found
    if (!f.open('r')) {
        alert("Aborting the script\nCouldn't open the file");
        exit();
    }
    var p = null, m, next_line;
    while (p === null && !f.eof) {
        next_line = f.readln();
        if (next_line.indexOf('/N ') >= 0) { // We've got the easy sort of PDF
            m = next_line.match(/\/N (\d+)\/T/);
            if (m) {
                p = m[1];
                alert("Found a '/N' style PDF, with " + p + " pages");
            }
        }
        else if (next_line.indexOf('/Pages>>') >= 0) { // We probably had to read nearly to the end of the file for the match...
            m = next_line.match(/\/Count (\d+)\/K/);
            if (m) {
                p = m[1];
                alert("Found a '/Count ... /Pages>>' style PDF, with " + p + " pages");
            }
        }
    }
    f.close();
    if (p === null) {
        alert("Aborting the script\nWe've got to the end of the file without finding a page count");
        exit();
    }
    return Number(p);
}

_Jezz_Author
Participating Frequently
May 1, 2008
aagh!

My try{} catch{} doesn't abort the script if it doesn't find either of the matches, and appears to just keep on trying to read. (endless loop :( )

I need to fix it so that it handles reading past the end of the file.

--Jezz
_Jezz_Author
Participating Frequently
May 1, 2008
Hi Peter,

I found that your method doesn't always work as it is...

The header from "InDesign CS2 Scripting Reference.pdf" looks like this:
%PDF-1.5
%
1 0 obj<</Contents 2 0 R/Type/Page/Parent 285639 0 R/Rotate 0/MediaBox[0.0 0.0 612.0 792.0]/CropBox[0.0 0.0 612.0 792.0]/BleedBox[0.0 0.0 612.0 792.0]/TrimBox[0.0 0.0 612.0 792.0]/ArtBox[0.0 0.0 612.0 792.0]/Resources<</Font<</C0_0 5899 0 R/C0_1 286131 0 R>>/ProcSet[/PDF/Text]/ExtGState<</GS0 286123 0 R>>>>/StructParents 4>>
endobj
2 0 obj<</Length 3252/Filter/FlateDecode>>stream

However, about three quarters through the PDF there appears a line:

285635 0 obj<</Count 1928/Kids[285636 0 R 285792 0 R 285948 0 R]/Type/Pages>>

It's not the only /Count object, but it *is* the only occurrence of a line containing "/Pages>>", so it's not all doom and gloom. :)
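
Since there can be several /Count entries, one possible heuristic (a sketch, not a guaranteed method) is to look only at lines that describe a /Type/Pages node and keep the largest /Count, on the assumption that the root of the page tree carries the total. largestPagesCount is a hypothetical name, and it assumes the file text is already in a string with each dictionary on one line, as in the sample above.

```javascript
// Scans PDF source text line by line and returns the largest /Count
// found on a line that also contains /Type/Pages, or -1 if none is found.
function largestPagesCount(pdfText) {
    var lines = pdfText.replace(/\r\n?/g, "\n").split("\n");
    var best = -1;
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i];
        // Require "Pages" to end there, so "/Type/Page" objects are skipped
        if (!/\/Type\s*\/Pages([^A-Za-z]|$)/.test(line)) { continue; }
        var m = line.match(/\/Count\s+(\d+)/);
        if (m) {
            var n = parseInt(m[1], 10);
            if (n > best) { best = n; }
        }
    }
    return best;
}
```

This would still be fooled by PDFs whose page tree is compressed or whose dictionaries span several lines, so a -1 result needs a fallback such as placing the pages.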



totPDFPages = getPDFPageCount(File.openDialog("Choose a PDF File"));

function getPDFPageCount(f) {
f.open ('r');
var gotCount = false;
while (! gotCount) {
try {next_line = f.readln();}
catch (myError) {
alert("We've got an error '"+myError+"'\nAborting the script");
exit();
}
if (next_line.indexOf ('/N ') > 0) { // We've got the easy sort of PDF
var p = next_line.match (/\/N (\d+)\/T/)[1];
alert("Found a '/N' style PDF, with "+p+" pages");
gotCount = true;
}
else if (next_line.indexOf ('/Pages>>') > 0 ) { // We probably had to read nearly to the end of the file for the match...
var p = next_line.match (/\/Count (\d+)\/K/)[1];
alert("Found a '/Count ... /Pages>>' style PDF, with "+p+" pages");
gotCount = true;
}
}
f.close ();
return Number(p);
}


--Jezz
_Jezz_Author
Participating Frequently
April 30, 2008
Superb!

I'm very comfortable with GREP in BBEdit, but really need to expand that into JavaScript.

That snippet of JavaScript to count the number of pages in a PDF is the sort of lateral thinking I love!

Once again, Peter, you're a marvel.

--Jezz