gopaljay78
March 7, 2011

How to find x,y coordinates of objects in PDF document

Hi,

Can anyone help me find the x,y coordinates of an object in a PDF document using Javascript?

The object could be text, an image, or a color panel. I want to get the x,y position of that object.

Thanks,

Gopal

March 7, 2011

How much work are you prepared to do in Javascript?

InDesign does not give access to the internal objects in a placed PDF. So, if you really really want to use the combination of InDesign and Javascript, you must:

1. Get hold of the PDF specifications. Don't worry, they're free.

2. Use Javascript to read your PDF file and parse it into recognizable objects, which means:

  2a. you have to use binary read functions (as per the PDF specification)

  2b. you have to write a decompression library -- more than one, by the way, as PDF supports several different kinds of compression, and Javascript supports none.

  2c. then you have to implement the PDF coordinate system, which is heavily based on PostScript-style matrix operations and supports several independent "layers" of transformations.

Oh, and since you ask about text:

3. You have to write a complete font system in Javascript (again, PDF supports several different kinds of font formats, and you'll have to implement all of them).
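
For a sense of what steps 2a and 2b involve, here is a minimal sketch in Python, chosen only because it ships with a zlib binding that InDesign's Javascript lacks. It assumes an unencrypted PDF and handles only the FlateDecode filter; PDF also defines LZWDecode, ASCII85Decode, RunLengthDecode, DCTDecode and others. It ignores the stream's /Length entry, nested dictionaries such as /DecodeParms, and compressed object streams. The names `find_startxref` and `flate_streams` are illustrative, not from any library.

```python
import re
import zlib
from typing import Iterator

# Matches the tail of a stream dictionary that names /FlateDecode, through the
# end-of-line after the "stream" keyword. Simplification: nested dictionaries
# such as /DecodeParms << ... >> are not handled.
FLATE_STREAM = re.compile(rb"/FlateDecode[^>]*>>\s*stream\r?\n")


def find_startxref(pdf: bytes) -> int:
    """Step 2a: return the byte offset written after the last 'startxref'.

    A PDF ends with 'startxref', the offset of the cross-reference data,
    and '%%EOF'. Incremental updates append further trailers, so the last
    occurrence wins.
    """
    matches = re.findall(rb"startxref\s+(\d+)", pdf[-1024:])
    if not matches:
        raise ValueError("no 'startxref' keyword near the end of the file")
    return int(matches[-1].decode("ascii"))


def flate_streams(pdf: bytes) -> Iterator[bytes]:
    """Step 2b: yield the decompressed data of each FlateDecode stream."""
    for match in FLATE_STREAM.finditer(pdf):
        inflater = zlib.decompressobj()
        # The inflater stops at the end of the zlib data; the bytes after it
        # ("endstream" and the rest of the file) land in inflater.unused_data.
        yield inflater.decompress(pdf[match.end():])
```

The decompressed page streams are what contain the drawing operators (`cm`, `Do`, `Tj` and so on); everything in steps 2c and 3 starts from there.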

Somehow I doubt it's worth all this trouble. Can't you just open the PDF in Illustrator?
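
Step 2c is the part closest to the original question. As a minimal Python sketch: a PDF matrix `[a b c d e f]` maps a point to `(a*x + c*y + e, b*x + d*y + f)`, the `cm` operator sets CTM' = M × CTM, and `q`/`Q` save and restore it, which is where the "layers" come from. An image drawn with `Do` fills the unit square of the current user space, so mapping that square through the CTM gives its box on the page. Assumptions: the toy tokenizer handles only numbers, names and operators (no strings, arrays or inline images), every `Do` is treated as an image (form XObjects carry their own /Matrix, ignored here), and text is not covered at all. `image_boxes` and the other names are hypothetical helpers.

```python
from typing import Iterator, List, Tuple

# A PDF matrix [a b c d e f] stands for the 3x3 matrix
#   | a b 0 |
#   | c d 0 |
#   | e f 1 |
# and maps a point with the row-vector product [x y 1] x M.
Matrix = Tuple[float, float, float, float, float, float]
IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m1: Matrix, m2: Matrix) -> Matrix:
    """Return the product m1 x m2 of two PDF matrices."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def apply(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a user-space point through m."""
    a, b, c, d, e, f = m
    return a * x + c * y + e, b * x + d * y + f


def tokenize(content: str) -> Iterator[Tuple[str, list]]:
    """Yield (operator, operands) pairs from a toy content stream."""
    operands: list = []
    for tok in content.split():
        if tok.startswith("/"):      # a name such as /Im1
            operands.append(tok)
            continue
        try:
            operands.append(float(tok))
        except ValueError:           # anything else is an operator
            yield tok, operands
            operands = []


def image_boxes(content: str) -> List[Tuple[str, float, float, float, float]]:
    """Return (name, x0, y0, x1, y1) for every Do, assuming image XObjects."""
    ctm, stack, boxes = IDENTITY, [], []
    for op, args in tokenize(content):
        if op == "q":                # save graphics state
            stack.append(ctm)
        elif op == "Q":              # restore graphics state
            ctm = stack.pop()
        elif op == "cm":             # CTM' = M x CTM
            ctm = multiply(tuple(args), ctm)
        elif op == "Do":             # image fills the unit square
            corners = [apply(ctm, x, y) for x, y in ((0, 0), (1, 0), (0, 1), (1, 1))]
            xs = [p[0] for p in corners]
            ys = [p[1] for p in corners]
            boxes.append((args[0], min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

For example, `image_boxes("q 1 0 0 1 36 0 cm q 200 0 0 100 72 500 cm /Im1 Do Q Q")` places the 200 x 100 image with its lower-left corner at (108, 500): the outer `cm` shifts everything 36 points right. The results are in default user space (points, y increasing upwards); the page's MediaBox and /Rotate entry still have to be taken into account.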

John Hawkinson
March 8, 2011

There are PDFs that Illustrator doesn't handle well (not counting the large class of bitmapped PDFs that it can't work with at all).

A reasonable compromise would be to use some tool that will regurgitate information about the objects on a page in a PDF, and to call that tool from JavaScript. (Of course, to do so you must go through AppleScript or Visual Basic, as appropriate for Mac/Win.) There are several such command-line tools. I think the last time I had a similar application I used pdfminer, a tool written in Python; but my application was somewhat specialized, and there are probably other tools that would work better for this case.
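
The tool side of that compromise might look like the following Python sketch, which runs pdfminer's `pdf2txt.py` in XML mode and prints one tab-separated line per text box or figure, something a script can read back. Assumptions: `pdf2txt.py` is installed and on the PATH, and its XML output marks `<textbox>` and `<figure>` elements with a `bbox="x0,y0,x1,y1"` attribute in PDF points, as recent pdfminer.six versions appear to do; check this against your own output. `run_pdf2txt` and `parse_bboxes` are hypothetical names.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET
from typing import List, Tuple

Item = Tuple[int, str, str, float, float, float, float]  # page, kind, label, x0, y0, x1, y1


def run_pdf2txt(pdf_path: str) -> str:
    """Run pdfminer's pdf2txt.py in XML mode (assumed to be on the PATH)."""
    result = subprocess.run(["pdf2txt.py", "-t", "xml", pdf_path],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_bboxes(xml_text: str) -> List[Item]:
    """Collect text boxes and figures with their bbox coordinates."""
    items: List[Item] = []
    for page in ET.fromstring(xml_text).iter("page"):
        page_no = int(page.get("id"))
        for el in page.iter():
            if el.tag == "textbox":
                # pdfminer writes one <text> element per character.
                label = "".join(t.text or "" for t in el.iter("text")).strip()
            elif el.tag == "figure":
                label = el.get("name", "")
            else:
                continue
            x0, y0, x1, y1 = (float(v) for v in el.get("bbox").split(","))
            items.append((page_no, el.tag, label, x0, y0, x1, y1))
    return items


if __name__ == "__main__" and len(sys.argv) == 2:
    # One tab-separated line per object, easy to parse from the calling script.
    for row in parse_bboxes(run_pdf2txt(sys.argv[1])):
        print("\t".join(str(v) for v in row))
```

Keeping the parsing separate from the subprocess call means the XML handling can be checked without pdfminer installed.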

Ian Proudfoot
March 8, 2011

A good way to get this sort of information out of a PDF would be to use Adobe's own PDFXML format (formerly Mars). This gives an archive that presents each page as a separate SVG file, which is much easier to work with than a PDF binary file.
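
Once a page is SVG, finding positions is ordinary XML work. A minimal Python sketch, assuming each page SVG positions its elements with plain `x`/`y` attributes and declares its size through `viewBox` or a numeric `height`; `transform` attributes on the elements or their enclosing `<g>` groups, which real output may well use, are ignored. The exact layout of the Mars/PDFXML pages is not documented here, so `svg_positions` is illustrative only. It reports text, images and `<rect>` elements (a guess at how a color panel would appear).

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

WANTED = {"text", "image", "rect"}  # text, images, and filled panels


def local_name(tag: str) -> str:
    """Drop an '{namespace}' prefix such as the SVG namespace."""
    return tag.rsplit("}", 1)[-1]


def first_number(value: str) -> float:
    """SVG x/y on <text> may hold a list of positions; keep the first."""
    return float(re.split(r"[\s,]+", value.strip())[0])


def svg_positions(svg_text: str) -> List[Tuple[str, float, float, float]]:
    """Return (kind, x, y_svg, y_up) for each text/image/rect element.

    SVG measures y downwards from the top; y_up = page_height - y_svg is the
    same point measured upwards from the bottom, as PDF does. For <rect> and
    <image> that point is the top-left corner; for <text> it is the baseline.
    """
    root = ET.fromstring(svg_text)
    view_box = root.get("viewBox")
    if view_box:
        page_height = float(re.split(r"[\s,]+", view_box.strip())[3])
    else:
        page_height = float(re.match(r"[\d.]+", root.get("height", "0")).group())
    found = []
    for el in root.iter():
        kind = local_name(el.tag)
        if kind in WANTED:
            x = first_number(el.get("x", "0"))
            y = first_number(el.get("y", "0"))
            found.append((kind, x, y, page_height - y))
    return found
```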

However, it's all gone a bit quiet on the Adobe Labs Mars pages, with no update for Acrobat X... Perhaps it's just a dead end.

Ian