Nikhil Ranka
Known Participant
September 25, 2021
Question

Relating the co-ordinates in bounds in JSON output with actual location in PDF


The text "Find object locations" is 181px from the left edge in the PDF; however, the JSON output from the PDF Extract API returns this:

{
  "Bounds": [
    108.02000427246094,
    692.2299957275391,
    246.02609252929688,
    708.1900024414062
  ],
  "ClipBounds": [
    108.02000427246094,
    692.2299957275391,
    246.02609252929688,
    708.1900024414062
  ]
}

From what I understand, 108 is the x-coordinate of the bottom-left corner of the text.

However, measured in the PDF viewer, it is about 180px.

Can you help me understand the relation here?

 

Note: Input PDF has been attached.

 

Thanks!


2 replies

Legend
September 25, 2021

Pixels are not a unit used in PDF, and screen size is irrelevant to the internals. Refer to the PDF Reference for page units: most likely the unit is 1/72 inch, measured from the origin used by the media box (which is not always the visible corner).
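
To make the unit relationship concrete, here is a minimal Python sketch of the arithmetic, assuming the Bounds values are in units of 1/72 inch. The function names and the 96 px/inch figure are illustrative assumptions, not anything the PDF Extract API defines; the real scale depends on the viewer and display settings.

```python
POINTS_PER_INCH = 72.0  # assumed unit: one PDF point is 1/72 inch


def points_to_pixels(points: float, pixels_per_inch: float) -> float:
    """Convert a length in PDF points to pixels at a given rendering resolution."""
    return points * pixels_per_inch / POINTS_PER_INCH


def implied_pixels_per_inch(pixels: float, points: float) -> float:
    """Back out the rendering resolution that would map `points` onto `pixels`."""
    return pixels * POINTS_PER_INCH / points


left_pt = 108.02000427246094  # first Bounds value from the question

# Offset in pixels if the viewer renders at an assumed 96 px/inch.
print(points_to_pixels(left_pt, 96.0))

# Resolution that would be needed for the same offset to measure 180 px.
print(implied_pixels_per_inch(180.0, left_pt))
```

With these numbers, the second call comes out close to 120 px/inch. That would be consistent with a viewer or display scale above 96 px/inch, but it is only a guess; the thread does not confirm how the 180px figure was measured.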

Nikhil Ranka
Known Participant
September 25, 2021

Thanks for sharing the info @Test Screen Name. Would be a great time saver.

 

Since the media box is not always the visible corner, is there any other approach to translating the co-ordinates for republishing? The math appears simple, but because the media box is not visible, it gets tricky.

 

Also, is the media box different for different PDFs?

Legend
September 25, 2021

The crop box should give you the visible rectangle. In the absence of a crop box, the media box is used. The media box can differ from one PDF to another (and from page to page), but that doesn't matter as long as you can find out what it is. A media box can (but rarely does) start several inches from the coordinate origin. Studying the PDF Reference may give you more insight.
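
As a rough sketch of that translation in Python: subtract the visible box's corner, flip the y-axis, and scale to pixels. This assumes Bounds is ordered [left, bottom, right, top] in points with y increasing upward, that the page is not rotated, and that you have already read the crop box (or media box) from the PDF by some other means. `bounds_to_viewer_pixels`, `PixelRect`, and the US Letter box below are hypothetical names and values for illustration.

```python
from typing import NamedTuple, Sequence


class PixelRect(NamedTuple):
    left: float
    top: float
    width: float
    height: float


def bounds_to_viewer_pixels(
    bounds: Sequence[float],
    visible_box: Sequence[float],
    pixels_per_inch: float = 96.0,
) -> PixelRect:
    """Map [left, bottom, right, top] in PDF points to a top-left-origin pixel rect.

    `visible_box` is the crop box, or the media box if there is no crop box,
    given as [left, bottom, right, top] in the same units as `bounds`.
    """
    left, bottom, right, top = bounds
    box_left, _box_bottom, _box_right, box_top = visible_box
    scale = pixels_per_inch / 72.0  # assumed 72 points per inch
    return PixelRect(
        left=(left - box_left) * scale,  # distance from the visible left edge
        top=(box_top - top) * scale,     # distance from the visible top edge
        width=(right - left) * scale,
        height=(top - bottom) * scale,
    )


bounds = [108.02000427246094, 692.2299957275391,
          246.02609252929688, 708.1900024414062]
letter_box = [0.0, 0.0, 612.0, 792.0]  # hypothetical: US Letter, origin at the corner

print(bounds_to_viewer_pixels(bounds, letter_box))
```

If the visible box does not start at (0, 0), the same call handles the offset, since only the box's left and top edges enter the calculation.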

Nikhil Ranka
Known Participant
September 25, 2021

Here is the screenshot from the PDF viewer. Zoom is at 100%.