Copy link to clipboard
Copied
The text Find object locations is at 181px from left in the PDF, however, the JSON output from the PDF Extract API returns this:
{
"Bounds": [
108.02000427246094,
692.2299957275391,
246.02609252929688,
708.1900024414062
],
"ClipBounds": [
108.02000427246094,
692.2299957275391,
246.02609252929688,
708.1900024414062
]
}
From what I understand, 108 is bottom left location of text.
However, as per the PDF it is 180px.
Can you help understand the relation here?
Note: Input PDF has been attached.
Thanks!
Copy link to clipboard
Copied
Here is the SS from the PDF viewer. Zoom is at 100%
Copy link to clipboard
Copied
Pixels are not a unit used in PDF. Screen size is irrelevant to the internals. Refer to the PDF Reference for page units, most likely it is 1/72 inch, origin the media box (not always the visible corner).
Copy link to clipboard
Copied
Thanks for sharing the info @Test Screen Name. Would be a great time saver.
Since the media box is not always the visible corner, any other approach to translate the co-ordinates for republishing? The math appears simple but since the media box is not visible, it gets tricky.
Also, is the media box different for different PDFs?
Copy link to clipboard
Copied
The crop box should give you the visible rectangle. In the absence of a crop box, the media box is used. The media box is distinct in each PDF, but that doesn't matter, if you can find what it is. A media box can (but rarely does) start several inches from the coordinate origin. Study of the PDF reference may give you more insight.
Copy link to clipboard
Copied
Given that the PDF services API aims towards enabling republishing of PDF documents, assuming that the API output should contain this information. Tried searching for the same, however, could not find it. Noticed that there is BBox for some elements but co-ordinates are similar to what is available in Bounds. And that seems incorrect, because the location in the PDF is different.
Of course, reading the PDF spec is an option. However, at this moment, some info from the API docs or, Adobe team on calculating the position that is similar to position when the PDF is viewed would help.
Thanks!
Copy link to clipboard
Copied
I agree the documentation should fully describe every key and value in the exported JSON -and- how it relates to the PDF specification in each case. Frankly, I could not use this API for any real purpose without that info, because I could not base a commercial product on guesswork.
Copy link to clipboard
Copied
Yes @Test Screen Name, not certain on how republishing is cited as a use-case for Extract API. Are we missing something here? Found some explanation for JSON keys here. However, unsure on how to convert it to pixels to building an HTML from the PDF.
Thanks!