Relating the co-ordinates in Bounds in the JSON output to the actual location in the PDF

Explorer, Sep 25, 2021

The text "Find object locations" is at 181px from the left in the PDF. However, the JSON output from the PDF Extract API returns this:

{
      "Bounds": [
        108.02000427246094,
        692.2299957275391,
        246.02609252929688,
        708.1900024414062
      ],
      "ClipBounds": [
        108.02000427246094,
        692.2299957275391,
        246.02609252929688,
        708.1900024414062
      ]
}

From what I understand, 108 is the x-coordinate of the bottom-left corner of the text.

However, in the PDF viewer it is at about 180px.

Can you help me understand the relation here?

Note: The input PDF has been attached.

Thanks!

TOPICS
PDF Extract API

Explorer, Sep 25, 2021

Here is the screenshot from the PDF viewer. Zoom is at 100%.

NikhilRanka_0-1632571987239.png

 

LEGEND, Sep 25, 2021

Pixels are not a unit used in PDF. Screen size is irrelevant to the internals. Refer to the PDF Reference for page units: most likely the unit is 1/72 inch, and the origin is defined relative to the media box (which is not always the visible corner).
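As a rough illustration of the point about units, here is a minimal Python sketch that converts a value in 1/72-inch units to screen pixels. It assumes the Bounds values are in those units, which the thread does not confirm. The function name `points_to_pixels` and the DPI values are illustrative assumptions, not part of any Adobe API.

```python
POINTS_PER_INCH = 72.0  # assumed unit of the Bounds values: 1/72 inch


def points_to_pixels(value_pt: float, dpi: float = 96.0, zoom: float = 1.0) -> float:
    """Convert a length in points to pixels for a given screen DPI and zoom."""
    return value_pt * dpi / POINTS_PER_INCH * zoom


left_pt = 108.02000427246094  # first Bounds value from the question

# 96 DPI is the CSS reference resolution; 120 DPI is a hypothetical
# display setting, used here only to show how the result changes.
print(round(points_to_pixels(left_pt, dpi=96), 2))
print(round(points_to_pixels(left_pt, dpi=120), 2))
```

With a hypothetical 120 DPI display the first Bounds value comes out at roughly 180 pixels, which is close to the figure measured in the viewer. That is only one possible explanation, since the viewer's actual resolution and page offset are not stated here.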

Explorer, Sep 25, 2021

Thanks for sharing the info @Test Screen Name. It would be a great time saver.

Since the media box is not always the visible corner, is there any other approach to translating the co-ordinates for republishing? The math appears simple, but since the media box is not visible, it gets tricky.

Also, is the media box different for different PDFs?

LEGEND, Sep 25, 2021

The crop box should give you the visible rectangle. In the absence of a crop box, the media box is used. The media box is distinct in each PDF, but that doesn't matter, if you can find what it is. A media box can (but rarely does) start several inches from the coordinate origin. Study of the PDF reference may give you more insight.
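A minimal sketch of the math implied above, in Python, under several assumptions: that Bounds is ordered `[left, bottom, right, top]` in 1/72-inch units with y increasing upward, that the boxes are given as `[x0, y0, x1, y1]` in the same space, and that the page has no rotation. The helper names and the US Letter media box are hypothetical; read the real boxes from the PDF itself.

```python
def visible_box(media_box, crop_box=None):
    """Use the crop box when present, otherwise fall back to the media box."""
    return crop_box if crop_box is not None else media_box


def to_top_left_pixels(bounds, box, dpi=96.0):
    """Map bottom-left-origin bounds to a top-left-origin pixel rectangle."""
    left, bottom, right, top = bounds
    x0, y0, x1, y1 = box
    scale = dpi / 72.0  # pixels per point
    return {
        "left": (left - x0) * scale,   # distance from the visible left edge
        "top": (y1 - top) * scale,     # flip y: measure down from the visible top edge
        "width": (right - left) * scale,
        "height": (top - bottom) * scale,
    }


# Hypothetical US Letter page with no crop box.
media_box = [0.0, 0.0, 612.0, 792.0]
bounds = [108.02000427246094, 692.2299957275391,
          246.02609252929688, 708.1900024414062]
print(to_top_left_pixels(bounds, visible_box(media_box), dpi=96.0))
```

If the box starts away from the coordinate origin, the subtraction of `x0` and `y1` accounts for that offset. Page rotation and any user-unit scaling are deliberately ignored in this sketch.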

Explorer, Sep 26, 2021

@Test Screen Name

Given that the PDF Services API aims to enable republishing of PDF documents, I assumed that the API output would contain this information. I tried searching for it but could not find it. I noticed that there is a BBox for some elements, but its co-ordinates are similar to those in Bounds. That seems incorrect, because the location in the PDF is different.

Of course, reading the PDF spec is an option. However, at this moment, some info from the API docs or the Adobe team on calculating a position that matches what is seen when the PDF is viewed would help.

Thanks!

LEGEND, Sep 27, 2021

I agree the documentation should fully describe every key and value in the exported JSON -and- how it relates to the PDF specification in each case. Frankly, I could not use this API for any real purpose without that info, because I could not base a commercial product on guesswork.

Explorer, Sep 27, 2021

Yes @Test Screen Name, I am not certain how republishing can be cited as a use case for the Extract API. Are we missing something here? I found some explanation of the JSON keys here. However, I am unsure how to convert the values to pixels in order to build HTML from the PDF.

 

Thanks!
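One way the conversion to HTML might look, as a hedged Python sketch rather than anything documented by Adobe: it assumes Bounds is `[left, bottom, right, top]` in 1/72-inch units, that the visible page starts at the coordinate origin, and that the page height in points is known from elsewhere. The function `element_to_html` and the 792-point page height are hypothetical.

```python
from html import escape


def element_to_html(text, bounds, page_height_pt, dpi=96.0):
    """Build an absolutely positioned <div> for one extracted text element."""
    left, bottom, right, top = bounds
    scale = dpi / 72.0  # CSS defines 1px as 1/96 inch, hence the 96 default
    style = (
        "position:absolute;"
        f"left:{left * scale:.2f}px;"
        f"top:{(page_height_pt - top) * scale:.2f}px;"  # flip the y axis
        f"width:{(right - left) * scale:.2f}px;"
        f"height:{(top - bottom) * scale:.2f}px;"
    )
    return f'<div style="{style}">{escape(text)}</div>'


# Bounds from the first post; the page height is an assumed US Letter value.
print(element_to_html(
    "Find object locations",
    [108.02000427246094, 692.2299957275391,
     246.02609252929688, 708.1900024414062],
    page_height_pt=792.0,
))
```

The parent container would need `position:relative` and the page's pixel dimensions for the offsets to line up. Fonts, rotation, and crop box offsets are left out of this sketch.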
