Inconsistent bounding box results when mapping Adobe PDF Extract API results to PDF images

New Here, Apr 07, 2023

Issue:

I'm working on a project where I need to obtain bounding boxes for different components in a PDF, such as images, tables, and text. To do this, I'm using the "Bounds" and "ClipBounds" attributes for all elements, as well as the "BBox" attribute for images and tables. My goal is to convert these coordinates to pixel coordinates, because I need to use them on PDF pages that have been converted to images. To achieve this, I'm using the following normalization code:

x, y, w, h = int(x*img.size[0]/width), int(y*img.size[1]/height), int(w*img.size[0]/width), int(h*img.size[1]/height)

where img.size is the size of the PDF page converted to an image and width and height are the page dimensions according to the API output.
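
For reference, the same scaling can be written as a small function. This is only a sketch of the formula described above: the name scale_box and the sample numbers are hypothetical, page_size stands for the width and height reported by the API, and image_size stands for img.size.

```python
def scale_box(x, y, w, h, page_size, image_size):
    """Scale an (x, y, w, h) box from page units to image pixels.

    page_size  -- (width, height) of the page as reported by the API
    image_size -- (width, height) of the rendered page image, i.e. img.size
    """
    page_w, page_h = page_size
    img_w, img_h = image_size
    # Same arithmetic as the one-liner in the post; int() truncates toward zero.
    return (
        int(x * img_w / page_w),
        int(y * img_h / page_h),
        int(w * img_w / page_w),
        int(h * img_h / page_h),
    )


# Hypothetical example: a 612 x 792 page rendered as a 1224 x 1584 image.
print(scale_box(72, 100, 200, 50, (612, 792), (1224, 1584)))
# -> (144, 200, 400, 100)
```

Note that this only rescales the numbers. It does not change where the origin is or how the four values are interpreted.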

Actual Behaviour

This technique works for some PDFs, but it doesn't work for others. In some cases, I get neat bounding boxes using both "Bounds" and "BBox", while in other cases, I only get correct results using "Bounds" and not "BBox". There are also instances where both "Bounds" and "BBox" give bad results.

Expected Behaviour

I'm looking for a consistent way to map the API results to the images of PDF pages, regardless of the PDF file. Ideally, I want to obtain accurate bounding boxes for all components using a single technique.
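
One thing that may be worth checking, offered as an assumption rather than anything confirmed in this thread: if the four values of a box are laid out as [left, bottom, right, top] in PDF page units with the origin at the bottom-left corner of the page, then scaling alone cannot line up with an image whose origin is at the top-left. A minimal sketch of that conversion follows; the helper name pdf_box_to_pixels and the sample numbers are hypothetical, and the layout of each attribute should be verified against the API documentation.

```python
def pdf_box_to_pixels(box, page_size, image_size):
    """Convert a box to (x, y, w, h) in image pixels with a top-left origin.

    ASSUMPTION: box is [left, bottom, right, top] in page units,
    measured from the bottom-left corner of the page.
    """
    left, bottom, right, top = box
    page_w, page_h = page_size
    img_w, img_h = image_size
    sx = img_w / page_w  # horizontal pixels per page unit
    sy = img_h / page_h  # vertical pixels per page unit
    x = left * sx
    y = (page_h - top) * sy  # flip the y axis: the top edge becomes the pixel y
    w = (right - left) * sx
    h = (top - bottom) * sy
    return round(x), round(y), round(w), round(h)


# Hypothetical example: a 612 x 792 page rendered as a 1224 x 1584 image.
print(pdf_box_to_pixels([72, 692, 272, 742], (612, 792), (1224, 1584)))
# -> (144, 100, 400, 100)
```

If the assumption holds for one attribute but not another, that alone could explain why one attribute maps correctly on a given page and another does not.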

 

Any help would be really appreciated. Thank you!

 

I have attached some examples here:

download.png, output_1.jpg


New Here, Apr 07, 2023

This is the normalization code:

x, y, w, h = int(x*img.size[0]/width), int(y*img.size[1]/height), int(w*img.size[0]/width), int(h*img.size[1]/height)

New Here, Nov 13, 2023

Any solution to this?
