
Adobe's Extract API: Non-Image Elements Classified as Images

New Here, Mar 07, 2024


I'm working on an automated mechanism that lets users apply alt text to images within a PDF file.

Here's the algorithm I'm using:

  1. Utilize Adobe's autotag API to make the PDF accessible.
  2. Extract all images using Adobe's extract API.
  3. Present each extracted image to the user, allowing them to select the image they wish to apply alt text to.
  4. Apply the chosen alt text to the selected images, and generate an updated PDF with the alt text applied.
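
For readers following along, steps 2 and 3 might be sketched roughly as below. This is only a sketch under assumptions: it presumes the Extract output has already been unzipped, and that structuredData.json holds a top-level "elements" list whose figure entries carry "Path", "Page", and "filePaths" keys. The helper names `load_structured_data` and `list_figures` are made up for illustration and are not part of any Adobe SDK.

```python
import json
from pathlib import Path


def load_structured_data(json_file):
    """Read an unzipped structuredData.json file into a dict."""
    return json.loads(Path(json_file).read_text(encoding="utf-8"))


def list_figures(structured_data):
    """Return one record per element whose last Path segment is a Figure."""
    figures = []
    # Assumption: elements live under a top-level "elements" key.
    for element in structured_data.get("elements", []):
        path = element.get("Path", "")
        last_segment = path.rsplit("/", 1)[-1]
        if not last_segment.startswith("Figure"):
            continue
        figures.append({
            "path": path,
            "page": element.get("Page"),
            # Assumption: rendered figure files are listed under "filePaths".
            "files": element.get("filePaths", []),
        })
    return figures
```

Keeping the path and page next to each rendered file gives the UI something more stable than a running index to show the user and to carry into step 4.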

 

However, I'm running into a problem with this process:

The images extracted using Adobe's Extract API sometimes don't align with the images in the accessibility tags. The discrepancy is most noticeable when equations are identified as images, which leads to index mismatches. Could anyone suggest solutions or alternatives? Please refer to the images attached below.
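
One possible way around index mismatches is to pair figures by page and position instead of by order. The sketch below is not an Adobe API; it assumes you can obtain, for both the extracted figures and the tagged figures, a page number and a bounding box in the same coordinate system, given as [left, bottom, right, top]. How the boxes for the tagged figures are obtained is outside this sketch, and the names `overlap_ratio` and `match_figures` are hypothetical.

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two boxes given as [left, bottom, right, top]."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    intersection = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


def match_figures(extracted, tagged, threshold=0.5):
    """Pair each extracted figure with the best-overlapping tagged figure.

    Both inputs are lists of dicts with "page" and "bounds" keys.
    Returns a list of (extracted_index, tagged_index_or_None) tuples.
    """
    pairs = []
    for i, ext in enumerate(extracted):
        best_index, best_score = None, threshold
        for j, tag in enumerate(tagged):
            if tag["page"] != ext["page"]:
                continue  # only compare figures on the same page
            score = overlap_ratio(ext["bounds"], tag["bounds"])
            if score >= best_score:
                best_index, best_score = j, score
        pairs.append((i, best_index))
    return pairs
```

An extracted figure with no sufficiently overlapping counterpart comes back paired with `None`, which makes it easy to flag for manual review instead of silently shifting every later index.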

Community Expert, Mar 07, 2024


I'll start by clarifying some terminology. Extract does not identify "images". It identifies "figures", meaning areas of the page that may be constructed from text, line art, images, or a combination of these things. This gives the developer the opportunity to replace the figure with the correct alt-text rather than the actual text within the figure area. In your second image, you wouldn't want the text "ten point zero ex ten minus 6" to be read. Instead, you'd want to hear "ten times ten to the negative 6th power".

Also, the Auto-tag API is dependent on Extract, so you should be getting the same kind of structure from both. You can use Acrobat to add the correct alt-text to the auto-tagged PDF.

New Here, Mar 08, 2024


Thank you very much for your response; it has been incredibly helpful in understanding this better. I do have a follow-up question regarding the consistency of structure between the Auto-tag and Extract APIs. You mentioned that I should expect similar structures from both. Does this imply that the structure visible in Adobe Acrobat (the accessibility tags) for a PDF tagged using the Auto-tag API should align with the 'path' outlined in the structuredData.json file generated through the Extract API?

To provide clearer context, I've included an example:

 

  • In Image 1 (StructurePane.png), we can observe a tagged PDF where the abbreviation "CEC" is labeled as a "Figure."
  • In Image 2 (structuredData_json), we have the structuredData.json file generated via the Extract API. Here, the text "CEC" is part of a paragraph, but there's no nested tagging designating "CEC" as a Figure.

 

Running the Extract API on this PDF yields two figures as output. However, the 'path' in the structuredData.json file for abbreviations differs from that of figures: the path for the extracted images is "Path": "//Document/Figure", which differs from the path of the text abbreviation classified as a Figure (as shown in Image 2).
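
To check this kind of discrepancy programmatically, one option is to look up every element containing a given string and see whether its path includes a Figure segment. The sketch below assumes the "elements" list of structuredData.json has already been loaded, that text elements carry "Text" and "Path" keys, and that sibling indices appear as a trailing `[n]` on a segment. The function names are hypothetical.

```python
import re


def path_segments(path):
    """Split a path such as '//Document/P[3]/Figure' into bare tag names."""
    return [re.sub(r"\[\d+\]$", "", part) for part in path.split("/") if part]


def find_text(elements, needle):
    """Return (Path, has_figure_segment) for each element whose Text contains needle."""
    hits = []
    for element in elements:
        if needle in element.get("Text", ""):
            path = element.get("Path", "")
            hits.append((path, "Figure" in path_segments(path)))
    return hits
```

Calling `find_text(elements, "CEC")` on the elements from the file in question would show whether any path holding that text passes through a Figure, which is the comparison against the Acrobat tag tree being asked about here.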

 

I apologize if this question seems elementary. Your clarification on this matter would be valuable. Thank you once again.

Community Expert, Mar 08, 2024


That's interesting. Can you share the PDF? You can send it to me privately if you don't want to post it.

New Here, Mar 11, 2024


Same problem. The PDF is attached.
