Question
Adobe's Extract API: Non-Image Elements Classified as Images
- March 7, 2024
I'm building an automated workflow that lets users add alt text to images in a PDF file.
Here's the algorithm I'm using:
- Use Adobe's Auto-Tag API to make the PDF accessible.
- Extract all images using Adobe's Extract API.
- Present each extracted image to the user so they can select the images they want to add alt text to.
- Apply the chosen alt text to the selected images and generate an updated PDF.
However, I'm encountering issues with the process:
The images returned by Adobe's Extract API don't always line up with the images in the accessibility tags. The discrepancy is most noticeable when equations are mistakenly identified as images, which leads to index mismatches between the two sets. Could anyone suggest a solution or an alternative approach? Please refer to the images attached below.
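To make the mismatch concrete, here is a minimal sketch of pairing extracted images with tagged figures by page and bounding-box overlap instead of by list index. It is only an illustration, not Adobe's method: the record keys (`id`, `page`, `bounds`, `file`), the function names, and the 0.5 overlap threshold are all hypothetical, and it assumes both outputs have already been normalized into plain dictionaries that share one coordinate space.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, bottom, right, top)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    left, bottom = max(a[0], b[0]), max(a[1], b[1])
    right, top = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, right - left) * max(0.0, top - bottom)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_figures(
    extracted: List[dict],   # hypothetical records: {"page", "bounds", "file"}
    tagged: List[dict],      # hypothetical records: {"id", "page", "bounds"}
    min_iou: float = 0.5,    # assumed threshold, tune for your documents
) -> Dict[str, Optional[str]]:
    """Map each tagged figure id to the best-overlapping extracted file.

    Extracted images that overlap no tagged figure (for example an
    equation rendered as an image) are simply left unmatched, so they
    cannot shift the pairing of the remaining figures.
    """
    matches: Dict[str, Optional[str]] = {}
    used = set()
    for tag in tagged:
        best, best_score = None, min_iou
        for i, img in enumerate(extracted):
            # Only compare candidates on the same page, each used once.
            if i in used or img["page"] != tag["page"]:
                continue
            score = iou(tuple(img["bounds"]), tuple(tag["bounds"]))
            if score >= best_score:
                best, best_score = i, score
        if best is not None:
            used.add(best)
            matches[tag["id"]] = extracted[best]["file"]
        else:
            matches[tag["id"]] = None  # no confident match, flag for review
    return matches
```

With this kind of matching, a tagged figure that finds no counterpart maps to `None` and can be shown to the user for manual review rather than silently receiving the wrong alt text.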
