I have added alt text to the images in a PDF using Acrobat Pro and have tried all of the Extract APIs on it, but none of them report any information about the images' alt text. Is there an API provided by Adobe that would let me extract the alt text from PDFs programmatically? I have attached the PDF I am testing on below.
The "alt-text" values are stored in the document's tag structure. This text is not part of the page content or the image objects. In fact, it's not in anything on the actual pages; it is stored at the document level. You'd think the Adobe Extract API would examine the tagging info (since the tags provide structure), but that doesn't seem to be the case. It appears to look only at the page content.
I don't know of any other Adobe APIs that could extract this type of tag data, except for the plug-in SDK. So you could write a custom plug-in to do this.
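To illustrate where the alt text actually lives, here is a minimal sketch in Python that walks the tag structure directly. It is not an Adobe API: it assumes the third-party pypdf library is installed, and the function names and the file name in the usage note are made up for illustration. It relies on the standard PDF layout, where the document catalog points to a `/StructTreeRoot`, each structure element lists its children under `/K`, and alt text sits in the element's `/Alt` entry. I have not run it against the attached file.

```python
def _resolve(obj):
    """Follow an indirect reference if the object offers get_object() (pypdf objects do)."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def iter_alt_texts(node, _seen=None):
    """Yield (structure_type, alt_text) for every tagged element at or below node."""
    if _seen is None:
        _seen = set()
    node = _resolve(node)
    if isinstance(node, list):
        # /K may be an array of children.
        for kid in node:
            yield from iter_alt_texts(kid, _seen)
    elif isinstance(node, dict):
        # Guard against visiting the same element twice in a malformed tree.
        if id(node) in _seen:
            return
        _seen.add(id(node))
        alt = _resolve(node.get("/Alt"))
        if alt is not None:
            text = alt.decode("utf-8", "replace") if isinstance(alt, bytes) else str(alt)
            yield str(_resolve(node.get("/S"))), text
        yield from iter_alt_texts(node.get("/K"), _seen)
    # Integers (marked-content IDs) and missing entries carry no alt text.


def extract_alt_texts(path):
    """Open a PDF with pypdf (assumed installed) and collect its alt texts."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(path)
    catalog = reader.trailer["/Root"]
    return list(iter_alt_texts(catalog.get("/StructTreeRoot")))
```

Calling `extract_alt_texts("tagged.pdf")` (a hypothetical file name) should return a list of pairs such as a `/Figure` type with its alt text, or an empty list if the PDF is untagged. The walker only follows `/K`, so it never climbs back up through the parent pointers.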