PDF Extract API: Image Alternate Text
Hi all,
As far as I can tell, there is currently no way to get the alt text of an image via the PDF Extract API. You get the content, the images, and the path of each object in the structure tree, but not the alternate text.
Does anyone know if this is planned for the future?
Thanks
Roland
I'm also interested in this and I can't seem to find a way to do it. Is it possible?
Thanks,
Niall
With PDF.js from Mozilla you can get the alt texts by calling getStructTree() on a page:
https://github.com/mozilla/pdf.js
One caveat: the paths that the PDF Extract API provides correspond to the reading order in the PDF, whereas getStructTree() in PDF.js returns the tag structure tree of the PDF. The two orders do not have to match, so an alt text found this way cannot simply be matched to an Extract API element by position.
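For anyone who wants to try this, here is a minimal sketch of walking the tree that getStructTree() returns and collecting every alt text. The node shapes (role, children, and an optional alt on tagged elements) are my assumption based on what PDF.js returns, so please check them against the version you have installed. The names collectAltTexts and AltEntry are made up for this example, and the path it builds is just the chain of tag roles, not the path format the Extract API uses.

```typescript
// Assumed shapes, modelled on the result of page.getStructTree() in PDF.js.
// Leaf nodes reference marked content; element nodes carry a role and children.
interface StructContent {
  type: string;
  id?: string;
}

interface StructElement {
  role: string;
  children: StructNode[];
  alt?: string; // present only when the tag has an /Alt entry (assumption)
}

type StructNode = StructElement | StructContent;

// Hypothetical result record for this example.
interface AltEntry {
  path: string; // chain of tag roles, e.g. "/Root/Document/Figure"
  alt: string;
}

// Depth-first walk in tag-tree order (not necessarily reading order).
function collectAltTexts(node: StructNode, parentPath = ""): AltEntry[] {
  if (!("children" in node)) {
    return []; // content leaf: nothing to collect
  }
  const path = `${parentPath}/${node.role}`;
  const found: AltEntry[] = [];
  if (typeof node.alt === "string") {
    found.push({ path, alt: node.alt });
  }
  for (const child of node.children) {
    found.push(...collectAltTexts(child, path));
  }
  return found;
}

// Usage with PDF.js (not executed here; requires the pdfjs-dist package):
//   const pdf = await getDocument(url).promise;
//   const page = await pdf.getPage(1);
//   const tree = await page.getStructTree();
//   const alts = collectAltTexts(tree);
```

Because the result follows the tag tree, you would still need your own way of matching each entry to an element in the Extract API output, for example by page number and tag type rather than by position in the list.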
Roland

