PDF Extract API: Image Alternate Text

Report · Aug 01, 2024

Hi all,

If I understand correctly, there is currently no way to get the alt text of an image via the PDF Extract API. You get the content, the images, and the path of each object in the structure tree.

Does anyone know if this is planned for the future?

Thanks

Roland

Report · Aug 26, 2024

I'm also interested in this and I can't seem to find a way to do it. Is it possible?

Thanks,
Niall

Report · Aug 26, 2024

With Mozilla's PDF.js you can get the alt texts: call getStructTree() on a page object.

https://github.com/mozilla/pdf.js

One caveat: the paths that the PDF Extract API provides follow the reading order of the PDF, whereas getStructTree() in PDF.js returns the tag structure tree. The two orders do not have to match, so the results cannot simply be paired by position.
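A rough sketch of how the alt texts could be pulled out of such a tree, in case it helps. The interfaces below are a local mirror of the node shape I would expect from getStructTree() (elements with a role, children, and an optional alt; leaf items with a type and id), so treat them as an assumption and check them against your PDF.js version. collectAltTexts, the sample tree, and the path format are made up for illustration; the paths are not the ones the PDF Extract API returns.

```typescript
// Assumed shape of the nodes returned by PDF.js page.getStructTree().
// Leaf content items reference marked content on the page.
interface StructContent {
  type: string;
  id?: string;
}

// Structure elements (tags) such as "Document", "P", or "Figure".
interface StructNode {
  role: string;
  children: Array<StructNode | StructContent>;
  alt?: string;
}

// Hypothetical result type: a role path in tag-tree order plus the alt text,
// e.g. "Root/Document[1]/Figure[1]".
interface AltEntry {
  path: string;
  alt: string;
}

function isStructNode(n: StructNode | StructContent): n is StructNode {
  return "role" in n && "children" in n;
}

// Walk the tag tree depth-first and collect every node that carries alt text.
function collectAltTexts(node: StructNode, path: string = node.role): AltEntry[] {
  const found: AltEntry[] = [];
  if (typeof node.alt === "string" && node.alt.length > 0) {
    found.push({ path, alt: node.alt });
  }
  // Count siblings per role so that repeated roles get distinct indices.
  const seen = new Map<string, number>();
  for (const child of node.children) {
    if (!isStructNode(child)) continue; // skip leaf content items
    const index = (seen.get(child.role) ?? 0) + 1;
    seen.set(child.role, index);
    found.push(...collectAltTexts(child, `${path}/${child.role}[${index}]`));
  }
  return found;
}

// Hand-written sample tree standing in for a real getStructTree() result.
const sampleTree: StructNode = {
  role: "Root",
  children: [
    {
      role: "Document",
      children: [
        { role: "P", children: [{ type: "content", id: "p1R_mc0" }] },
        {
          role: "Figure",
          alt: "Company logo",
          children: [{ type: "content", id: "p1R_mc1" }],
        },
        { role: "Figure", alt: "Sales chart", children: [] },
      ],
    },
  ],
};

const sampleAlts = collectAltTexts(sampleTree);
console.log(sampleAlts);
```

In a real run the tree would come from awaiting getStructTree() on a PDF.js page instead of sampleTree. Because the walk follows the tag tree, matching the entries to PDF Extract API elements would still need some other key than position.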

Roland
