Inspiring
August 1, 2024
Question

PDF Extract API: Image Alternate Text


Hi all,

 

Am I correct that there is currently no way to get the alt text of an image via the PDF Extract API? You get the content, the images, and the path of each object in the structure tree, but not the alt text.

 

Does anyone know if this is planned for the future?

 

Thanks

Roland


1 reply

Participant
August 26, 2024

I'm also interested in this and I can't seem to find a way to do it. Is it possible?

 

Thanks,
Niall

Inspiring
August 26, 2024

With PDF.js from Mozilla you can get the alt texts via the page's getStructTree() method:

 

https://github.com/mozilla/pdf.js

 

One caveat: the paths that the PDF Extract API provides correspond to the reading order in the PDF. The getStructTree() method from PDF.js, however, gives you the tag structure tree of the PDF, and the two orders do not have to match.
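
For anyone who wants to try this, here is a rough sketch of walking the tree that getStructTree() returns and collecting the alt texts. The node shapes are declared locally and are my reading of the PDF.js type definitions (a node with `role`, optional `alt`, and `children`), so treat them as an assumption. `collectAltTexts`, `AltTextEntry` and the slash-joined `path` are names I made up for illustration; that path is built from tag roles and is not the path format the PDF Extract API emits. The sample tree stands in for a real document.

```typescript
// Shapes assumed from the PDF.js typings for the result of page.getStructTree().
// Declared locally so this sketch runs without any dependency.
interface StructTreeContent {
  type: string; // leaf marker, e.g. "content", "object" or "annotation"
  id: string;
}

interface StructTreeNode {
  role: string; // tag name such as "Document", "P" or "Figure"
  alt?: string; // alternate text, when the tag carries one
  children: Array<StructTreeNode | StructTreeContent>;
}

// Hypothetical result type: a role-based path plus the alt text found there.
interface AltTextEntry {
  path: string;
  alt: string;
}

// Leaves have no `children`, structure nodes do.
function isStructNode(n: StructTreeNode | StructTreeContent): n is StructTreeNode {
  return "children" in n;
}

// Depth-first walk in tag-tree order, collecting every node that has alt text.
function collectAltTexts(node: StructTreeNode, parentPath = ""): AltTextEntry[] {
  const path = parentPath ? `${parentPath}/${node.role}` : node.role;
  const found: AltTextEntry[] = [];
  if (typeof node.alt === "string" && node.alt.length > 0) {
    found.push({ path, alt: node.alt });
  }
  for (const child of node.children) {
    if (isStructNode(child)) {
      found.push(...collectAltTexts(child, path));
    }
  }
  return found;
}

// Made-up sample tree standing in for `await page.getStructTree()`.
const sampleTree: StructTreeNode = {
  role: "Root",
  children: [
    {
      role: "Document",
      children: [
        { role: "P", children: [{ type: "content", id: "p1" }] },
        { role: "Figure", alt: "Company logo", children: [{ type: "content", id: "f1" }] },
        {
          role: "Sect",
          children: [
            { role: "Figure", alt: "Sales chart", children: [{ type: "content", id: "f2" }] },
          ],
        },
      ],
    },
  ],
};

console.log(collectAltTexts(sampleTree));
```

With the real library you would load the document via getDocument(...).promise, call getPage(n) on it, and pass the result of page.getStructTree() to the function, after checking that the page has a structure tree at all. Because the walk follows the tag tree, matching its results against the Extract API output still needs some other key than position.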

 

Roland