
PDF Extract API: Image Alternate Text

Contributor, Aug 01, 2024

Hi all,

 

If I am correct, there is currently no way to get the alt text of an image via the PDF Extract API. You get the content, the images, and the path of each object in the structure tree, but not the alt text.

 

Does anyone know if this is planned for the future?

 

Thanks

Roland

TOPICS
PDF Extract API
New Here, Aug 26, 2024

I'm also interested in this and can't seem to find a way to do it. Is it possible?

 

Thanks,
Niall

Contributor, Aug 26, 2024

With PDF.js from Mozilla you can get the alt texts via the page method getStructTree():

 

https://github.com/mozilla/pdf.js

 

One caveat: the paths that the PDF Extract API provides correspond to the reading order in the PDF, whereas getStructTree() in PDF.js returns the PDF's tag structure tree. The two orders do not have to match.
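
For anyone who wants to try this, here is a rough sketch of walking the tree that getStructTree() returns and collecting the alt texts. It assumes that nodes with alternate text expose it as a string property named `alt` and nest through `children`, which is how I understand the PDF.js output; the function name collectAltTexts and the sample tree are made up for illustration and are not real API output. The path it builds is a tag-tree path, so per the caveat above it will not necessarily line up with the paths from the PDF Extract API.

```javascript
// Walk a structure tree shaped like the result of PDF.js page.getStructTree()
// and collect every alternate text together with its tag-tree path.
// Assumption: alt text, when present, is a string in the node's `alt` property.
function collectAltTexts(node, path = [], found = []) {
  // getStructTree() can yield nothing for untagged PDFs, so tolerate null.
  if (!node || typeof node !== "object") return found;

  // Content leaves have no role; only structure elements extend the path.
  const here = node.role ? [...path, node.role] : path;

  if (typeof node.alt === "string") {
    found.push({ path: here.join("/"), alt: node.alt });
  }
  for (const child of node.children ?? []) {
    collectAltTexts(child, here, found);
  }
  return found;
}

// Hand-written sample tree (hypothetical, only to demonstrate the walk).
const sampleTree = {
  role: "Root",
  children: [
    {
      role: "Document",
      children: [
        { role: "P", children: [{ type: "content", id: "p1R_mc0" }] },
        {
          role: "Figure",
          alt: "Company logo",
          children: [{ type: "content", id: "p1R_mc1" }],
        },
      ],
    },
  ],
};

console.log(collectAltTexts(sampleTree));
```

With a real document you would pass in the tree from `await page.getStructTree()` for each page instead of the sample tree.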

 

Roland
