How to extract alt-text from pdf programmatically?

New Here,
Apr 08, 2024

I added alt text to the images in a PDF using Acrobat Pro and then tried all of the Extract APIs on it, but none of them report any information about the images' alt text. Is there an Adobe API that would let me extract the alt text from a PDF programmatically? I have attached the PDF I am testing with below.

TOPICS
Acrobat SDK and JavaScript
Community Expert,
Apr 08, 2024

The "alt-text" values are stored in the document's tag structure. This text is not part of the page content or the image objects; in fact, it is not in anything on the actual pages. It is stored at the document level. You'd think the Adobe Extract API would examine the tagging info (since the tags provide structure), but that doesn't seem to be the case. It appears to look only at the page content.
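To make the "stored at the document level" point concrete, here is a rough sketch of what walking the tag structure looks like. It is not an Adobe API. It assumes some PDF parsing library (none is named in this thread) has already loaded the document catalog's /StructTreeRoot into plain Python dicts and lists with indirect references resolved. The function name `iter_alt_text` and the sample tree are hypothetical; the key names /K (kids), /S (structure type), and /Alt (alternate description) come from the PDF specification.

```python
from typing import Any, Iterator, Optional, Set, Tuple


def iter_alt_text(node: Any, _seen: Optional[Set[int]] = None) -> Iterator[Tuple[str, str]]:
    """Yield (structure_type, alt_text) for each structure element under `node` that has /Alt.

    Assumption: `node` is a dict/list view of the structure tree with all
    indirect references already resolved by whatever library loaded the PDF.
    """
    if _seen is None:
        _seen = set()
    if isinstance(node, (list, tuple)):
        # /K may be an array of kids
        for kid in node:
            yield from iter_alt_text(kid, _seen)
        return
    if not isinstance(node, dict):
        # Integer kids are marked-content IDs pointing into page content; no /Alt there
        return
    if id(node) in _seen:
        # Defensive guard in case the loaded objects contain a cycle
        return
    _seen.add(id(node))
    if "/Alt" in node:
        yield str(node.get("/S", "")), str(node["/Alt"])
    if "/K" in node:
        yield from iter_alt_text(node["/K"], _seen)


# Hand-built stand-in for a tagged PDF's structure tree (hypothetical data)
struct_tree_root = {
    "/Type": "/StructTreeRoot",
    "/K": {
        "/S": "/Document",
        "/K": [
            {"/S": "/P", "/K": 0},
            {"/S": "/Figure", "/Alt": "Company logo", "/K": 1},
            {"/S": "/Sect", "/K": [{"/S": "/Figure", "/Alt": "Sales chart", "/K": [2]}]},
        ],
    },
}

alt_texts = list(iter_alt_text(struct_tree_root))
for structure_type, alt in alt_texts:
    print(structure_type, alt)
```

Note that the walk never touches a page or an image object: the alt text hangs off the structure elements, which is consistent with a tool that reads only page content not finding it.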

I don't know of any other Adobe APIs that could extract this type of tag data, except for the plug-in SDK. So you could write a custom plug-in to do this.

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
