New Participant
June 30, 2023
Question

Retrieve text and image alt-text for read-aloud feature


Hi guys

I am currently working on a Java project that requires a read-aloud feature for PDF documents. The PDFs I'm dealing with include images with alt-text. My goal is to extract both the text and the alt-text from the PDF while maintaining the correct reading order to enable the read-aloud functionality.

 
To accomplish this, I would appreciate your guidance on the following:
1. Extracting the text from the PDF while preserving the reading order that I've set using Adobe Acrobat.
2. Extracting the alt-text associated with the images in the PDF, also following the correct reading order.
3. Combining the extracted text and alt-text in the right order, which has been set using the accessibility tools, to generate the content for a text-to-speech system.
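
To illustrate what I mean by step 3, here is a minimal sketch of the combining logic only. It assumes steps 1 and 2 have already produced a list of items in reading order; the `Element` record, its fields, and the "Image: " prefix are hypothetical names of my own, not part of pdfservice-sdk.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadAloudScript {

    // Hypothetical holder for one extracted item; not an SDK type.
    // text is the visible text (null for images),
    // altText is the image description (null for plain text).
    public record Element(String text, String altText) {}

    // Joins the items in the order given, which is assumed to already be
    // the reading order set in Acrobat. Blank items are skipped.
    public static String buildScript(List<Element> elementsInReadingOrder) {
        List<String> parts = new ArrayList<>();
        for (Element e : elementsInReadingOrder) {
            if (e.text() != null && !e.text().isBlank()) {
                parts.add(e.text().trim());
            } else if (e.altText() != null && !e.altText().isBlank()) {
                // Announce images so the listener knows this is a description.
                parts.add("Image: " + e.altText().trim());
            }
        }
        return String.join("\n", parts);
    }

    public static void main(String[] args) {
        List<Element> elements = List.of(
            new Element("Quarterly report", null),
            new Element(null, "Bar chart of sales by region"),
            new Element("Sales grew in every region.", null));
        // The resulting string is what would be handed to the text-to-speech system.
        System.out.println(buildScript(elements));
    }
}
```

What I am missing is how to fill that list from the PDF so that the order matches the tags.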
 
I am using pdfservice-sdk.
Regards

    1 reply

    Raymond Camden
    Community Manager
    June 30, 2023

    Have you looked at/tried the Extract API? I'm not sure about alt-text for images, but in general, Extract gets _everything_ out.

    New Participant
    July 5, 2023

    Hi Raymond

    Thanks for your response. I did use the API and tried setting ExtractPDFOptions, but I don't see ExtractElementType.ALT (or anything equivalent). The result I got did not contain all the text and alt-text in the reading order.

    Please give me some hints

    Thanks

     

    Raymond Camden
    Community Manager
    July 10, 2023

    Can you share a PDF w/ images that make use of alt text? If it's private, DM me.