Hi guys
I am currently working on a Java project that needs a read-aloud feature for PDF documents. The PDFs I'm dealing with include images with alt-text. My goal is to extract both the text and the alt-text from each PDF in the correct reading order, so that the read-aloud output follows the document.
Have you looked at/tried the Extract API? I'm not sure about alt-text for images, but in general, Extract gets _everything_ out.
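For the read-aloud side, here is a rough sketch of what I have in mind once you have the elements out. It assumes the extracted elements already arrive in reading order and that each one carries either a text run or an alt-text string. `ReadAloudSketch`, `Element`, and `toSpokenLines` are names I made up for illustration; they are not part of the Extract SDK, and I have not confirmed where (or whether) alt-text shows up in the Extract output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the Extract SDK.
class ReadAloudSketch {

    /** One extracted item: either a run of text or a figure with optional alt-text. */
    static final class Element {
        final String text;     // null for figures
        final String altText;  // null for text runs and for figures without alt-text

        private Element(String text, String altText) {
            this.text = text;
            this.altText = altText;
        }

        static Element text(String text) { return new Element(text, null); }
        static Element figure(String altText) { return new Element(null, altText); }
    }

    /**
     * Turns elements that are ALREADY in reading order into lines for a
     * text-to-speech engine. Figures are announced with their alt-text;
     * figures without alt-text (and blank text runs) are skipped.
     */
    static List<String> toSpokenLines(List<Element> elementsInReadingOrder) {
        List<String> lines = new ArrayList<>();
        for (Element e : elementsInReadingOrder) {
            if (e.text != null && !e.text.trim().isEmpty()) {
                lines.add(e.text.trim());
            } else if (e.altText != null && !e.altText.trim().isEmpty()) {
                lines.add("Image: " + e.altText.trim());
            }
        }
        return lines;
    }
}
```

You would build the `Element` list from whatever your extraction step returns, then feed each line from `toSpokenLines` to your speech engine in order. The "Image: " prefix is just one way to signal a figure to the listener.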
Hi Raymond
Thanks for your response. I did use the API and tried setting the ExtractPDFOptions, but I don't see ExtractElementType.ALT (or anything equivalent). The result I got did not include all of the text and alt-text in reading order.
Please give me some hints.
Thanks
Can you share a PDF with images that make use of alt text? If it's private, DM me.