Read Along capability within JS Embed API

Report · Feb 01, 2021

Hi,

I would like to add a read-along capability to PDFs embedded on my website, similar to how parts of the text (a word or sentence) in a reading passage can be highlighted while an audio file plays. It would be similar to the read-along capability in the Acrobat software, but with a custom audio file that I have already generated. Is this possible?

Report · Feb 02, 2021

It's certainly possible to communicate the page position to an audio player. Can you elaborate on what you want the user experience to be? Do you have the time code for each paragraph, heading, etc?

Report · Feb 02, 2021

Yes, I have the timestamps for each sentence and also for each word. In our current flow, we have a piece of text and a corresponding audio file with the aforementioned speech marks. When the user clicks a button, the audio starts playing and the respective word and sentence are highlighted based on the timestamps. Is the same possible within a PDF?

Report · Feb 02, 2021

Yes. However, you will also need to know the bounding box of each word. With the combination of the word bounding boxes in order and the time codes for each word in the audio file, you'll be able to use Embed API highlight callbacks to sequentially highlight the words as they are spoken.

You can use Adobe Acrobat JavaScript to create the word/box list.
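For anyone following along, here is a rough sketch of how the word/box list mentioned above might be collected. It assumes the Acrobat JavaScript Doc methods getPageNumWords, getPageNthWord, and getPageNthWordQuads, and that each quad is a flat list of x/y corner coordinates. The WordSource interface, the WordBox shape, and both function names are hypothetical, made up for illustration. Inside Acrobat itself this would be plain JavaScript run against the open document; the types are only there to keep the sketch self-contained and testable with a stand-in object.

```typescript
// Hypothetical typed view of the Acrobat JavaScript Doc methods used below.
interface WordSource {
  numPages: number;
  getPageNumWords(nPage: number): number;
  getPageNthWord(nPage: number, nWord: number, bStrip?: boolean): string;
  getPageNthWordQuads(nPage: number, nWord: number): number[][];
}

// Hypothetical record shape: one entry per word, in reading order.
interface WordBox {
  page: number;
  index: number; // running index across the whole document
  text: string;
  rect: [number, number, number, number]; // [left, bottom, right, top]
}

// Collapse one or more quads into a single enclosing rectangle.
// Taking min/max over all corners avoids depending on corner order.
function quadsToRect(quads: number[][]): [number, number, number, number] {
  const xs: number[] = [];
  const ys: number[] = [];
  for (const quad of quads) {
    for (let i = 0; i + 1 < quad.length; i += 2) {
      xs.push(quad[i]);
      ys.push(quad[i + 1]);
    }
  }
  return [Math.min(...xs), Math.min(...ys), Math.max(...xs), Math.max(...ys)];
}

// Walk every page and word, recording text and bounding box.
function collectWordBoxes(doc: WordSource): WordBox[] {
  const boxes: WordBox[] = [];
  for (let page = 0; page < doc.numPages; page++) {
    const count = doc.getPageNumWords(page);
    for (let n = 0; n < count; n++) {
      const quads = doc.getPageNthWordQuads(page, n);
      if (quads.length === 0) continue; // skip words with no geometry
      boxes.push({
        page,
        index: boxes.length,
        text: doc.getPageNthWord(page, n, true),
        rect: quadsToRect(quads),
      });
    }
  }
  return boxes;
}
```

The resulting array could be serialized to JSON once, ahead of time, and shipped alongside the PDF and the audio file.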

Report · Feb 02, 2021

Thanks for your reply. I'm very new to this and I've been searching for a solution for quite a few days, but to no avail. Can you please help me understand how I would get the bounding boxes for the words and tie them to the timestamps? Can this be done dynamically on the page, or would this data need to be prepared beforehand?

Report · Feb 02, 2021

Yes - You'd get the bounding boxes ahead of time and then build a sort of database of timecodes, words, and rectangles.

What you are looking to do is non-trivial. Do you have a developer that can help you?
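To make the "database of timecodes, words, and rectangles" idea a bit more concrete, here is a minimal sketch. It assumes the speech marks and the word boxes are in the same order and have the same length, and that each speech mark only carries a start time, so a word is treated as active until the next one starts. All type and function names (SpeechMark, PlacedWord, ReadAlongEntry, buildEntries, findActiveIndex) are hypothetical. The part that actually draws the highlight in the viewer is deliberately left out.

```typescript
// Hypothetical input shapes: timings from the audio, geometry from the PDF.
interface SpeechMark {
  word: string;
  startMs: number; // when the word starts in the audio, in milliseconds
}

interface PlacedWord {
  page: number;
  rect: [number, number, number, number]; // [left, bottom, right, top]
}

// One row of the combined table.
interface ReadAlongEntry {
  word: string;
  startMs: number;
  endMs: number;
  page: number;
  rect: [number, number, number, number];
}

// Join timings and boxes by position. Assumes both lists are in reading order.
function buildEntries(
  marks: SpeechMark[],
  boxes: PlacedWord[],
  audioEndMs: number
): ReadAlongEntry[] {
  if (marks.length !== boxes.length) {
    throw new Error("speech marks and word boxes must line up one to one");
  }
  return marks.map((mark, i) => ({
    word: mark.word,
    startMs: mark.startMs,
    // A word stays active until the next word starts (or the audio ends).
    endMs: i + 1 < marks.length ? marks[i + 1].startMs : audioEndMs,
    page: boxes[i].page,
    rect: boxes[i].rect,
  }));
}

// Binary search for the entry whose time window contains timeMs.
// Returns -1 before the first word and after the last one.
function findActiveIndex(entries: ReadAlongEntry[], timeMs: number): number {
  let lo = 0;
  let hi = entries.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].startMs <= timeMs) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (found >= 0 && timeMs >= entries[found].endMs) return -1;
  return found;
}
```

In a browser, one way to drive this would be to call findActiveIndex with the audio element's currentTime multiplied by 1000 on each timeupdate event, and update the highlight only when the returned index changes.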

Report · Feb 02, 2021

I do, but they don't have any experience working on something like this. If we can understand the following, that would really help us:

1. How can we extract the bounding boxes for each word and sentence from a PDF ahead of time and store them? Is there a tool or API that we can use?

2. Among the PDF highlighting API samples we've seen, we haven't come across any that include the time code component. Are there any samples we could look at?

We have some understanding of how to write the code once we have the list of highlights. Any help on the above would be great.

Adobe Community
