This might sound a little confusing, but when you add automatic captions to your sequence, the text is displayed in a single, constant black color.

As such, what I am suggesting is that, just as the transcript highlights the word currently being spoken, the caption track should do the same and highlight that word in a different color.
It would also be nice if the words were spaced according to the timestamps at which they were spoken.