In the current version, speakers can be separated during voice transcription, but captions created from the transcription still combine all speakers together.
Filtering by speaker in the transcription window also doesn't let you create captions for just one of the speakers.
Please add an option to separate the speakers when creating captions.