It's a pain in the ass how many times I label stuff with separate speakers and it still bundles the captions together. I would love to be able to precisely separate the speakers during transcription and have that separation stay consistent once the captions are generated.