How to align multiple audio clips with the same number of segments
I have 3 audio clips containing exactly the same speech, generated by the Azure text-to-speech service. Because the 3 clips use different tones, the same sentences and words finish at different times. Even when I adjust the Azure parameters to produce almost exactly the same total length, each individual sentence still has a slightly different speed and finish time. As a clip gets longer, the drift accumulates and becomes obvious in the later part.
The audio clips below were edited by detecting and cutting out the silent parts; each clip has exactly the same number of segments.

What I want to do is align each segment at its beginning. Even though each sentence has a slightly different speed and ending time, that difference is acceptable to me because the drift no longer accumulates. As in the image below, I can align the segments manually, but there are hundreds of clips. Is there a way to do this automatically?

The Azure service doesn't have any parameter that produces this result, so the remaining option would be to adjust it in Adobe Audition.
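
To make the goal concrete, the alignment I have in mind can be sketched roughly as below. This is only an illustration in Python, not something Azure or Adobe Audition provides. It assumes the duration of every segment in every clip has already been measured (for example from the silence detection), and the names `aligned_starts`, `silence_after`, and the `gap` parameter are made up for this sketch. Each segment index gets one shared start time, and each clip gets extra silence after its segment so the next one starts in sync.

```python
from typing import List


def aligned_starts(durations: List[List[float]], gap: float = 0.3) -> List[float]:
    """Return one shared start time (seconds) per segment index.

    durations[c][i] is the length of segment i in clip c (hypothetical input,
    assumed to come from an earlier silence-detection step).
    """
    if len({len(clip) for clip in durations}) != 1:
        raise ValueError("all clips must have the same number of segments")
    starts = []
    cursor = 0.0
    for same_segment in zip(*durations):
        starts.append(cursor)
        # The slot is as long as the slowest version of this segment, plus a pause.
        cursor += max(same_segment) + gap
    return starts


def silence_after(durations: List[List[float]], gap: float = 0.3) -> List[List[float]]:
    """Return the silence (seconds) to place after each segment of each clip."""
    starts = aligned_starts(durations, gap)
    plan = []
    for clip in durations:
        pads = []
        for i, length in enumerate(clip):
            if i + 1 < len(starts):
                # Fill the time between this segment's end and the next shared start.
                pads.append(starts[i + 1] - starts[i] - length)
            else:
                pads.append(0.0)  # nothing needs to follow the last segment
        plan.append(pads)
    return plan


if __name__ == "__main__":
    # Three clips, two segments each (made-up durations in seconds).
    example = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]
    print(aligned_starts(example, gap=0.5))
    print(silence_after(example, gap=0.5))
```

With a plan like this, the drift is reset at every segment start, so only the difference within a single sentence remains. How to apply the computed silences to hundreds of real clips (in Audition or with some script) is the part I am asking about.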
