Text-Based Editing Multiple Speakers

Community Beginner, Jun 08, 2023


So I'm using the text-based editing AI, which needs improvement for multiple speakers. I'm working on a documentary, and it keeps putting two speakers under one speaker category. It's also not really detecting the different speakers: I have 6 other people, and it labeled them as only 2 speakers.

 

If anyone has a solution for the text-based AI grouping multiple speakers under one category, let me know.

Idea, no status
TOPICS: Effects

Views: 145

2 Comments
Community Expert, Jun 09, 2023


SockBox,

 

Is this a video/audio file with one stereo track?

 

Can you provide a 2-3 minute sample where this happens?

 

@TeresaDemel @Kevin-Monahan 

 

Stan

 


Community Beginner, Dec 31, 2023


Ya, I'm trying to use it for long audio tracks with 4 speakers, and on every try so far, Premiere has transcribed it with only 2 different speakers. I can assure you the voices sound quite different. This same problem is why I stopped using Descript for transcriptions: they also often get the number of speakers wrong, at which point the time savings were badly diminished.

 

Now, Descript DOES have an option to let the user help label the speakers, by giving you a bunch of voice samples and having you label each one. That's a great idea, except their program chooses those samples automatically... and is TERRIBLE at it; often their set of samples misses one or more speakers entirely, while including brilliant clips such as a person eating chips during a break (no dialogue), or the old "20 seconds of silence and then the door opening". Their process would work great IF the user could manually select their own clips (which you can't in Descript), and if you could include one or more speakers from a previous file so the program has additional data and gets better each time. Recognizing repeat speakers would be useful in many common scenarios, such as identifying the 4 regulars on a podcast, or identifying an interviewer who is the same in every source clip (and therefore more easily knowing which lines come from a guest, who may be a new voice each time).

 

But Adobe could blow the competition out of the water if you just incorporated these features:

A) [bare minimum] Let the user (optionally) provide their own short clips of each speaker to help 'prime' the software to recognize those voices. This could also let the user test on a smaller subsection to get all the labels right before transcribing the whole thing, instead of possibly misidentifying or completely missing speakers (as it does now). (A rough sketch of how this could work appears after suggestion B.)

 

and ideally

B) [shoot for the moon] Have the option to save data for one or more recurring speakers so that each user's transcriptions get better over time.
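For illustration only, here is a rough Python sketch of how suggestion A could work in principle: the user "enrolls" each speaker from a short reference clip, and each transcribed segment is then labeled with the most similar enrolled voice. It uses the open-source SpeechBrain ECAPA speaker-embedding model; the file paths and the 0.25 similarity threshold are made-up placeholders, and this is not how Premiere or Descript actually work internally.

import torch
import torchaudio
from speechbrain.pretrained import EncoderClassifier

# Open-source speaker-embedding model (ECAPA-TDNN trained on VoxCeleb).
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

def voice_embedding(path):
    """Return one embedding vector summarizing the voice in an audio file."""
    signal, sample_rate = torchaudio.load(path)
    signal = signal.mean(dim=0, keepdim=True)            # mix down to mono
    if sample_rate != 16000:                             # model expects 16 kHz audio
        signal = torchaudio.functional.resample(signal, sample_rate, 16000)
    return classifier.encode_batch(signal).squeeze()

# Suggestion A: "prime" the software with a user-chosen clip of each speaker.
# These reference files are hypothetical placeholders.
references = {
    "Host":    voice_embedding("refs/host_10s.wav"),
    "Guest 1": voice_embedding("refs/guest1_10s.wav"),
    "Guest 2": voice_embedding("refs/guest2_10s.wav"),
    "Guest 3": voice_embedding("refs/guest3_10s.wav"),
}

def label_segment(segment_path, threshold=0.25):
    """Label one transcribed segment with the closest enrolled speaker."""
    seg = voice_embedding(segment_path)
    scores = {
        name: torch.nn.functional.cosine_similarity(seg, ref, dim=0).item()
        for name, ref in references.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "Unknown speaker"

print(label_segment("segments/segment_0042.wav"))

Saving the references dictionary between projects would effectively cover suggestion B as well: a recurring host or interviewer could then be matched automatically in every new file.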

