This might be impossible, but I thought I'd ask. I frequently have to edit the audio (or video with audio) of my husband speaking bilingually (45-minute Bible messages going back and forth between English and Japanese). He's a native English speaker, so that part is fine, but when he is speaking Japanese, he has a habit of mispronouncing some words that should have a long O sound (e.g. "cope") with /ɑː/ instead (e.g. "cop"; "ah" as in "father"), or something halfway between the two vowel sounds. Sometimes I can find another place where he said the same syllable correctly and splice it in, but the pitch is almost never the same, so it sounds a bit disjointed, and sometimes I can't find a decent replacement at all and just have to live with it. The current message I'm working on has three such mistakes in less than a minute!
So I'm wondering if there is something I can tweak in the frequency spectrum, or something similar, so that the ah sound more closely resembles oh. A common word he does this to is "honto" (meaning real or true), saying it more like "hanto". It's common for Americans - if you think about the Lone Ranger's sidekick, his name is spelled Tonto, but everyone pronounces the first O more like ah (YouTube example). Is there a way in Audition to fix something like that?
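A side note on the splicing workaround described above: when a replacement syllable sits at a different pitch from the one it replaces, the size of the mismatch can be put into numbers before reaching for a pitch-shift effect. The sketch below is only the standard equal-temperament relation, 12 · log2(target / source). The function name and the frequency readings are made up for illustration, and it assumes you have some way of reading the fundamental frequency of each syllable. It addresses only the pitch mismatch, not the vowel itself.

```python
import math


def semitone_shift(source_hz: float, target_hz: float) -> float:
    """Return how many semitones a clip at source_hz must be shifted
    so that its pitch lands on target_hz (negative means shift down)."""
    if source_hz <= 0 or target_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(target_hz / source_hz)


if __name__ == "__main__":
    # Hypothetical readings: the replacement syllable was spoken at 110 Hz,
    # the syllable being replaced at about 123.5 Hz.
    shift = semitone_shift(110.0, 123.5)
    cents = shift * 100.0  # 100 cents per semitone
    print(f"shift replacement by {shift:+.2f} semitones ({cents:+.0f} cents)")
```

A shift of a semitone or two is usually a small correction; the larger the shift, the more likely the shifted syllable is to sound processed.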
I did a couple of experiments with the Sound Removal tool, and rapidly came to the conclusion that this simply isn't going to be possible by any automated means. I found some speech with 'ow' in it several times, saved one as a model and tried to detect other instances of it, but it appears to be far too small a sample size, and it effectively said that most of the file was 'ow' - and eliminated it accordingly!
So, I'm afraid you're stuck with doing this the hard way. At the risk of re-opening a can of worms, I will mention that Adobe were at one stage threatening to develop more fully some software that would indeed let you do this - but they were (quite correctly) stopped by the legal team, because it meant that you could create quite convincing versions of people saying things they'd never said before, and which could have been complete lies. The consequences of this were, and still are, pretty unthinkable and the legal team got the heebie-jeebies. So it ain't happening - under any circumstances.
SteveG(AudioMasters) is right: this is a very, very difficult effect to achieve with current tech.
If you're able to get your husband to re-record the mistaken passage after the event, the correction could be edited in. I find the best way is to have the whole sentence containing the mistake spoken again, and then to place the edit points at the start and end halfway through a word for a more seamless edit.
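For joining a re-recorded take into the original, one generic way to soften the edit points is a short crossfade at each join. The sketch below is a plain linear crossfade on lists of samples. It is not something described in this thread or taken from Audition, and the function name, parameters, and sample values are all hypothetical.

```python
def crossfade_splice(original, patch, start, end, fade):
    """Replace original[start:end] with patch, blending `fade` samples
    at each join so the edit does not click.

    The first `fade` samples of patch are mixed with original[start:start+fade];
    the last `fade` samples of patch are mixed with original[end-fade:end].
    """
    if fade < 0 or not (0 <= start <= end <= len(original)):
        raise ValueError("invalid splice region")
    if end - start < 2 * fade or len(patch) < 2 * fade:
        raise ValueError("region and patch must each hold both fades")

    out = list(original[:start])

    # Fade in: weight moves from the original towards the patch.
    for i in range(fade):
        w = (i + 1) / (fade + 1)
        out.append((1 - w) * original[start + i] + w * patch[i])

    # Untouched middle of the patch.
    out.extend(patch[fade:len(patch) - fade])

    # Fade out: weight moves from the patch back to the original.
    tail = len(patch) - fade
    for i in range(fade):
        w = (i + 1) / (fade + 1)
        out.append((1 - w) * patch[tail + i] + w * original[end - fade + i])

    out.extend(original[end:])
    return out


if __name__ == "__main__":
    # Toy data: silence with a constant-valued patch dropped into the middle.
    result = crossfade_splice([0.0] * 10, [1.0] * 6, start=2, end=8, fade=2)
    print([round(x, 2) for x in result])
```

The patch does not need to be the same length as the region it replaces, so the output can be longer or shorter than the original; with video, that length change has to be compensated for elsewhere.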
If it's video as well, this can be even tougher, but you can play with a tool such as Automatic Speech Alignment in the multitrack.
I wish there was more news on Adobe Voco - it looked awesome - but the legal implications are super scary for sure. You may want to take a look at Lyrebird. It can clone a human voice and is really accurate, although the clones still sound a little robotic to me.
Thanks, Steve and Mike. Yeah, I can totally imagine that Adobe Voco would be scary legally and socially. It's bad enough that people take virtual scissors to things other people say. In fact, when I looked at Lyrebird's website, I noticed that their two examples of clone voices were Trump and Obama - what a way to get people thinking about deception!
SteveG(AudioMasters): I'm not looking for automatic detection. One needs to understand what he's trying to say in order to recognize the mistakes - I can do that. I just wish I could manually tweak it when I come across it.
Mike: It's difficult to have him record a correction afterwards because it's a church message - after he is done, the room is not quiet for another three hours or more, and by then he and I have both gone home (he is not the pastor but gives the message once a month). Plus, I don't always notice his mistakes at the time, but catch them when I'm editing the video. And yeah, this is usually video, which adds to the challenge. Automatic Speech Alignment might help with certain situations - thanks for the tip.
Lyrebird wouldn't be any better than having him re-record a word here at home (where the mic and room ambience would not be the same), and since it doesn't know Japanese, it might not be possible to find an appropriate English sentence with the syllable I want rendered with pitch/inflection that matches the context. But it's fascinating to learn that such a service exists!
Hi there,
I find this query quite interesting and useful. Honestly, I'm unsure if this is possible with Adobe Audition. I've raised this question and shared the thread with our development team and other experts.
I'll update you as soon as I get any response from them.
Thanks,
Shivangi
"Honestly, I'm unsure if this is possible with Adobe Audition."
Honestly, I'm absolutely sure it isn't! For a start, this would require a phoneme change - and there is absolutely no facility for that whatsoever.