Inspiring
July 15, 2024
Question

Audition - AI Speech to Text and AI Text to Speech

  • 2 replies
  • 3443 views

Sorry, this is a bit of a rant... but hopefully, I am expressing what others are thinking.

Remember when Adobe demonstrated an advanced Audition version with the ability to edit a recording by directly manipulating the text, both deleting text and adding new text synthesized in the same voice? It was really cool, and Adobe said they were going to release the product soon. What was that? Ten years ago? 

Now that I am well versed in generative AI tech and work in this field, I understand that what Adobe demonstrated was entirely scripted and fake, because the technology to do exactly what they demonstrated simply did not exist at the time.

But I am off-topic. It's now 2024, and Audition (even the Beta version) still doesn't support speech-to-text, and its text-to-speech generator is 1990s SAPI technology! No one uses SAPI text-to-speech any longer!

Some might say Premiere offers speech-to-text. It sure does, and it is ridiculously bad! Have you compared it to even the very first release of OpenAI's Whisper? You will never use Premiere again!

Every day, I find new reasons to dump this costly Adobe platform. I want Adobe to be much better because I have invested many years in becoming an "expert" in the products. Unfortunately, Adobe continues to disappoint.


2 replies

Known Participant
November 5, 2024

I agree. I've been waiting for Adobe to catch up with all the other text-to-speech software and, to this point, have been very disappointed. Numerous companies offer voices that sound completely natural, yet Adobe seems to use a "moral" reason to throw us under the bus. Sad. I've used Adobe Audition for 20 years, and it's sad that they refuse to keep up with AI in 2024. Very disappointing. The virtue-signaling "moral" argument doesn't work. Sorry.

Known Participant
February 27, 2025

ElevenLabs and PlayHT are doing some amazing things. Sad that Adobe has gotten so far behind. I've used their products for 30 years. They were always cutting-edge and solid. Now, however, they seem to be only dabbling with AI rather than jumping in whole hog. I wish they would say to the Audition creators: make something better than ElevenLabs, with hundreds of voices in hundreds of languages. Generative text-to-speech. Wow! Wouldn't that be great? When will they catch the vision?

SteveG_AudioMasters_
Community Expert
July 16, 2024
Quote: "Remember when Adobe demonstrated an advanced Audition version with the ability to edit a recording by directly manipulating the text, both deleting text and adding new text synthesized in the same voice? It was really cool, and Adobe said they were going to release the product soon. What was that? Ten years ago?"

Adobe never released anything like that text-to-speech feature because their legal team, quite correctly, forbade them from doing so. The idea of putting words into somebody's mouth that they never spoke looked, and still looks, like a legal minefield.

And let's face it, they clearly don't like speech-to-text very much either. That said, I've never seen any system that I'd actually call good....

Inspiring
July 17, 2024

Hi SteveG... I mean no disrespect, but it is almost as if you are still living in 2019. 🙂

A whole lot has changed since then. To start, like I said, the technology to do what they demonstrated in 2016 *absolutely did not exist* at that time. It's 2024, and anyone working in the field of LLMs and machine learning (such as myself) now understands exactly what is required to clone voices with the kind of accuracy they demonstrated. Adobe lied! Whatever! It's no big deal! Many companies do the same! But they did lie! Also, the legal issues were their second lie, told to cover up the fact that they had no product to sell.

Consider this: if Adobe couldn't release a voice editing product, why are there over 25 generative AI companies selling voice cloning products now? Even the Apple iPhone can clone anyone's voice after some training. Others can clone a voice from only 5 minutes of good prerecorded speech. Again, the legal argument was nonsense.

"I've never seen any system that I'd actually call good...."

With a response like that, I don't think this is a topic that interests you. Otherwise, you'd be well aware of dozens of products that can now synthesize a voice indistinguishable from a human one, including realistic cadence and emotion. I used to spend $$$ on voice actors on an already limited budget.

Anyhow, regardless of it all, there is still no excuse for Adobe to keep giving us 2009-era SAPI voices for TTS, considering the money this company takes in through subscriptions.

Take care.

SteveG_AudioMasters_
Community Expert
July 17, 2024

Well, you're entitled to your view, however morally short-sighted it is.