December 18, 2025
Open for Voting

Feedback on Speech Quality in AI Tools

I’ve noticed that the speaking quality of Text-to-Voice-over is better than that of Text-to-Avatar. The pronunciation in Dutch sounds clearer, more natural, and aligns better with how the language is supposed to be spoken. Could this be because Text-to-Voice-over uses ElevenLabs, while Text-to-Avatar does not?
If so, is it possible for Text-to-Avatar to also be powered by ElevenLabs?

1 reply

December 18, 2025

Hi @27796545,

Thanks so much for taking the time to share this. The level of detail about Dutch pronunciation is really helpful for us.

You’re right that the current Text-to-Voice-over and Text-to-Avatar features use different speech setups, and that can absolutely lead to the kind of quality difference you’re hearing in Dutch. Text-to-Voice-over is designed first and foremost for natural-sounding narration, while Text-to-Avatar also has to stay tightly in sync with the character’s mouth movements, which can affect how the speech engine is tuned.

Your idea of having Text-to-Avatar use the same technology as Text-to-Voice-over (or offer those voices as an option) is a great suggestion, especially for languages like Dutch, where clarity and natural rhythm matter a lot. This has been shared as feature feedback for the Firefly team so they can consider closer alignment between the two tools in future updates.

In the meantime, if you need the very best Dutch pronunciation today, using Text-to-Voice-over for the audio and then pairing it with your visuals will usually give you the most natural result. And if you have any specific Dutch phrases or cases where it sounds particularly off in Text-to-Avatar, those examples are incredibly valuable for the team when they work on improvements.

^Sam