Inspiring
April 16, 2018
Question

vtml in audition text to speech?

  • 2 replies
  • 2710 views

Just read about vtml tags that can add emphasis and pauses to make TTS sound less like a robot.

I've tried it out in Audition CC and it seems to have no effect.

Example:

<vtml_pitch value ="150>Fact</vtml_pitch>: <vtml_pause time="500"/>Employees who are auditory learners, train faster and more effectively, by listening to iSpeech text to speech, inside of e-learning courses.

Am I missing something to make this work? If the feature isn't available in Audition, what other tools would provide the service?


Participant
October 1, 2018

I have been playing with Generate Speech in Audition CC 11.1 for the Mac. I am running 10.12.6 on a Mac Pro. Many of the embedded speech commands described in the Apple Speech Synthesis Programming Guide don’t seem to work, or I don’t know enough about the coding to get them to work. But there are a few that do work:

slnc: does work well to insert silence. The format is [[slnc xxx]] where xxx represents the length of the pause in milliseconds. I find [[slnc 120]] works as a slight pause while [[slnc 300]] is good to separate sentences. I’ve gone as high as [[slnc 650]] in some dialog.
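For longer scripts, typing the pause codes by hand gets tedious. Here is a minimal Python sketch that joins sentences with the [[slnc xxx]] code described above. The function name join_with_silence and the 300 ms default are only illustrative choices, not anything Audition provides; the output is plain text to paste into the Generate Speech box.

```python
def join_with_silence(sentences, pause_ms=300):
    """Join sentences with an embedded [[slnc N]] command between them.

    pause_ms is the pause length in milliseconds, as described in the post.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    separator = f" [[slnc {int(pause_ms)}]] "
    # Skip empty entries so we never emit two pause commands in a row.
    return separator.join(s.strip() for s in sentences if s.strip())


text = join_with_silence(["He is the victim.", "She is not."], pause_ms=300)
print(text)
```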

rate: is very good to slow down a few words or a single word for emphasis. The form is [[rate xxx]] where xxx is words per minute. Here is an example: [[rate 120]] He [[rate 158]] is the victim. This elongates and emphasizes “He.” Once it is used, it must be cancelled by a second rate command to return to the normal wpm. In this case the wpm in the Generate Speech tab was set at 158.
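Since the rate has to be reset by hand, a small helper can make sure the second command is never forgotten. This is only a sketch in Python; slow_down is a made-up name, and normal_wpm is assumed to match whatever is set in the Generate Speech tab.

```python
def slow_down(phrase, slow_wpm, normal_wpm):
    """Wrap a phrase in [[rate]] commands: slow it down, then restore the normal rate."""
    return f"[[rate {slow_wpm}]] {phrase} [[rate {normal_wpm}]]"


# Reproduces the example from the post, with 158 wpm as the normal rate.
line = slow_down("He", slow_wpm=120, normal_wpm=158) + " is the victim."
print(line)
```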

volm: The initial volume is determined by the slider in the Generate Speech tab, and text without any volm command is spoken at that percentage. The code [[volm 0.x]] moves the volume up and down from 10% to 100%. The form is [[volm 0.x]] where x is an integer between 1 and 9. It also works as 1.0. The code must have the 0 before the decimal point, as in 0.7. Finer values such as 0.75 are not recognized. The setting 1.0 is higher than the 100% set in the Generate Speech tab. Depending on the voice, a 100% setting in the tab corresponds to about [[volm 0.9]]. The volume setting is not just for the next word but stays in place until another [[volm 0.x]] command is given. I have not found that [[volm + 0.1]] or just [[volm +]] works on my machine. It is supposed to increase or decrease the volume relative to its current value, but no go for me.
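Because only one decimal place seems to be accepted, it can help to generate the code rather than type it. A rough Python sketch follows; the helper name volm and the clamping to 0.1 through 1.0 are assumptions based on the behaviour described above, not on any documented limit.

```python
def volm(level):
    """Return a [[volm x.y]] command with the level snapped to one decimal place.

    Values are clamped to the 0.1 to 1.0 range that worked in the post.
    """
    snapped = round(level * 10) / 10
    snapped = min(1.0, max(0.1, snapped))
    # Always format with a leading digit and exactly one decimal, e.g. 0.7 or 1.0.
    return f"[[volm {snapped:.1f}]]"


# The setting stays in place until the next volm command, so restore it explicitly.
line = f"{volm(0.5)} This part is quieter. {volm(0.9)} Back to normal."
print(line)
```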

pbas: (baseline pitch) works but varies depending on the voice chosen. The form is [[pbas xxx]] where xxx can have a low of about 45 and a high of about 350. I find that the value needed to return to the normal pitch varies by voice but is generally within the 100 to 150 region. Since I am using this in Audition, I find it easier to use Pitch Bender on the actual file.

emph: The format here is [[emph +/-]] but it does nothing on my computer. I have used other means of emphasis instead, such as volm, rate, and punctuation.

punctuation: depending on the text, changing the conventional punctuation often helps enormously. For example, add a period in the middle of a sentence, or try commas, semicolons, colons, exclamation points and question marks.

I’d be delighted to hear: how to implement the other listed, OS X embedded [[slnc 150]] speech commands.

Kevin-Monahan
Community Manager
September 1, 2021

Thanks for sharing. This rare bit of info has helped me a lot in an experiment I'm trying for text to speech.

 

Kevin

Kevin Monahan - Sr. Community and Engagement Strategist – Adobe Pro Video and Audio
francis55
Author
Inspiring
April 16, 2018

Corrected to <vtml_pitch value ="150"> but still no effect
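For anyone else pasting tags like this, a quick way to catch a missing quote is to run the fragment through an XML parser first. The Python sketch below assumes VTML fragments can be treated as XML-like markup for checking purposes (that is an assumption, not something the VTML docs are quoted on here), and is_well_formed is a made-up helper name. It says nothing about whether Audition will act on the tags.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a dummy root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


broken = '<vtml_pitch value ="150>Fact</vtml_pitch>'   # closing quote missing
fixed = '<vtml_pitch value ="150">Fact</vtml_pitch>'   # corrected version

print(is_well_formed(broken), is_well_formed(fixed))
```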

ryclark
Participating Frequently
April 16, 2018

TTS is very primitive in Audition and relies mainly on your operating system for what can be done. It uses the built-in TTS provided by the OS to generate speech, so the functionality will differ depending on whether you are on a Mac or a PC. I don't know if inputting text into Audition's TTS will actually pass on any vtml tags to the OS speech generator.

SteveG_AudioMasters_
Community Expert
April 16, 2018

VTML probably won't work, but as far as I'm aware, VoiceXML does... see Speech Synthesis Markup Language (SSML) Version 1.1
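As a rough illustration of what the SSML equivalent of the original example might look like, here is a Python sketch that builds the markup with the standard library. The speak, prosody and break elements come from the SSML 1.1 spec; the choice of pitch="high" as a stand-in for the VTML pitch value and the shortened sentence are assumptions for illustration only, and whether Audition or the OS voice honours any of it is untested.

```python
import xml.etree.ElementTree as ET

# Root element with the SSML 1.1 namespace and language attributes.
speak = ET.Element("speak", {
    "version": "1.1",
    "xmlns": "http://www.w3.org/2001/10/synthesis",
    "xml:lang": "en-US",
})

# Raise the pitch on the first word (assumed stand-in for the VTML pitch tag).
prosody = ET.SubElement(speak, "prosody", {"pitch": "high"})
prosody.text = "Fact"
prosody.tail = ": "

# Half-second pause, then the rest of the sentence.
pause = ET.SubElement(speak, "break", {"time": "500ms"})
pause.tail = "Employees who are auditory learners train faster."

ssml = ET.tostring(speak, encoding="unicode")
print(ssml)
```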