vtml in audition text to speech?

Question

Just read about vtml tags that can add emphasis and pauses to make TTS sound less like a robot.

I've tried it out in Audition CC and it seems to have no effect.

Example:

<vtml_pitch value ="150>Fact</vtml_pitch>: <vtml_pause time="500"/>Employees who are auditory learners, train faster and more effectively, by listening to iSpeech text to speech, inside of e-learning courses.

Am I missing something to make this work? If the feature isn't available in Audition, what other tools would provide the service?

midijon · Answer

I have been playing with Generate Speech in Audition CC 11.1 for the Mac. I am running macOS 10.12.6 on a Mac Pro. Many of the embedded speech commands described in the Apple Speech Synthesis Programming Guide don’t seem to work, or I don’t know enough about the coding to get them to work. But there are a few that do work:

slnc: does work well to insert silence. The format is [[slnc xxx]] where xxx represents the length of the pause in milliseconds. I find [[slnc 120]] works as a slight pause while [[slnc 300]] is good to separate sentences. I’ve gone as high as [[slnc 650]] in some dialog.
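For longer scripts, the pause scheme above could be applied automatically. The following is a minimal Python sketch, not anything Audition provides: the function name add_sentence_pauses is made up for illustration, and the 300 ms default is simply the sentence-separating value mentioned above. It only builds the text that would then be pasted into Generate Speech.

```python
import re

def add_sentence_pauses(text: str, pause_ms: int = 300) -> str:
    """Insert a [[slnc N]] command between sentences (hypothetical helper)."""
    if pause_ms <= 0:
        raise ValueError("pause_ms must be positive")
    # Split after ., ! or ? when the mark is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return f" [[slnc {pause_ms}]] ".join(sentences)

print(add_sentence_pauses("He is the victim. Nobody disputes that."))
# He is the victim. [[slnc 300]] Nobody disputes that.
```

Shorter in-sentence pauses such as [[slnc 120]] would still need to be placed by hand, since only the writer knows where a breath belongs.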

rate: is very good to slow down a few words or a single word for emphasis. The form is [[rate xxx]] where xxx is words per minute. Here is an example: [[rate 120]] He [[rate 158]] is the victim. This elongates and emphasizes “He.” Once it is used, it must be cancelled by a second rate command to return to the normal wpm. In this case the wpm in the Generate Speech tab was set at 158.
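Because a rate change stays in effect until it is cancelled, it is easy to forget the reset. A small Python sketch of a wrapper that always emits both commands is shown here; the name emphasize is hypothetical, and the defaults of 120 and 158 wpm are just the values from the example above, so the normal rate should be changed to match whatever is set in the Generate Speech tab.

```python
def emphasize(word: str, slow_wpm: int = 120, normal_wpm: int = 158) -> str:
    """Slow a word down, then restore the normal rate (hypothetical helper)."""
    if slow_wpm <= 0 or normal_wpm <= 0:
        raise ValueError("rates must be positive words per minute")
    # The second [[rate ...]] cancels the first, as the command requires.
    return f"[[rate {slow_wpm}]] {word} [[rate {normal_wpm}]]"

line = f"{emphasize('He')} is the victim."
print(line)
# [[rate 120]] He [[rate 158]] is the victim.
```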

volm: The initial volume is determined by the slider in the Generate Speech tab, and text without any volume command plays at that percentage. The code [[volm 0.x]] moves the volume between 10% and 100%, where x is a single digit between 1 and 9. It also works as 1.0. The code must have the 0 before the decimal point, as in 0.7. Two-decimal values such as 0.75 are not recognized. The setting 1.0 is louder than the 100% set in the Generate Speech tab. Depending on the voice, a 100% setting in the tab corresponds to about [[volm 0.9]]. The volume setting does not apply just to the next word but stays in place until another [[volm 0.x]] command is given. I have not found that [[volm + 0.1]] or just [[volm +]] works on my machine. It is supposed to increase or decrease the volume relative to its current value, but no go for me.
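Since only one-decimal values with a leading zero seemed to be recognized, a script that generates volm commands may want to snap any requested level to that form. The Python sketch below does this under those observed constraints; the function name volm_command is hypothetical, and the 0.1 to 1.0 range reflects what worked on one machine rather than a documented limit.

```python
def volm_command(level: float) -> str:
    """Format a volume level as [[volm 0.x]] with exactly one decimal (hypothetical helper)."""
    # Work in whole tenths so the output never has a second decimal place.
    tenths = round(level * 10)
    # Clamp to the range that was observed to work: 0.1 through 1.0.
    tenths = max(1, min(10, tenths))
    return f"[[volm {tenths / 10:.1f}]]"

print(volm_command(0.72))  # [[volm 0.7]]
print(volm_command(1.5))   # [[volm 1.0]]
```

Because the setting persists, a second call with the original level is needed to return to the previous volume.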

pbas: (baseline pitch) works but varies depending on the voice chosen. The form is [[pbas xxx]] where xxx can have a low of about 45 and a high of about 350. I find that the value that returns to normal pitch varies by voice but is generally within the 100 to 150 region. Since I am using this in Audition, I find it easier to use Pitch Bender on the actual file.

emph: The format here is [[emph +/-]] but it does nothing on my computer. I have used other means of emphasis instead, such as volm, rate, and punctuation.

punctuation: Depending on the text, changing the conventional punctuation often helps enormously. For example, add a period in the middle of a sentence, or try commas, semicolons, colons, exclamation points, and question marks.

I’d be delighted to hear: how to implement the other listed, OS X embedded [[slnc 150]] speech commands.
