Adobe Employee

Question

Feature Focus: Transcript-based Lip Sync

July 20, 2021
6 replies
10674 views

Get better lip sync with improved Adobe Sensei machine-learning technology. Use a transcript to produce a more accurate result.

Using a transcript to improve computed lip sync

Open Character Animator (Beta).
Create a scene from an example puppet on the Home screen (e.g., Chloe (Photoshop)) or open a scene containing one of your puppets.
Choose File > Import, then select the Toothsome Meme.wav audio file (from the Toothsome Meme.zip archive) to import it into the project.
With the audio selected in the Project panel, the Properties panel shows the Transcript text area where you can import or type in the text matching the spoken words and phrases in the audio. For this example, click Import in the Properties panel, then select the Toothsome Meme.txt text file (from the Toothsome Meme.zip archive).
The audio file’s icon in the Project panel changes to indicate that it has a text transcript associated with it. The Type column in the Project panel shows Audio+Transcript for this file.
Drag the Toothsome Meme.wav file from the Project panel into the Timeline panel to add it to the scene.
Select the puppet track in the Timeline panel, hold down the Shift key as you select the audio track so that both tracks are selected, then choose the Timeline > Compute Lip Sync Take from Scene Audio and Transcript menu command.

Character Animator analyzes the audio and, using the associated transcript text, should produce more accurate visemes for the Lip Sync take than if no transcript was used.

If you need to make corrections to the transcript, update the text in the Transcript text area, and then choose the Compute Lip Sync Take from Scene Audio and Transcript command again.

Troubleshooting

If transcript-based lip sync fails:

Check your transcript for typos, missing words, or other mismatch errors.
Add timecodes to the transcript to allow the process to skip over sections with errors. For example, see the Toothsome Meme.srt file (from the Toothsome Meme.zip archive). You can then run standard audio-only lip sync to fill in the gaps.

You can type the timecodes manually or use a transcription program to generate an SRT file (.srt extension) with timecodes. For an .srt file, change its extension to .txt to select it for import, or copy and paste the text directly into the Transcript text area in the Properties panel.
Splice your audio file and transcript into shorter clips. This essentially does the same thing as the timecode approach above, allowing the process to fail for a limited section. You can either:

Splice the file in an audio editing program, import the tracks as separate files, and then import or paste your matching transcript sections in the Properties panel.
Import multiple copies of the same audio file and trim each one within the Character Animator scene. You need to import multiple audio files because the transcripts are linked at the file level, but your transcript text in the Properties panel should match the trimmed track (not the entire audio file).
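
For the .srt-to-.txt step above, here is a minimal Python sketch that copies an SRT transcript to a UTF-8 .txt file so it can be selected for import. The function name `srt_to_txt` and the `fallback_encoding` parameter are hypothetical, and the fallback of `cp1252` is only a guess at what a non-UTF-8 file might use. This is not an Adobe tool, just one way to do the rename and re-encode in one go.

```python
from pathlib import Path


def srt_to_txt(srt_path: str, fallback_encoding: str = "cp1252") -> Path:
    """Write a UTF-8 .txt copy of an .srt transcript next to the original.

    fallback_encoding is an assumption: it is only used when the file
    is not valid UTF-8, and it may not match your file's real encoding.
    """
    src = Path(srt_path)
    try:
        # "utf-8-sig" reads plain UTF-8 and also drops a leading byte-order mark.
        text = src.read_text(encoding="utf-8-sig")
    except UnicodeDecodeError:
        text = src.read_text(encoding=fallback_encoding)
    dst = src.with_suffix(".txt")
    dst.write_text(text, encoding="utf-8")
    return dst
```

Calling `srt_to_txt("Toothsome Meme.srt")` would leave the original file untouched and write `Toothsome Meme.txt` beside it.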

Known issues and limitations

Transcript-based Lip Sync is still in development.

In this first public Beta version (v22.0.0.31), please note the following:

For audio files longer than about two minutes, add timecodes to the transcript at least every two minutes or so, or use an SRT file.
Currently, only English is supported, though if another language was transcribed into text made up of words with typical English phonetics (even if they are not really words), you might be able to get reasonable results.
Avoid abbreviations and spell out symbols and acronyms if they are going to be spoken in the audio file. For example: dollar ($), four point five (4.5), Graphics (GFX).
There is a per-audio-segment progress bar, but it only shows progress for the process of resampling the audio, not for the actual phoneme alignment processing step, so lip sync computation might appear stuck for a bit at the end, particularly with clips in the 2 to 3 minute range.
While the transcript is associated with audio clips, the processing is performed on the rendered scene audio, so if other audio overlaps the clips being processed, it might interfere with getting good phoneme alignment results.

Transcript text files must use UTF-8 encoding.
Currently, Lip Sync preferences, and specifically the Viseme Detection setting, are not supported for transcript-based lip sync.
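
A few of the limitations above can be checked before running lip sync. The following is a rough Python sketch, not part of Character Animator: it assumes SRT-style timecode lines (`HH:MM:SS,mmm --> ...`), takes the two-minute figure from the note above, and uses hypothetical function names. It reports the longest stretch of audio without a timecode and flags tokens that contain digits or symbols, or that look like acronyms, which the notes suggest spelling out.

```python
import re

# Matches the start time of an SRT-style timecode line.
TIMECODE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")
MAX_GAP_SECONDS = 120  # "at least every two minutes or so"


def cue_start_seconds(transcript: str) -> list[float]:
    """Return the start time, in seconds, of every timecode line."""
    starts = []
    for line in transcript.splitlines():
        m = TIMECODE.match(line.strip())
        if m:
            h, mins, s, ms = map(int, m.groups())
            starts.append(h * 3600 + mins * 60 + s + ms / 1000)
    return starts


def max_gap(starts: list[float], audio_length: float) -> float:
    """Longest stretch between timecodes, including the start and end of the audio."""
    points = [0.0] + sorted(starts) + [audio_length]
    return max(b - a for a, b in zip(points, points[1:]))


def suspicious_tokens(transcript: str) -> list[str]:
    """Flag tokens with digits or symbols, and all-caps words of 2+ letters."""
    flagged = []
    for line in transcript.splitlines():
        line = line.strip()
        # Skip blank lines, SRT index lines, and timecode lines.
        if not line or line.isdigit() or TIMECODE.match(line):
            continue
        for token in line.split():
            core = token.strip(".,!?;:\"'()")
            is_acronym = len(core) >= 2 and core.isalpha() and core.isupper()
            if re.search(r"[\d$%&@#+=]", core) or is_acronym:
                flagged.append(core)
    return flagged
```

If `max_gap(...)` comes back larger than `MAX_GAP_SECONDS`, that is a hint to add more timecodes. Anything returned by `suspicious_tokens(...)` is only a candidate for spelling out and still needs a human to decide.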

What we want to know

We want to hear about your experience with Transcript-based Lip Sync:

What are your overall impressions?
Are you able to get more accurate lip sync results with a transcript?
Are there specific words or phrases that the computed lip sync is failing on?
How can we improve Transcript-based Lip Sync?

Also, we’d love to see what you create with Transcript-based Lip Sync. Share your animations on social media with the #CharacterAnimator hashtag.

Thank you! We’re looking forward to your feedback.

(Use this Beta forum thread to discuss Transcript-based Lip Sync and share your feedback with the Character Animator team and other Beta users. If you encounter a bug, let us know by posting a reply here or choosing Report a bug from the Provide feedback icon in the top-right corner of the app.)

Feedback

GIFTMOS

Participant

Hi Jeff! The zip file is not available anymore. Would you kindly reupload it?
I'm having issues with the .srt format and getting errors so need to see the correct formatting for ACA.
Thanks!
Ewgards
/Chris

dtull-adobe

Community Manager

An update to transcript lipsync is in build 31 (pushed earlier today, might have to poke at the CC app to get it to recognize that there's an update). Basically, if you are using an SRT transcript, it will now process each timecode-delimited part of the transcript separately. This means a few things:
• if it fails, it should tend to only lose a few words and it'll be more obvious which line tripped it up

• it should "drift" less because it has more timepoints to keep it lined up

• for really long audio+SRT, it should be a bit faster (I found a 13 minute public domain file with MLK's I have a dream speech and it was about 25% faster: 77 vs 106 seconds)

• the progress dialog will look a little weird though, it wasn't really meant for showing progress for a series of very small items, but that's cosmetic (the number that counts up in the dialog will still give you an idea of progress)

Hopefully that helps a bit. More to come.

DT
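
To picture the per-cue processing described in the update above, here is a hedged Python sketch. It is not Character Animator's implementation: it parses standard SubRip-style cues and runs a caller-supplied `align` function on each one, so a failure only loses that cue. The `Cue` class, `parse_srt`, `align_each`, and the `align` callback are all hypothetical names for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable

_TIME = r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
_RANGE = re.compile(_TIME + r"\s*-->\s*" + _TIME)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> list[Cue]:
    """Split SRT text into cues; blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, ln in enumerate(lines):
            m = _RANGE.match(ln)
            if m:
                g = m.groups()
                # Everything after the timecode line is the spoken text.
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), " ".join(lines[i + 1:])))
                break
    return cues


def align_each(cues: list[Cue], align: Callable[[Cue], object]):
    """Run align() per cue so that one failure does not sink the whole transcript."""
    done, failed = [], []
    for cue in cues:
        try:
            done.append((cue, align(cue)))
        except Exception:
            failed.append(cue)  # this cue is the one that tripped things up
    return done, failed
```

The `failed` list corresponds to the "more obvious which line tripped it up" point: each entry carries its own start time and text.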

RadioPro

Participating Frequently

Dan,

Quick question. I just sent a friend my Ch. Anim. file of a puppet I built plus the Ch Data and Ch Media folders and when they open the Ch Anim comp, they get the "missing file" color bars in the viewport. However, I am running a PC and they are on a Mac, so are they not compatible?

dtull-adobe

Community Manager

Projects are stored in a platform-neutral format, so they should open on Mac or Windows. However, if the artwork files for a puppet are not gathered into the project folder (via the "Copy Media Files into Project Folder" command in the File menu), Character Animator may be unable to locate them on a different machine. Select the puppet (in the Project panel) that is showing up with "color bar" content in the scene and check whether the puppet's artwork file path is shown in orange. If it is, the file is not in the expected location.

To resolve this, either run that "Copy Media" command before zipping up the project, or make sure the artwork is provided as well and click the orange artwork path in the Properties panel to point Character Animator to where the artwork files reside.

Hope that helps!

Dan Tull

Character Animator Team

bergmatt

Participating Frequently

Just downloaded the Beta and am giving it a try. I recorded a short audio clip (34 seconds) in Premiere, did the captions and transcription, created an SRT file, etc. Computing lip sync with the transcript failed for me. See attached screenshot. I also tried removing the numbers from the SRT file, as another user did, but I still got the "Compute Lip Sync Failed" error message.

bergmatt

Participating Frequently

Version 22.1 Build 27

dtull-adobe

Community Manager

If you're comfortable sharing the audio/srt with me (via a download link in a private message if you prefer), I can take a look and see if I can figure out why it is failing.

Dan Tull

Annmarie lawler

Participating Frequently

Yeah - it's a bit hit and miss. I got it to work yesterday but not today with the same puppet but longer audio. Hmmm.

dtull-adobe

Community Manager

Out of curiosity, did it fail entirely on some parts (it'll usually put a marker on segments that failed), did it produce lower-quality visemes, or was it maybe the issue a few folks have cited, where it aligns too aggressively and stops abruptly toward the end of the audio because it has run out of transcript? Just curious.

Thanks for giving it a try and reporting back.

Dan Tull

Adobe Character Animator Team

ChPuppets

Participating Frequently

It often misses W mouth shapes when they are at the start of a word. After a silence, the first detected sound from the next word fills the space where the silence should be before that word. It seems to pick up accents quite well: for the word "warm" it went "ah r b" (again missing the first W), but it did get the "ah" in "warm" that the accent had. I find this happens often, which is very nice.

It FOR SURE saved time. A friend and I used to animate side by side, and we would often debate which was faster: editing a generated lip sync or laying one out fresh on a blank file as you hear it. They often took about the same length of time. So, given that it fixes many issues, I feel it is much faster than before.

For three-letter words like "and", "the", "its", etc., I find it does 3 mouth shapes. It could just be my puppet, but 2 mouth shapes for 3-letter words is perfect. Any more and you get a Muppet-style flappy mouth.

dtull-adobe

Community Manager

Thanks for the feedback. I'll have to look at the starting W characters and short words. One thing we have definitely run into is that this way of generating lipsync can be a little too literal/exact. We've looked a little bit at filtering the result to try to make it less "chattery", but those methods still need work. They just reduce the maximum frequency of viseme changes, but aren't very smart about exactly which visemes are superfluous.

Glad to know it is at least helpful in the time saving sense, that's a start. 🐵

Dan Tull
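
For readers wondering what "reduce the maximum frequency of viseme changes" could look like, here is a deliberately naive Python sketch. It is an illustration only, not the filter the Character Animator team has tried: the function name, the `(start_seconds, name)` representation, and the viseme names are all assumptions.

```python
def limit_viseme_rate(visemes, min_gap):
    """Drop viseme changes that arrive less than min_gap seconds after
    the last kept change.

    visemes: list of (start_seconds, name) tuples, sorted by start time.
    """
    kept = []
    for start, name in visemes:
        if kept and start - kept[-1][0] < min_gap:
            continue  # too soon after the previous kept change
        if kept and kept[-1][1] == name:
            continue  # same mouth shape as before, so not a change
        kept.append((start, name))
    return kept
```

With a `min_gap` of 0.08 seconds, a sequence like "Ah" at 0.00, "M" at 0.03, "Ee" at 0.06, "Oh" at 0.20 is reduced to "Ah" then "Oh". The "M" is dropped even though a closed mouth is usually the one shape a viewer notices, which shows why a pure rate limit is not smart about which visemes are superfluous.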

ChPuppets

Participating Frequently

I am currently using this new feature with 2 puppets. One has a mouth that uses cycle layers for each mouth shape, with 3 or 4 layers in each cycle. The "chattery-ness" seems far worse with this style of puppet. My other puppet has one mouth shape per sound, and there is a lot less clean-up involved. It would be nice to have a slider where we can control how many visemes show up per syllable/word. (BTW, thanks for all the hard work everyone puts into this software. You guys continue to blow my mind.)

UltraEverything

Known Participant

Has anyone had success with this? It seems to work on 5% of the audio, no matter the length/volume/stereo/mono etc...

Just always fails for me apart from on the demo file.

dtull-adobe

Community Manager

Hi, thanks for giving it a try. Sorry it doesn't seem to be working on your audio. Does a marker get created in the timeline? What does the error message in it say?

Do you have an audio and transcript file you'd be willing to share? If so, zip the audio and transcript up and private message me a download link. I'd be happy to give it a try and see if I can figure out why it is failing.

Hopefully we can get to the bottom of the error you're hitting. Thanks for reporting back!

Dan Tull

Adobe Character Animator Team
