Get better lip sync with improved Adobe Sensei machine-learning technology. Use a transcript to produce a more accurate result.
Using a transcript to improve computed lip sync
Character Animator analyzes the audio and, using the associated transcript text, should produce more accurate visemes for the Lip Sync take than if no transcript was used.
If you need to make corrections to the transcript, update the text in the Transcript text area, and then choose the Compute Lip Sync Take from Scene Audio and Transcript command again.
If transcript-based lip sync fails:
Known issues and limitations
Transcript-based Lip Sync is still in development.
In this first public Beta version (v22.214.171.124), please note the following:
What we want to know
We want to hear about your experience with Transcript-based Lip Sync:
Also, we’d love to see what you create with Transcript-based Lip Sync. Share your animations on social media with the #CharacterAnimator hashtag.
Thank you! We’re looking forward to your feedback.
(Use this Beta forum thread to discuss Transcript-based Lip Sync and share your feedback with the Character Animator team and other Beta users. If you encounter a bug, let us know by posting a reply here or choosing Report a bug from the Provide feedback icon in the top-right corner of the app.)
Has anyone had success with this? It seems to work on 5% of the audio, no matter the length/volume/stereo/mono etc...
Just always fails for me apart from on the demo file.
Hi, thanks for giving it a try. Sorry it doesn't seem to be working on your audio. Does a marker get created in the timeline? What does the error message in it say?
Do you have an audio and transcript file you'd be willing to share? If so, zip up the audio and transcript and private message me a download link. I'd be happy to give it a try and see if I can figure out why it is failing.
Hopefully we can get to the bottom of the error you're hitting. Thanks for reporting back!
Adobe Character Animator Team
I had the same issue: it didn't work well for me with an SRT file either.
What I ended up doing is removing the numbers, but leaving the timecodes, then breaking up the file in pieces.
That is, I copied only the first 5-6 seconds of text, converted it to visemes, then did the next 5-6 seconds, and so on until I had the entire audio working. Not ideal, but it got me what I needed.
I am not sure if there was something in the transcript causing the issue, or something else.
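For anyone who wants to script the workaround above instead of cutting the file up by hand, here is a minimal Python sketch. It assumes a standard SRT layout (index line, timecode line, caption text, blank line between cues). The function names (parse_srt, chunk_cues, render) and the sample captions are made up for illustration and are not part of Character Animator. Each rendered chunk would still need to be pasted into the Transcript text area manually.

```python
import re

# Matches an SRT timecode line such as "00:00:01,000 --> 00:00:03,500".
TIMECODE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, timecode_line, caption) cues.

    Anything before the timecode line of a cue (the index number) is dropped.
    """
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMECODE.match(line)
            if match:
                g = match.groups()
                caption = " ".join(lines[i + 1:])
                cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]), line, caption))
                break
    return cues

def chunk_cues(cues, max_seconds=6.0):
    """Group consecutive cues so each chunk spans at most max_seconds.

    A single cue longer than max_seconds becomes a chunk of its own.
    """
    chunks, current = [], []
    for cue in cues:
        if current and cue[1] - current[0][0] > max_seconds:
            chunks.append(current)
            current = []
        current.append(cue)
    if current:
        chunks.append(current)
    return chunks

def render(chunk):
    """Render a chunk as SRT-style text without the cue index numbers."""
    return "\n\n".join(f"{line}\n{caption}" for _, _, line, caption in chunk)

# Made-up sample transcript, for illustration only.
sample = """1
00:00:00,000 --> 00:00:02,500
Good morning, everyone.

2
00:00:02,500 --> 00:00:05,000
Here are today's announcements.

3
00:00:05,000 --> 00:00:08,200
The library closes early on Friday.
"""

chunks = chunk_cues(parse_srt(sample))
for chunk in chunks:
    print(render(chunk))
    print("-----")
```

The 6-second window is just the figure from the post above. It is a guess at what works, not a documented limit.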
I've seen some cases where transcript lip sync can "drift" a bit for longer segments of text. The usual failure mode is that there's a gap toward the end of the segment that doesn't get any visemes because it aligned too aggressively and ran out of text before it ran out of audio. This is really common if there are other sounds mixed in with the audio, but it can happen even for just speech, too.
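To make that failure mode concrete, here is a small Python sketch of how one could check a segment for an uncovered tail. It assumes viseme timings are available as (start, end, name) tuples in seconds. That is a hypothetical format chosen for illustration, not something Character Animator exports, and the function name and the 0.5-second tolerance are also assumptions.

```python
def trailing_gap(visemes, segment_start, segment_end, tolerance=0.5):
    """Return the length in seconds of the tail of a segment with no visemes.

    Returns 0.0 when the last viseme ends within `tolerance` seconds of
    segment_end. `visemes` is a list of (start, end, name) tuples.
    """
    last_end = max((end for _, end, _ in visemes), default=segment_start)
    gap = segment_end - last_end
    return gap if gap > tolerance else 0.0

# Made-up timings: the visemes stop at 41.5 s but the segment runs to 50 s.
visemes = [(0.0, 0.4, "M"), (0.4, 0.9, "Ah"), (41.0, 41.5, "Oh")]
print(trailing_gap(visemes, 0.0, 50.0))
```

A non-zero result for a segment that contains speech right up to its end would suggest the alignment ran out of transcript early.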
The current implementation has some initialization that happens per segment so it batches segments up to process more at once (by default it tries to make segments about 45-60 seconds long) to strike a balance between performance and precision.
When I lift that init code out so it runs once per invocation (or maybe even once per app session), it should be faster and, based on your explanation above, it should get a better result for the SRT case, too. Another advantage is that if there's a piece that it struggles to align, it'll only lose the one timecode range (which for SRT can be just one short phrase).
That was probably too much technical detail, but hey it's a beta program and I figured I'd be as transparent about what's probably happening as possible. Thanks for the feedback!
Adobe Character Animator Team
It often misses W mouth shapes when they are at the start of a word. When there is silence, the first detected sound it picks up from the next word will fill where the silence should be before that word. It seems to pick up accents quite well: for the word "warm" it went "ah r b" (again missing the first W), but it did get the "ah" in "warm" that the accent had. I find this happens often, which is very nice.
It FOR SURE saved time. A friend and I used to animate side by side, and we would often debate which was faster: editing a generated lip sync or laying one out fresh on a blank file as you hear it. They often took about the same length of time. So given that it fixes many issues, I feel it is much faster than before.
For three-letter words like "and", "the", "its", etc., I find it does 3 mouth shapes. It could just be my puppet, but 2 mouth shapes for 3-letter words is perfect. Any more and you get a muppet-style flappy mouth.
Thanks for the feedback. I'll have to look at the starting W characters and short words. One thing we have definitely run into is that this way of generating lipsync can be a little too literal/exact. We've looked a little bit at filtering the result to try to make it less "chattery", but those methods still need work. They just reduce the maximum frequency of viseme changes, but aren't very smart about exactly which visemes are superfluous.
Glad to know it is at least helpful in the time saving sense, that's a start. :o)
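For readers curious what a filter that "just reduces the maximum frequency of viseme changes" could look like, here is a minimal Python sketch. It is not the filter used in Character Animator. It assumes visemes are (start, end, name) tuples in seconds (a hypothetical format), and the function name and the 0.08-second threshold are made up for illustration.

```python
def limit_viseme_rate(visemes, min_duration=0.08):
    """Absorb any viseme shorter than min_duration into the previous viseme.

    This caps how often the mouth shape can change, but it is not smart
    about which visemes matter: it drops short ones purely by length.
    """
    result = []
    for start, end, name in visemes:
        if result and end - start < min_duration:
            prev_start, _, prev_name = result[-1]
            # Stretch the previous viseme over the short one.
            result[-1] = (prev_start, end, prev_name)
        else:
            result.append((start, end, name))
    return result

# Made-up timings for illustration.
raw = [(0.00, 0.10, "Ah"), (0.10, 0.14, "D"), (0.14, 0.30, "Ee"), (0.30, 0.33, "S")]
print(limit_viseme_rate(raw))
```

Because the filter decides by duration alone, it can just as easily drop a short but visually important shape (such as a leading W) as a superfluous one, which is the weakness described above.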
I am currently using this new feature with 2 puppets. One has a mouth that uses cycle layers for each mouth shape, with 3 or 4 layers in each cycle. The "chattery-ness" seems far worse with this style of puppet. My other puppet has one mouth shape per sound, and there is a lot less clean-up involved. It would be nice to have a slider where we can control how many visemes show up per syllable/word. (BTW, thanks for all the hard work everyone puts into this software. You guys continue to blow my mind.)
We have a simple implementation of supporting a preference for how many visemes are produced. It isn't very smart yet about which visemes it skips, but it might help for a case like this.
Thanks for the feedback and kind words. :o)
Adobe Character Animator Team
Yeah - it's a bit hit and miss. I got it to work yesterday but not today with the same puppet but longer audio. Hmmm.
Out of curiosity, did it fail entirely on some parts (it'll usually put a marker on segments that failed), or did it produce lower quality visemes? Or was it maybe the issue a few folks have cited, where it aligns too aggressively and stops abruptly toward the end of the audio because it ran out of transcript? Just curious.
Thanks for giving it a try and reporting back.
Adobe Character Animator Team
Just downloaded the Beta and am giving it a try. I recorded a short audio clip (34 seconds) in Premiere, did the captions and transcription, created an SRT file, etc. Computing lip sync from the audio and transcript failed for me. See attached screenshot. I also tried removing the numbers from the SRT file, as another user did, but I still got the "Compute Lip Sync Failed" error message.
Version 22.1 Build 27
If you're comfortable sharing the audio/srt with me (via a download link in a private message if you prefer), I can take a look and see if I can figure out why it is failing.
Happy to. I work at a school and use Character Animator for weekly advisory announcements. If you can shoot me a private message, I'm happy to get you the files.
Found it. My development version (with an enhancement I hope to release soon) does the SRT processing in smaller segments which makes it faster to home in on issues.
The problem seems to be that the last SRT segment is truncated. It says 00:00:34,368, but that is in the middle of the word you and cuts off the name at the end entirely. When I change that to 00:00:34,668, it doesn't fail. :o)
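If you want to apply this kind of fix without editing the timecode by hand, here is a minimal Python sketch. It assumes the truncated final cue should simply run to the end of the audio and that you already know the audio length in milliseconds. The function names are made up for illustration, and the sample cue text is a placeholder.

```python
import re

STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_ms(stamp):
    """Convert an SRT timestamp such as "00:00:34,368" to milliseconds."""
    h, m, s, ms = map(int, STAMP.fullmatch(stamp).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms

def to_stamp(ms):
    """Convert milliseconds back to an SRT timestamp."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def extend_last_cue(srt_text, audio_ms):
    """Move the end time of the final cue to audio_ms if it ends earlier."""
    matches = list(re.finditer(r"-->\s*(\d{2}:\d{2}:\d{2},\d{3})", srt_text))
    if not matches:
        return srt_text
    last = matches[-1]
    if to_ms(last.group(1)) >= audio_ms:
        return srt_text
    return srt_text[:last.start(1)] + to_stamp(audio_ms) + srt_text[last.end(1):]

# Placeholder cue using the end time from this thread.
srt = "1\n00:00:30,000 --> 00:00:34,368\nplaceholder caption text\n"
print(extend_last_cue(srt, 34668))
```

This only touches the last cue. A cue that is cut short in the middle of the file would still need to be corrected manually.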
When I get this new version released, an error like that would only lose that last line, so it should be a lot easier to figure out what's going wrong. Coming soon!
Thank you! I appreciate the hard work and all of the new features.
Just tried it, worked like a charm! I'll keep this in mind if I run into a similar issue prior to next version. Thank you again!
Great! Glad it helped. Seeing more examples is a big help, so thanks for the report!
An update to transcript lipsync is in build 31 (pushed earlier today; you might have to poke at the CC app to get it to recognize that there's an update). Basically, if you are using an SRT transcript, it will now process each timecode-delimited part of the transcript separately. This means a few things:
• if it fails, it should tend to only lose a few words and it'll be more obvious which line tripped it up
• it should "drift" less because it has more timepoints to keep it lined up
• for really long audio+SRT, it should be a bit faster (I found a 13 minute public domain file with MLK's "I Have a Dream" speech and it was about 25% faster: 77 vs 106 seconds)
• the progress dialog will look a little weird though, it wasn't really meant for showing progress for a series of very small items, but that's cosmetic (the number that counts up in the dialog will still give you an idea of progress)
Hopefully that helps a bit. More to come.
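The benefit of processing each timecode range separately can be sketched in a few lines of Python. This is not the code in build 31. The `align` argument is a hypothetical stand-in for whatever turns one cue into visemes, and `fake_align` below exists only so the example runs.

```python
def align_per_cue(cues, align):
    """Run `align` on each cue independently, collecting failures.

    A failure on one cue costs only that cue, and the failure list
    records which line of the transcript caused it.
    """
    visemes, failures = [], []
    for index, cue in enumerate(cues, start=1):
        try:
            visemes.extend(align(cue))
        except Exception as error:
            failures.append((index, cue, str(error)))
    return visemes, failures

def fake_align(cue):
    """Hypothetical aligner: fails when there is too little audio per word."""
    start, end, text = cue
    if (end - start) / len(text.split()) < 0.1:
        raise ValueError("not enough audio for text")
    return [(start, end, "Ah")]

# Made-up cues: the second one has far too little time for its text.
cues = [
    (0.0, 2.0, "hello there"),
    (2.0, 2.1, "a much longer sentence that cannot fit"),
    (2.1, 4.0, "goodbye"),
]
visemes, failures = align_per_cue(cues, fake_align)
print(len(visemes), "visemes;", "failed cues:", [index for index, _, _ in failures])
```

With one long segment, the same bad cue would have failed the whole batch. Here it costs a single line, and the failure list points straight at it.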
Quick question. I just sent a friend my Ch. Anim. file of a puppet I built plus the Ch Data and Ch Media folders and when they open the Ch Anim comp, they get the "missing file" color bars in the viewport. However, I am running a PC and they are on a Mac, so are they not compatible?
Projects are stored in a platform-neutral format, so they should open on either Mac or Windows. However, if a puppet's artwork files were not gathered into the project folder (via the "Copy Media Files into Project Folder" command in the File menu), Character Animator may be unable to locate them on a different machine. In the Project panel, select the puppet that shows "color bar" content in the scene and check its artwork file path: if the path is shown in orange, the file is not in the expected location.
To resolve this, either run that "Copy Media" command before zipping up the project, or make sure the artwork is provided as well and click the orange artwork path in the Properties panel to point Character Animator to where the artwork files reside.
Hope that helps!
Character Animator Team