Feature Focus: Transcript-based Lip Sync
Get better lip sync with improved Adobe Sensei machine-learning technology. Use a transcript to produce a more accurate result.
Using a transcript to improve computed lip sync
- Open Character Animator (Beta).
- Create a scene from an example puppet on the Home screen (e.g., Chloe (Photoshop)) or open a scene containing one of your puppets.
- Choose File > Import, then select the Toothsome Meme.wav audio file (from the Toothsome Meme.zip archive) to import it into the project.
- With the audio selected in the Project panel, the Properties panel shows the Transcript text area where you can import or type in the text matching the spoken words and phrases in the audio. For this example, click Import in the Properties panel, then select the Toothsome Meme.txt text file (from the Toothsome Meme.zip archive).
The audio file’s icon in the Project panel changes to indicate it has a text transcript associated with it, and the Type column in the Project panel shows Audio+Transcript for this file.
- Drag the Toothsome Meme.wav file from the Project panel into the Timeline panel to add it to the scene.
- Select the puppet track in the Timeline panel, hold down the Shift key as you select the audio track so that both tracks are selected, then choose the Timeline > Compute Lip Sync Take from Scene Audio and Transcript menu command.
Character Animator analyzes the audio and, using the associated transcript text, should produce more accurate visemes for the Lip Sync take than it would without a transcript.
If you need to make corrections to the transcript, update the text in the Transcript text area, and then choose the Compute Lip Sync Take from Scene Audio and Transcript command again.
Troubleshooting
If transcript-based lip sync fails:
- Check your transcript for typos, missing words, or other mismatch errors.
- Add timecodes to the transcript to allow the process to skip over sections with errors. For example, see the Toothsome Meme.srt file (from the Toothsome Meme.zip archive). You can then run standard audio-only lip sync to fill in the gaps.
You can type the timecodes manually or use a transcription program to generate an SRT file (.srt extension) with timecodes; a sample SRT excerpt follows this list. For an .srt file, change its extension to .txt so you can select it for import, or copy and paste its text directly into the Transcript text area in the Properties panel.
- Splice your audio file and transcript into shorter clips. This accomplishes the same thing as the timecode approach above by confining any failure to a limited section. You can either:
  - Splice the file in an audio editing program, import the pieces as separate files, and then import or paste each piece's matching transcript section in the Properties panel (a minimal splitting sketch follows this list).
  - Import multiple copies of the same audio file and trim each one within the Character Animator scene. Separate imported copies are needed because transcripts are linked at the file level, and the transcript text in the Properties panel should match the trimmed track (not the entire audio file).
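For reference, an SRT file is a plain-text list of numbered entries, each with a start --> end timecode (hours:minutes:seconds,milliseconds) followed by the matching spoken text and a blank line. The excerpt below is an invented illustration of the format, not the contents of the actual Toothsome Meme.srt file:

    1
    00:00:00,000 --> 00:00:03,500
    Hello, and welcome back to the show.

    2
    00:00:03,500 --> 00:00:07,000
    Today we are trying out transcript-based lip sync.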
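If you splice the audio in software, a minimal Python sketch like the one below can split a WAV file into roughly two-minute pieces using only the standard-library wave module. The file name, chunk length, and output naming are assumptions for illustration:

    import wave

    def split_wav(path, chunk_seconds=120):
        # Read the source WAV and write consecutive chunk_seconds-long pieces.
        with wave.open(path, "rb") as src:
            params = src.getparams()
            frames_per_chunk = params.framerate * chunk_seconds
            part = 0
            while True:
                frames = src.readframes(frames_per_chunk)
                if not frames:
                    break
                part += 1
                out_name = path.rsplit(".", 1)[0] + " part" + str(part) + ".wav"
                with wave.open(out_name, "wb") as dst:
                    dst.setparams(params)  # header is patched with the actual length on close
                    dst.writeframes(frames)

    split_wav("Toothsome Meme.wav")

Each output piece then gets its own matching transcript section in the Properties panel.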
Known issues and limitations
Transcript-based Lip Sync is still in development.
In this first public Beta version (v22.0.0.31), please note the following:
- For audio files longer than about two minutes, add timecodes to the transcript at least every two minutes or use an SRT file.
- Currently, only English is supported. If another language is transcribed into text made of words with typical English phonetics (even if they are not real words), you might still get reasonable results.
- Avoid abbreviations, and spell out symbols and acronyms that will be spoken in the audio file. For example, write dollar instead of $, four point five instead of 4.5, and Graphics instead of GFX (a small spell-out sketch follows this list).
- The per-audio-segment progress bar only shows progress for resampling the audio, not for the actual phoneme alignment step, so lip sync computation might appear stuck for a bit at the end, particularly with clips in the 2 to 3 minute range.
- Although the transcript is associated with individual audio clips, processing is performed on the rendered scene audio, so other audio that overlaps a clip being processed might interfere with phoneme alignment.
- Transcript text files must use UTF-8 encoding (a re-encoding sketch also follows this list).
- Currently, Lip Sync preferences (specifically, the Viseme Detection setting) are not supported for transcript-based lip sync.
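If a transcript contains symbols or acronyms, a small Python sketch like this can spell them out before import. The replacement table is an invented example; extend it to match your own audio:

    # Naive spell-out pass for a transcript string; the replacements are illustrative.
    replacements = {"$": " dollar ", "4.5": " four point five ", "GFX": " Graphics "}

    def spell_out(text):
        for symbol, spoken in replacements.items():
            text = text.replace(symbol, spoken)
        return " ".join(text.split())  # collapse any doubled spaces

    print(spell_out("GFX demo: 4.5 stars, $ well spent."))
    # Graphics demo: four point five stars, dollar well spent.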
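And if a transcript file was saved in another encoding, a minimal re-encoding sketch (the source encoding below is an assumption; adjust it to match your file) can rewrite it as UTF-8:

    from pathlib import Path

    def resave_as_utf8(path, source_encoding="cp1252"):
        # Read the transcript in its original encoding and rewrite it as UTF-8.
        text = Path(path).read_text(encoding=source_encoding)
        Path(path).write_text(text, encoding="utf-8")

    resave_as_utf8("Toothsome Meme.txt")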
What we want to know
We want to hear about your experience with Transcript-based Lip Sync:
- What are your overall impressions?
- Are you able to get more accurate lip sync results with a transcript?
- Are there specific words or phrases that the computed lip sync is failing on?
- How can we improve Transcript-based Lip Sync?
Also, we’d love to see what you create with Transcript-based Lip Sync. Share your animations on social media with the #CharacterAnimator hashtag.
Thank you! We’re looking forward to your feedback.
(Use this Beta forum thread to discuss Transcript-based Lip Sync and share your feedback with the Character Animator team and other Beta users. If you encounter a bug, let us know by posting a reply here or choosing Report a bug from the Provide feedback icon in the top-right corner of the app.)
