About every second time I say, for example, 'o', the character's mouth shows something else. You can't really rely on the mouth showing the visemes that correspond to my speech; it often interprets the sound as something different. I speak German, but does that matter at all? Any suggestions as to where this behavior comes from?
I've also been having issues with the visemes. I'm making videos in Portuguese that will need Japanese versions later on. It was annoying but manually fixable on videos under 3 minutes, but I'm editing a 10-minute video now and it's just a pain. It almost makes me wish I could go back to the artificial voice samples that websites with similar services used to offer.