
NVIDIA Audio2Face uses AI to generate lip synching and facial animation, showcased in two games

IbizaPocholo

NeoGAFs Kent Brockman

NVIDIA Audio2Face is a powerful generative AI tool that can create accurate and realistic lip-synching and facial animation based on audio input and character traits. Developers are already using it.

Facial animation is about communicating a character's emotion, which is what makes the tool so valuable: it can pair words with emotion to deliver a fully animated, dramatic facial performance.

The first game featuring the tech is World of Jade Dynasty, a martial arts MMO from the Chinese studio Perfect World Games. With NVIDIA Audio2Face, the studio can generate accurate lip-synching and animation in both Chinese and English. And because World of Jade Dynasty is an online title, Audio2Face will let the developers add new voiced content with the same realistic animation.



The second game showcasing the tech is RealityArts Studio and Toplitz Productions' Unreal Engine 5-powered Unawake, a new action-adventure game launching later this year with DLSS 3 support. The video gives a dynamic look at NVIDIA Audio2Face in action, showcasing the sliders that control different emotions.
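For the curious, those emotion "sliders" amount to per-emotion weights blended into a character's facial rig. Here's a minimal Python sketch of that idea, assuming a generic blendshape rig; every shape and emotion name below is hypothetical, not NVIDIA's actual API:

```python
# Minimal sketch (not NVIDIA's API): blending emotion "sliders" into
# facial blendshape weights, roughly how the Unawake video's emotion
# controls appear to work. All names here are made up for illustration.

# Neutral pose plus per-emotion offsets for a few example blendshapes.
NEUTRAL = {"jaw_open": 0.0, "brow_raise": 0.0, "mouth_smile": 0.0, "mouth_frown": 0.0}

EMOTION_OFFSETS = {
    "joy":   {"brow_raise": 0.3, "mouth_smile": 0.8},
    "anger": {"brow_raise": -0.4, "mouth_frown": 0.6, "jaw_open": 0.2},
    "sad":   {"brow_raise": 0.2, "mouth_frown": 0.5},
}

def blend_emotions(sliders):
    """Combine slider values (0..1 per emotion) into final blendshape weights."""
    weights = dict(NEUTRAL)
    for emotion, amount in sliders.items():
        for shape, offset in EMOTION_OFFSETS.get(emotion, {}).items():
            weights[shape] = weights.get(shape, 0.0) + amount * offset
    # Clamp to a sane range before handing the weights to the rig.
    return {shape: max(-1.0, min(1.0, w)) for shape, w in weights.items()}

print(blend_emotions({"joy": 0.7, "anger": 0.2}))
```

The real tool presumably does something far richer under the hood, but the slider-to-blendshape idea is the same.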

 

Fess

Member
First video looks bad tbh. Second looks good enough.

I'm confused though. Isn't Starfield using some kind of automatic lip-sync software? I always assumed it did, but maybe it's motion captured if this is being treated as new tech. Looks better imo.
 

kruis

Exposing the sinister cartel of retailers who allow companies to pay for advertising space.
The lips may move automatically based on the audio, but the sync is slightly off, so it doesn't look or feel natural. Oh, and the voice acting was just awful.
 
AI sometimes feels like "nano" (from 10 years ago?), a buzzword that doesn't mean much anymore. Why do I need AI to make facial expressions? Every language has its own specific phonetics: English is utter chaos, Japanese is a bit mumbly, Hungarian is precise, so it's bound to letters/syllables a computer can read. Text-to-speech isn't always perfect, but here it's converting existing speech into a physical representation? You only have to adapt to the intensity of the voice to match a more shouty, sad, angry, or joyful tone? Where do I really need some AI stuff that can't be done with some old fixed programming?
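For what it's worth, the "old fixed programming" approach described above does exist: a hand-written phoneme-to-viseme (mouth shape) table, scaled by voice intensity. A toy Python sketch, where every table entry and function is made up for illustration:

```python
# Rough illustration of rule-based lip sync: a fixed phoneme-to-viseme
# lookup, scaled by voice loudness. A hypothetical toy, not a real pipeline.

# A tiny fixed table mapping phonemes to mouth shapes ("visemes").
PHONEME_TO_VISEME = {
    "AA": "open",       # as in "father"
    "IY": "wide",       # as in "see"
    "UW": "round",      # as in "blue"
    "M":  "closed",     # bilabials close the lips
    "B":  "closed",
    "F":  "lip_teeth",  # labiodentals
}

def lip_sync(phonemes, loudness):
    """Map a phoneme sequence to (viseme, openness) keyframes.

    `loudness` is a per-phoneme 0..1 intensity: the "shouty vs. quiet"
    scaling the post mentions.
    """
    frames = []
    for phoneme, level in zip(phonemes, loudness):
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        # Closed-lip sounds stay closed no matter how loud the line is.
        openness = 0.0 if viseme == "closed" else level
        frames.append((viseme, openness))
    return frames

print(lip_sync(["AA", "M", "IY"], [0.9, 0.9, 0.4]))
```

What a fixed table can't capture is co-articulation between neighboring sounds and the emotional performance around the mouth shapes, which is presumably the gap the AI is pitched at filling.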
 

KXVXII9X

Member
It looks freaking awful. So monotone and stiff. It feels like people mindlessly reading lines; there's little actual "acting." The first one is also out of sync, which looks weird. Maybe other people don't care about these flaws, but they take me out completely.
 