
Speaker-aware talking head animation for high-fidelity facial synchronization from a single image.

MakeItTalk is an AI framework for speaker-aware talking-head animation, originally introduced at SIGGRAPH Asia 2020. Unlike simple warping methods, MakeItTalk predicts a sequence of 3D facial landmarks from the input audio, disentangling speech content from speaker identity, and then animates a single portrait image to follow those landmarks. In the 2026 landscape, MakeItTalk serves as a lightweight baseline for developers who need real-time, landmark-based animation on edge devices where heavy diffusion-based models (such as EMO or LivePortrait) are computationally prohibitive. The architecture captures not just lip movement but also non-verbal cues such as head tilts, eye blinks, and brow movements, synchronized with the audio's prosody. It is particularly valued in the research community for its ability to animate diverse subjects, including oil paintings, sketches, and 2D cartoon characters, making it a versatile tool for stylized digital content creation and legacy photo revitalization.
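The pipeline described above can be sketched in miniature: per-frame audio features are encoded into a speech-content stream and a single speaker-identity vector, which are decoded into landmark displacements applied to the neutral face. This is a toy illustration with random projection matrices standing in for the learned networks; all dimensions and names are illustrative, not MakeItTalk's real ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative, not MakeItTalk's real sizes).
T, D_AUDIO, D_CONTENT, D_SPK, N_LM = 40, 80, 16, 8, 68

# Stand-ins for the learned encoders/decoder: random projections.
W_content = rng.standard_normal((D_AUDIO, D_CONTENT)) * 0.1
W_speaker = rng.standard_normal((D_AUDIO, D_SPK)) * 0.1
W_decode = rng.standard_normal((D_CONTENT + D_SPK, N_LM * 3)) * 0.1

def animate(audio_feats, neutral_landmarks):
    """Map per-frame audio features to displaced 3D landmarks."""
    content = audio_feats @ W_content            # speech content, per frame
    speaker = (audio_feats @ W_speaker).mean(0)  # one identity vector per clip
    speaker = np.broadcast_to(speaker, (len(audio_feats), D_SPK))
    disp = np.concatenate([content, speaker], axis=1) @ W_decode
    return neutral_landmarks[None] + disp.reshape(-1, N_LM, 3)

audio = rng.standard_normal((T, D_AUDIO))
neutral = rng.standard_normal((N_LM, 3))
frames = animate(audio, neutral)
print(frames.shape)  # (40, 68, 3): one 68-point 3D landmark set per frame
```

The real system replaces the random projections with trained recurrent/attention networks and renders pixels from the displaced landmarks, but the content-plus-speaker factorization is the core idea.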
MakeItTalk specializes in audio-driven lip syncing; this narrow domain focus helps it deliver optimized results for that specific requirement.
Uses a deep neural network to predict 3D facial landmarks from audio features, disentangling speaker identity from the speech content.
Represents facial geometry with a sparse set of 3D landmarks, enabling realistic head rotation and perspective changes without dense 3D reconstruction.
Trained on diverse datasets, allowing the model to interpret non-photorealistic faces such as sketches and paintings.
Predicts rhythmic head tilts and rotations based on the prosody and energy of the audio input.
Separates the audio signal into content (phonemes) and speaker-identity (pitch/tone) components using a voice-conversion style encoder.
The landmark-based approach is significantly faster than pixel-level diffusion generation.
Implements a smoothing filter over the predicted sequence of landmarks to prevent jitter.
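The anti-jitter smoothing mentioned above could be as simple as a moving average over the predicted landmark sequence. The sketch below is an assumption for illustration; the repository's actual filter may differ (e.g. Savitzky-Golay or Gaussian smoothing).

```python
import numpy as np

def smooth_landmarks(seq, window=5):
    """Moving-average filter over a (frames, points, 3) landmark sequence.

    Edge-pads the sequence so the output keeps the same frame count,
    then convolves each coordinate trajectory with a box kernel.
    """
    kernel = np.ones(window) / window
    pad_before = window // 2
    pad_after = window - 1 - pad_before
    padded = np.pad(seq, ((pad_before, pad_after), (0, 0), (0, 0)), mode="edge")
    return np.apply_along_axis(
        lambda track: np.convolve(track, kernel, mode="valid"), 0, padded)

# Demo: a jittery random-walk landmark sequence, then its smoothed version.
rng = np.random.default_rng(1)
noisy = np.cumsum(rng.standard_normal((60, 68, 3)) * 0.1, axis=0)
smoothed = smooth_landmarks(noisy)
```

Frame-to-frame displacements of the smoothed sequence are markedly smaller than those of the raw predictions, which is exactly what removes visible jitter in the rendered video.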
Clone the official repository from GitHub.
Install Python 3.8+ environment using Conda or venv.
Install PyTorch and torchvision with CUDA support.
Download the pre-trained facial landmark predictor weights.
Download the speech-to-landmark content predictor weights.
Prepare a source portrait image with a clear frontal face.
Prepare a high-quality mono audio file (16kHz recommended).
Run the inference script pointing to image and audio paths.
Adjust the 'speaker-awareness' weight to fine-tune motion intensity.
Export the generated frames or concatenated MP4 video.
Verified feedback from other users.
"Highly regarded for its technical elegance and ability to handle non-human faces, though users note it lacks the cinematic realism of 2025/2026 diffusion models."
