Bibliographic Details
Main Authors: Wu, Jingchao, Kang, Zejian, Liu, Haibo, Fei, Yuanchen, Huang, Xiangru
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2512.11321
author Wu, Jingchao
Kang, Zejian
Liu, Haibo
Fei, Yuanchen
Huang, Xiangru
contents Facial animation is a core component for creating digital characters in the Computer Graphics (CG) industry. A typical production workflow relies on sparse, semantically meaningful keyframes to precisely control facial expressions. Enabling such animation directly from natural-language descriptions could significantly improve content creation efficiency and accessibility. However, most existing methods adopt a text-to-continuous-frames paradigm, directly regressing dense facial motion trajectories from language. This formulation entangles high-level semantic intent with low-level motion, lacks an explicit semantic control structure, and limits precise editing and interpretability. Inspired by the keyframe paradigm in animation production, we propose KeyframeFace, a framework for semantic facial animation from language via interpretable keyframes. Instead of predicting dense motion trajectories, our method represents animation as a sequence of semantically meaningful keyframes in an interpretable ARKit-based facial control space. A language-driven model leverages large language model (LLM) priors to generate keyframes that align with contextual text descriptions and emotion cues. To support this formulation, we construct a multimodal dataset comprising 2,100 expression scripts paired with monocular videos, per-frame ARKit coefficients, and manually annotated semantic keyframes. Experiments show that incorporating semantic keyframe supervision and language priors significantly improves expression fidelity and semantic alignment compared to methods that do not use facial action semantics.
format Preprint
id arxiv_https___arxiv_org_abs_2512_11321
institution arXiv
publishDate 2025
record_format arxiv
title KeyframeFace: Language-Driven Facial Animation via Semantic Keyframes
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2512.11321