Staff View: KeyframeFace: Language-Driven Facial Animation via Semantic Keyframes :: Library Catalog

Bibliographic Details
Main Authors:	Wu, Jingchao; Kang, Zejian; Liu, Haibo; Fei, Yuanchen; Huang, Xiangru
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2512.11321

_version_	1866913114888339456
author	Wu, Jingchao; Kang, Zejian; Liu, Haibo; Fei, Yuanchen; Huang, Xiangru
author_facet	Wu, Jingchao; Kang, Zejian; Liu, Haibo; Fei, Yuanchen; Huang, Xiangru
contents	Facial animation is a core component for creating digital characters in the computer graphics (CG) industry. A typical production workflow relies on sparse, semantically meaningful keyframes to precisely control facial expressions. Enabling such animation directly from natural-language descriptions could significantly improve content creation efficiency and accessibility. However, most existing methods adopt a text-to-continuous-frames paradigm, directly regressing dense facial motion trajectories from language. This formulation entangles high-level semantic intent with low-level motion, lacks an explicit semantic control structure, and limits precise editing and interpretability. Inspired by the keyframe paradigm in animation production, we propose KeyframeFace, a framework for semantic facial animation from language via interpretable keyframes. Instead of predicting dense motion trajectories, our method represents animation as a sequence of semantically meaningful keyframes in an interpretable ARKit-based facial control space. A language-driven model leverages large language model (LLM) priors to generate keyframes that align with contextual text descriptions and emotion cues. To support this formulation, we construct a multimodal dataset comprising 2,100 expression scripts paired with monocular videos, per-frame ARKit coefficients, and manually annotated semantic keyframes. Experiments show that incorporating semantic keyframe supervision and language priors significantly improves expression fidelity and semantic alignment compared to methods that do not use facial action semantics.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_11321
institution	arXiv
publishDate	2025
record_format	arxiv
title	KeyframeFace: Language-Driven Facial Animation via Semantic Keyframes
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2512.11321

Similar Items