Bibliographic Details
Main Authors: Liang, Jiadong; Xiong, Bojun; Tian, Jie; Li, Hua; Long, Xiao; Zheng, Yong; Fu, Huan
Format: Preprint
Published: 2026
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2603.19731
author Liang, Jiadong
Xiong, Bojun
Tian, Jie
Li, Hua
Long, Xiao
Zheng, Yong
Fu, Huan
contents This paper investigates expression-only portrait video performance editing guided by a driving video, a task that plays a crucial role in the animation and film industries. Most existing research focuses on portrait animation, which animates a static portrait image according to the facial motion in the driving video. As a consequence, existing methods struggle to disentangle facial expression from head pose rotation and thus cannot edit facial expression independently. In this paper, we propose PerformRecast, a versatile expression-only video editing method dedicated to recasting the performance in existing film and animation. The key insight of our method comes from the 3D Morphable Face Model (3DMM), which models the face identity, facial expression, and head pose of a 3D face mesh with separate parameters. We therefore improve the keypoint transformation formula used in previous methods to make it more consistent with the 3DMM, which achieves better disentanglement and gives users more fine-grained control. Furthermore, to avoid misalignment around the face boundary in generated results, we decouple the facial and non-facial regions of the input portrait images and pre-train a teacher model to provide separate supervision for each. Extensive experiments show that our method produces high-quality results that are more faithful to the driving video, outperforming existing methods in both controllability and efficiency. Our code, data, and trained models are available at https://youku-aigc.github.io/PerformRecast.
format Preprint
id arxiv_https___arxiv_org_abs_2603_19731
institution arXiv
publishDate 2026
record_format arxiv
title PerformRecast: Expression and Head Pose Disentanglement for Portrait Video Editing
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2603.19731