Staff View: MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization

Bibliographic Details
Main Authors:	Zhang, Zhexin; Xu, Yangyang; Zhu, Yifeng; Chen, Long; Du, Yong; He, Shengfeng; Yu, Jun
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.01955

_version_	1866910106621313024
author	Zhang, Zhexin; Xu, Yangyang; Zhu, Yifeng; Chen, Long; Du, Yong; He, Shengfeng; Yu, Jun
author_facet	Zhang, Zhexin; Xu, Yangyang; Zhu, Yifeng; Chen, Long; Du, Yong; He, Shengfeng; Yu, Jun
contents	Recent advances in diffusion-based text-to-video models, particularly those built on the diffusion transformer architecture, have achieved remarkable progress in generating high-quality and temporally coherent videos. However, transferring complex motions between videos remains challenging. In this work, we present MotionAdapter, a content-aware motion transfer framework that enables robust and semantically aligned motion transfer within DiT-based video diffusion models. Our key insight is that effective motion transfer requires 1) explicit disentanglement of motion from appearance and 2) adaptive customization of motion to target content. MotionAdapter first isolates motion by analyzing cross-frame attention within 3D full-attention modules to extract attention-derived motion fields. To bridge the semantic gap between reference and target videos, we further introduce a DINO-guided motion customization module that rearranges and refines motion fields based on content correspondences. The customized motion field is then used to guide the DiT denoising process, ensuring that the synthesized video inherits the reference motion while preserving target appearance and semantics. Extensive experiments demonstrate that MotionAdapter outperforms state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, MotionAdapter naturally supports complex motion transfer and motion editing tasks such as zooming in/out and composition.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_01955
institution	arXiv
publishDate	2026
record_format	arxiv
title	MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2601.01955

Similar Items