Bibliographic Details
Main Authors: Zhang, Zhexin, Xu, Yangyang, Zhu, Yifeng, Chen, Long, Du, Yong, He, Shengfeng, Yu, Jun
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2601.01955
_version_ 1866910106621313024
author Zhang, Zhexin
Xu, Yangyang
Zhu, Yifeng
Chen, Long
Du, Yong
He, Shengfeng
Yu, Jun
author_facet Zhang, Zhexin
Xu, Yangyang
Zhu, Yifeng
Chen, Long
Du, Yong
He, Shengfeng
Yu, Jun
contents Recent advances in diffusion-based text-to-video models, particularly those built on the diffusion transformer architecture, have achieved remarkable progress in generating high-quality and temporally coherent videos. However, transferring complex motions between videos remains challenging. In this work, we present MotionAdapter, a content-aware motion transfer framework that enables robust and semantically aligned motion transfer within DiT-based video diffusion models. Our key insight is that effective motion transfer requires 1) explicit disentanglement of motion from appearance and 2) adaptive customization of motion to target content. MotionAdapter first isolates motion by analyzing cross-frame attention within 3D full-attention modules to extract attention-derived motion fields. To bridge the semantic gap between reference and target videos, we further introduce a DINO-guided motion customization module that rearranges and refines motion fields based on content correspondences. The customized motion field is then used to guide the DiT denoising process, ensuring that the synthesized video inherits the reference motion while preserving target appearance and semantics. Extensive experiments demonstrate that MotionAdapter outperforms state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, MotionAdapter naturally supports complex motion transfer and motion editing tasks such as zooming in/out and composition.
format Preprint
id arxiv_https___arxiv_org_abs_2601_01955
institution arXiv
publishDate 2026
record_format arxiv
spellingShingle MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization
Zhang, Zhexin
Xu, Yangyang
Zhu, Yifeng
Chen, Long
Du, Yong
He, Shengfeng
Yu, Jun
Computer Vision and Pattern Recognition
Recent advances in diffusion-based text-to-video models, particularly those built on the diffusion transformer architecture, have achieved remarkable progress in generating high-quality and temporally coherent videos. However, transferring complex motions between videos remains challenging. In this work, we present MotionAdapter, a content-aware motion transfer framework that enables robust and semantically aligned motion transfer within DiT-based video diffusion models. Our key insight is that effective motion transfer requires 1) explicit disentanglement of motion from appearance and 2) adaptive customization of motion to target content. MotionAdapter first isolates motion by analyzing cross-frame attention within 3D full-attention modules to extract attention-derived motion fields. To bridge the semantic gap between reference and target videos, we further introduce a DINO-guided motion customization module that rearranges and refines motion fields based on content correspondences. The customized motion field is then used to guide the DiT denoising process, ensuring that the synthesized video inherits the reference motion while preserving target appearance and semantics. Extensive experiments demonstrate that MotionAdapter outperforms state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, MotionAdapter naturally supports complex motion transfer and motion editing tasks such as zooming in/out and composition.
title MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2601.01955