Bibliographic Details
Main Authors: Li, Yuze, Gong, Dong, Cao, Xiao, Yuan, Junchao, Li, Dongsheng, Zhou, Lei, Koh, Yun Sing, Yan, Cheng, Zhang, Xinyu
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2603.01000
Abstract:
  • Motion transfer has emerged as a promising direction for controllable video generation, yet existing methods largely focus on single-object scenarios and struggle when multiple objects require distinct motion patterns. In this work, we present FlexiMMT, the first implicit image-to-video (I2V) motion transfer framework that explicitly enables multi-object, multi-motion transfer. Given a static multi-object image and multiple reference videos, FlexiMMT extracts a motion representation from each video independently and assigns it to its target object, supporting flexible recombination and arbitrary motion-to-object mappings. To address the core challenge of cross-object motion entanglement, we introduce a Motion Decoupled Mask Attention Mechanism that uses object-specific masks to constrain attention, ensuring that motion and text tokens influence only their designated regions. We further propose a Differentiated Mask Propagation Mechanism that derives object-specific masks directly from diffusion attention and efficiently propagates them across frames. Extensive experiments demonstrate that FlexiMMT achieves precise, compositional motion transfer and state-of-the-art performance in I2V-based multi-object, multi-motion transfer. Our project page is: https://ethan-li123.github.io/FlexiMMT_page/
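The mask-constrained attention idea in the abstract can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the integer region labels (0 for background), and the toy vectors are all hypothetical, and plain scaled dot-product attention stands in for whatever attention the model actually uses. Each video token carries the label of the object region it falls in, each motion token carries the label of the object it is assigned to, and a video token may attend only to motion tokens with a matching label.

```python
import math


def build_attention_mask(query_regions, key_owners):
    """mask[i][j] is True when video token i may attend to motion token j.

    Hypothetical labelling: both lists hold integer object ids, 0 = background.
    """
    return [[q == k for k in key_owners] for q in query_regions]


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention restricted to the allowed (i, j) pairs."""
    dim = len(queries[0])
    out_dim = len(values[0])
    outputs = []
    for query, row in zip(queries, mask):
        allowed = [j for j, ok in enumerate(row) if ok]
        if not allowed:
            # No motion token is assigned to this region: contribute nothing.
            outputs.append([0.0] * out_dim)
            continue
        scores = [
            sum(a * b for a, b in zip(query, keys[j])) / math.sqrt(dim)
            for j in allowed
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out = [0.0] * out_dim
        for w, j in zip(weights, allowed):
            for d, v in enumerate(values[j]):
                out[d] += (w / total) * v
        outputs.append(out)
    return outputs


# Toy setup: four video tokens, two motion tokens (one per object).
query_regions = [1, 1, 2, 0]          # region label of each video token
key_owners = [1, 2]                   # object each motion token is assigned to
queries = [[0.3, 0.1], [0.9, 0.4], [0.2, 0.8], [0.5, 0.5]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]     # motion A -> [1, 0], motion B -> [0, 1]

mask = build_attention_mask(query_regions, key_owners)
outputs = masked_attention(queries, keys, values, mask)
```

In this toy setup, tokens in object 1's region receive only motion A's value, tokens in object 2's region receive only motion B's value, and the background token receives nothing, regardless of how large the unmasked dot products would have been. Swapping the entries of `key_owners` reassigns the motions to the other objects without touching anything else, which is the kind of arbitrary motion-to-object mapping the abstract describes.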