| Main Authors: | Lei, Guojun; Zhang, Rong; Wang, Chi; Liu, Tianhang; Li, Hong; Ma, Zhiyuan; Xu, Weiwei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computer Vision and Pattern Recognition |
| Online Access: | https://arxiv.org/abs/2509.21086 |
| _version_ | 1866912604795961344 |
|---|---|
| author | Lei, Guojun; Zhang, Rong; Wang, Chi; Liu, Tianhang; Li, Hong; Ma, Zhiyuan; Xu, Weiwei |
| author_facet | Lei, Guojun; Zhang, Rong; Wang, Chi; Liu, Tianhang; Li, Hong; Ma, Zhiyuan; Xu, Weiwei |
| contents | We propose UniTransfer, a novel architecture that introduces both spatial and diffusion-timestep decomposition in a progressive paradigm to achieve precise and controllable video concept transfer. For spatial decomposition, we decouple videos into three key components: the foreground subject, the background, and the motion flow. Building on this decomposed formulation, we introduce a dual-to-single-stream DiT-based architecture that supports fine-grained control over the different components of a video. We also introduce a self-supervised pretraining strategy based on random masking to enhance decomposed representation learning from large-scale unlabeled video data. Inspired by the Chain-of-Thought reasoning paradigm, we revisit the denoising diffusion process and propose a Chain-of-Prompt (CoP) mechanism to achieve timestep decomposition: we split the denoising process into three stages of different granularity and use large language models (LLMs) to produce stage-specific instructions that guide generation progressively. We also curate an animal-centric video dataset, OpenAnimal, to facilitate research and benchmarking in video concept transfer. Extensive experiments demonstrate that our method achieves high-quality and controllable video concept transfer across diverse reference images and scenes, surpassing existing baselines in both visual fidelity and editability. Web Page: https://yu-shaonian.github.io/UniTransfer-Web/ |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2509_21086 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | UniTransfer: Video Concept Transfer via Progressive Spatial and Timestep Decomposition |
| topic | Computer Vision and Pattern Recognition |
| url | https://arxiv.org/abs/2509.21086 |