:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Shifang; Hu, Yihan; Shan, Ying; Wei, Yunchao; Cun, Xiaodong
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.29664

Similar Items

EasyOmnimatte: Taming Pretrained Inpainting Diffusion Models for End-to-End Video Layered Decomposition
by: Hu, Yihan, et al.
Published: (2025)

OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning
by: Zhao, Shifang, et al.
Published: (2025)

DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics
by: Hu, Yihan, et al.
Published: (2025)

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
by: Hu, Wenbo, et al.
Published: (2024)

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
by: Yang, Shaoshu, et al.
Published: (2024)

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
by: Niu, Muyao, et al.
Published: (2024)

Learning Trimaps via Clicks for Image Matting
by: Zhang, Chenyi, et al.
Published: (2024)

MagicStick: Controllable Video Editing via Control Handle Transformations
by: Ma, Yue, et al.
Published: (2023)

AlignGen: Boosting Personalized Image Generation with Cross-Modality Prior Alignment
by: Lin, Yiheng, et al.
Published: (2025)

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
by: Chen, Haoxin, et al.
Published: (2024)

GenCompositor: Generative Video Compositing with Diffusion Transformer
by: Yang, Shuzhou, et al.
Published: (2025)

Diffusion for Natural Image Matting
by: Hu, Yihan, et al.
Published: (2023)

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
by: Ma, Yue, et al.
Published: (2023)

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
by: Wu, Tao, et al.
Published: (2024)

CV-VAE: A Compatible Video VAE for Latent Generative Video Models
by: Zhao, Sijie, et al.
Published: (2024)

VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
by: Zhu, Liyun, et al.
Published: (2025)

FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
by: Zheng, Jiayi, et al.
Published: (2025)

EF-VI: Enhancing End-Frame Injection for Video Inbetweening
by: Chen, Liuhan, et al.
Published: (2025)

MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis
by: Zhi, Yihao, et al.
Published: (2025)

StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos
by: Zhao, Sijie, et al.
Published: (2024)

LightCtrl: Training-free Controllable Video Relighting
by: Peng, Yizuo, et al.
Published: (2026)

DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing
by: Li, Zixiang, et al.
Published: (2025)

GenClaw: Code-Driven Agentic Image Generation
by: Ye, Junyan, et al.
Published: (2026)

BlobCtrl: Taming Controllable Blob for Element-level Image Editing
by: Li, Yaowei, et al.
Published: (2025)

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
by: Liu, Yaofang, et al.
Published: (2023)

Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
by: Yang, Qinyu, et al.
Published: (2024)

DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation
by: Dong, Yue-Jiang, et al.
Published: (2025)

ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
by: Pu, Junfu, et al.
Published: (2025)

Memory Efficient Matting with Adaptive Token Routing
by: Lin, Yiheng, et al.
Published: (2024)

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
by: Liu, Kunhao, et al.
Published: (2025)

Mobius: Text to Seamless Looping Video Generation via Latent Shift
by: Bi, Xiuli, et al.
Published: (2025)

CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training
by: Bi, Xiuli, et al.
Published: (2024)

4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation
by: Yang, Shuzhou, et al.
Published: (2025)

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
by: Cai, Minghong, et al.
Published: (2024)

VideoOdyssey: A Benchmark for Ultra-Long-Context and Omni-Modal Video Understanding
by: He, Haichen, et al.
Published: (2026)

On Exact Editing of Flow-Based Diffusion Models
by: Li, Zixiang, et al.
Published: (2025)

Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions
by: Zhao, Bo, et al.
Published: (2026)

Towards A Better Metric for Text-to-Video Generation
by: Wu, Jay Zhangjie, et al.
Published: (2024)

PAD-F: Prior-Aware Debiasing Framework for Long-Tailed X-ray Prohibited Item Detection
by: Wang, Haoyu, et al.
Published: (2024)

AnchorSync: Global Consistency Optimization for Long Video Editing
by: Liu, Zichi, et al.
Published: (2025)