| Main Authors: | Zhao, Shifang, Hu, Yihan, Shan, Ying, Wei, Yunchao, Cun, Xiaodong |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.29664 |
Similar Items
EasyOmnimatte: Taming Pretrained Inpainting Diffusion Models for End-to-End Video Layered Decomposition
by: Hu, Yihan, et al.
Published: (2025)
OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning
by: Zhao, Shifang, et al.
Published: (2025)
DCEdit: Dual-Level Controlled Image Editing via Precisely Localized Semantics
by: Hu, Yihan, et al.
Published: (2025)
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
by: Hu, Wenbo, et al.
Published: (2024)
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
by: Yang, Shaoshu, et al.
Published: (2024)
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
by: Niu, Muyao, et al.
Published: (2024)
Learning Trimaps via Clicks for Image Matting
by: Zhang, Chenyi, et al.
Published: (2024)
MagicStick: Controllable Video Editing via Control Handle Transformations
by: Ma, Yue, et al.
Published: (2023)
AlignGen: Boosting Personalized Image Generation with Cross-Modality Prior Alignment
by: Lin, Yiheng, et al.
Published: (2025)
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
by: Chen, Haoxin, et al.
Published: (2024)
GenCompositor: Generative Video Compositing with Diffusion Transformer
by: Yang, Shuzhou, et al.
Published: (2025)
Diffusion for Natural Image Matting
by: Hu, Yihan, et al.
Published: (2023)
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
by: Ma, Yue, et al.
Published: (2023)
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
by: Wu, Tao, et al.
Published: (2024)
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
by: Zhao, Sijie, et al.
Published: (2024)
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
by: Zhu, Liyun, et al.
Published: (2025)
FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
by: Zheng, Jiayi, et al.
Published: (2025)
EF-VI: Enhancing End-Frame Injection for Video Inbetweening
by: Chen, Liuhan, et al.
Published: (2025)
MV-Performer: Taming Video Diffusion Model for Faithful and Synchronized Multi-view Performer Synthesis
by: Zhi, Yihao, et al.
Published: (2025)
StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos
by: Zhao, Sijie, et al.
Published: (2024)
LightCtrl: Training-free Controllable Video Relighting
by: Peng, Yizuo, et al.
Published: (2026)
DCI: Dual-Conditional Inversion for Boosting Diffusion-Based Image Editing
by: Li, Zixiang, et al.
Published: (2025)
GenClaw: Code-Driven Agentic Image Generation
by: Ye, Junyan, et al.
Published: (2026)
BlobCtrl: Taming Controllable Blob for Element-level Image Editing
by: Li, Yaowei, et al.
Published: (2025)
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
by: Liu, Yaofang, et al.
Published: (2023)
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
by: Yang, Qinyu, et al.
Published: (2024)
DepthSync: Diffusion Guidance-Based Depth Synchronization for Scale- and Geometry-Consistent Video Depth Estimation
by: Dong, Yue-Jiang, et al.
Published: (2025)
ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
by: Pu, Junfu, et al.
Published: (2025)
Memory Efficient Matting with Adaptive Token Routing
by: Lin, Yiheng, et al.
Published: (2024)
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
by: Liu, Kunhao, et al.
Published: (2025)
Mobius: Text to Seamless Looping Video Generation via Latent Shift
by: Bi, Xiuli, et al.
Published: (2025)
CustomTTT: Motion and Appearance Customized Video Generation via Test-Time Training
by: Bi, Xiuli, et al.
Published: (2024)
4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation
by: Yang, Shuzhou, et al.
Published: (2025)
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
by: Cai, Minghong, et al.
Published: (2024)
VideoOdyssey: A Benchmark for Ultra-Long-Context and Omni-Modal Video Understanding
by: He, Haichen, et al.
Published: (2026)
On Exact Editing of Flow-Based Diffusion Models
by: Li, Zixiang, et al.
Published: (2025)
Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions
by: Zhao, Bo, et al.
Published: (2026)
Towards A Better Metric for Text-to-Video Generation
by: Wu, Jay Zhangjie, et al.
Published: (2024)
PAD-F: Prior-Aware Debiasing Framework for Long-Tailed X-ray Prohibited Item Detection
by: Wang, Haoyu, et al.
Published: (2024)
AnchorSync: Global Consistency Optimization for Long Video Editing
by: Liu, Zichi, et al.
Published: (2025)