Library Catalog

Bibliographic Details
Main Authors:	Chen, Liuhan; Cun, Xiaodong; Li, Xiaoyu; He, Xianyi; Yuan, Shenghai; Chen, Jie; Shan, Ying; Yuan, Li
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2505.21205

Similar Items

Identity-Preserving Text-to-Video Generation by Frequency Decomposition
by: Yuan, Shenghai, et al.
Published: (2024)

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
by: Li, Zongjian, et al.
Published: (2024)

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
by: Yang, Shaoshu, et al.
Published: (2024)

OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
by: Chen, Liuhan, et al.
Published: (2024)

EasyOmnimatte: Taming Pretrained Inpainting Diffusion Models for End-to-End Video Layered Decomposition
by: Hu, Yihan, et al.
Published: (2025)

FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
by: Ge, Yunyang, et al.
Published: (2025)

GenCompositor: Generative Video Compositing with Diffusion Transformer
by: Yang, Shuzhou, et al.
Published: (2025)

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
by: Yuan, Shenghai, et al.
Published: (2025)

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
by: Ma, Yue, et al.
Published: (2023)

ImgEdit: A Unified Image Editing Dataset and Benchmark
by: Ye, Yang, et al.
Published: (2025)

Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
by: Zhu, Tianyi, et al.
Published: (2024)

4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation
by: Yang, Shuzhou, et al.
Published: (2025)

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
by: Chen, Haoxin, et al.
Published: (2024)

CutClaw: Agentic Hours-Long Video Editing via Music Synchronization
by: Zhao, Shifang, et al.
Published: (2026)

AnyAct: Towards Human Reenactment of Character Motion From Video
by: Chen, Liuhan, et al.
Published: (2026)

DiffRefiner: Coarse to Fine Trajectory Planning via Diffusion Refinement with Semantic Interaction for End to End Autonomous Driving
by: Yin, Liuhan, et al.
Published: (2025)

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
by: Hu, Wenbo, et al.
Published: (2024)

CV-VAE: A Compatible Video VAE for Latent Generative Video Models
by: Zhao, Sijie, et al.
Published: (2024)

Open-Sora Plan: Open-Source Large Video Generation Model
by: Lin, Bin, et al.
Published: (2024)

Helios: Real Real-Time Long Video Generation Model
by: Yuan, Shenghai, et al.
Published: (2026)

StructInbet: Integrating Explicit Structural Guidance into Inbetween Frame Generation
by: Pan, Zhenglin, et al.
Published: (2025)

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
by: Cai, Minghong, et al.
Published: (2024)

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
by: Wu, Tao, et al.
Published: (2024)

MotionBridge: Dynamic Video Inbetweening with Flexible Controls
by: Tanveer, Maham, et al.
Published: (2024)

LightCtrl: Training-free Controllable Video Relighting
by: Peng, Yizuo, et al.
Published: (2026)

Jacquard V2: Refining Datasets using the Human In the Loop Data Correction Method
by: Li, Qiuhao, et al.
Published: (2024)

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
by: Niu, Muyao, et al.
Published: (2024)

VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
by: Zhu, Liyun, et al.
Published: (2025)

Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
by: Wang, Xiaojuan, et al.
Published: (2024)

Explorative Inbetweening of Time and Space
by: Feng, Haiwen, et al.
Published: (2024)

OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
by: Ge, Yunyang, et al.
Published: (2026)

FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
by: Zheng, Jiayi, et al.
Published: (2025)

Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
by: Yang, Qinyu, et al.
Published: (2024)

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
by: Liu, Yaofang, et al.
Published: (2023)

BlobCtrl: Taming Controllable Blob for Element-level Image Editing
by: Li, Yaowei, et al.
Published: (2025)

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
by: Lin, Bin, et al.
Published: (2025)

MagicStick: Controllable Video Editing via Control Handle Transformations
by: Ma, Yue, et al.
Published: (2023)

High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net
by: Li, Zinuo, et al.
Published: (2023)

T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models
by: Li, Changzhen, et al.
Published: (2025)

Mobius: Text to Seamless Looping Video Generation via Latent Shift
by: Bi, Xiuli, et al.
Published: (2025)