| Main Authors: | Chen, Liuhan; Cun, Xiaodong; Li, Xiaoyu; He, Xianyi; Yuan, Shenghai; Chen, Jie; Shan, Ying; Yuan, Li |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.21205 |
Similar Items
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
by: Yuan, Shenghai, et al.
Published: (2024)
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
by: Li, Zongjian, et al.
Published: (2024)
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
by: Yang, Shaoshu, et al.
Published: (2024)
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
by: Chen, Liuhan, et al.
Published: (2024)
EasyOmnimatte: Taming Pretrained Inpainting Diffusion Models for End-to-End Video Layered Decomposition
by: Hu, Yihan, et al.
Published: (2025)
FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
by: Ge, Yunyang, et al.
Published: (2025)
GenCompositor: Generative Video Compositing with Diffusion Transformer
by: Yang, Shuzhou, et al.
Published: (2025)
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
by: Yuan, Shenghai, et al.
Published: (2025)
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos
by: Ma, Yue, et al.
Published: (2023)
ImgEdit: A Unified Image Editing Dataset and Benchmark
by: Ye, Yang, et al.
Published: (2025)
Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
by: Zhu, Tianyi, et al.
Published: (2024)
4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation
by: Yang, Shuzhou, et al.
Published: (2025)
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
by: Chen, Haoxin, et al.
Published: (2024)
CutClaw: Agentic Hours-Long Video Editing via Music Synchronization
by: Zhao, Shifang, et al.
Published: (2026)
AnyAct: Towards Human Reenactment of Character Motion From Video
by: Chen, Liuhan, et al.
Published: (2026)
DiffRefiner: Coarse to Fine Trajectory Planning via Diffusion Refinement with Semantic Interaction for End to End Autonomous Driving
by: Yin, Liuhan, et al.
Published: (2025)
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
by: Hu, Wenbo, et al.
Published: (2024)
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
by: Zhao, Sijie, et al.
Published: (2024)
Open-Sora Plan: Open-Source Large Video Generation Model
by: Lin, Bin, et al.
Published: (2024)
Helios: Real Real-Time Long Video Generation Model
by: Yuan, Shenghai, et al.
Published: (2026)
StructInbet: Integrating Explicit Structural Guidance into Inbetween Frame Generation
by: Pan, Zhenglin, et al.
Published: (2025)
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
by: Cai, Minghong, et al.
Published: (2024)
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
by: Wu, Tao, et al.
Published: (2024)
MotionBridge: Dynamic Video Inbetweening with Flexible Controls
by: Tanveer, Maham, et al.
Published: (2024)
LightCtrl: Training-free Controllable Video Relighting
by: Peng, Yizuo, et al.
Published: (2026)
Jacquard V2: Refining Datasets using the Human In the Loop Data Correction Method
by: Li, Qiuhao, et al.
Published: (2024)
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
by: Niu, Muyao, et al.
Published: (2024)
VAU-R1: Advancing Video Anomaly Understanding via Reinforcement Fine-Tuning
by: Zhu, Liyun, et al.
Published: (2025)
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
by: Wang, Xiaojuan, et al.
Published: (2024)
Explorative Inbetweening of Time and Space
by: Feng, Haiwen, et al.
Published: (2024)
OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
by: Ge, Yunyang, et al.
Published: (2026)
FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
by: Zheng, Jiayi, et al.
Published: (2025)
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models
by: Yang, Qinyu, et al.
Published: (2024)
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
by: Liu, Yaofang, et al.
Published: (2023)
BlobCtrl: Taming Controllable Blob for Element-level Image Editing
by: Li, Yaowei, et al.
Published: (2025)
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
by: Lin, Bin, et al.
Published: (2025)
MagicStick: Controllable Video Editing via Control Handle Transformations
by: Ma, Yue, et al.
Published: (2023)
High-Resolution Document Shadow Removal via A Large-Scale Real-World Dataset and A Frequency-Aware Shadow Erasing Net
by: Li, Zinuo, et al.
Published: (2023)
T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models
by: Li, Changzhen, et al.
Published: (2025)
Mobius: Text to Seamless Looping Video Generation via Latent Shift
by: Bi, Xiuli, et al.
Published: (2025)