| Main Authors: | Si, Chenyang, Fan, Weichen, Lv, Zhengyao, Huang, Ziqi, Qiao, Yu, Liu, Ziwei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2501.08994 |
Similar Items
Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
by: Lv, Zhengyao, et al.
Published: (2025)
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
by: Lv, Zhengyao, et al.
Published: (2024)
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
by: Lv, Zhengyao, et al.
Published: (2025)
StableWorld: Towards Stable and Consistent Long Interactive Video Generation
by: Yang, Ying, et al.
Published: (2026)
FreeInit: Bridging Initialization Gap in Video Diffusion Models
by: Wu, Tianxing, et al.
Published: (2023)
LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
by: Gao, Jianxiong, et al.
Published: (2025)
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation
by: Chen, Gordon, et al.
Published: (2026)
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
by: Huang, Ziqi, et al.
Published: (2024)
NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
by: Pan, Tianlin, et al.
Published: (2026)
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
by: Fan, Weichen, et al.
Published: (2025)
RealDPO: Real or Not Real, that is the Preference
by: Cheng, Guo, et al.
Published: (2025)
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
by: Zhang, Fan, et al.
Published: (2024)
LongVie 2: Multimodal Controllable Ultra-Long Video World Model
by: Gao, Jianxiong, et al.
Published: (2025)
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
by: Zheng, Dian, et al.
Published: (2025)
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
by: Liu, Dongyang, et al.
Published: (2025)
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
by: Fan, Weichen, et al.
Published: (2025)
DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation
by: Yang, Ying, et al.
Published: (2025)
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
by: Cao, Yukang, et al.
Published: (2025)
DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution
by: Lv, Zhengyao, et al.
Published: (2026)
VersusQ: Pairwise Margin Reasoning for Generalizable Video Quality Assessment
by: Meng, Shibei, et al.
Published: (2026)
RepNet-VSR: Reparameterizable Architecture for High-Fidelity Video Super-Resolution
by: Wu, Biao, et al.
Published: (2025)
Latte: Latent Diffusion Transformer for Video Generation
by: Ma, Xin, et al.
Published: (2024)
Rethinking Reward Signals in Video GRPO: When Scores Become Targets
by: Li, Rui, et al.
Published: (2025)
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
by: Gu, Zekai, et al.
Published: (2025)
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
by: Cheng, Zixu, et al.
Published: (2025)
HoLa: B-Rep Generation using a Holistic Latent Representation
by: Liu, Yilin, et al.
Published: (2025)
MR. Video: "MapReduce" is the Principle for Long Video Understanding
by: Pang, Ziqi, et al.
Published: (2025)
DenoiseRep: Denoising Model for Representation Learning
by: Xu, Zhengrui, et al.
Published: (2024)
Demystifying Video Reasoning
by: Wang, Ruisi, et al.
Published: (2026)
Stencil: Subject-Driven Generation with Context Guidance
by: Chen, Gordon, et al.
Published: (2025)
STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models
by: Fan, Linfeng, et al.
Published: (2026)
CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
by: Qiu, Haonan, et al.
Published: (2025)
Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization
by: Ju, Hao, et al.
Published: (2024)
Lighting-grounded Video Generation with Renderer-based Agent Reasoning
by: Cai, Ziqi, et al.
Published: (2026)
Cut2Next: Generating Next Shot via In-Context Tuning
by: He, Jingwen, et al.
Published: (2025)
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
by: Fan, Weichen, et al.
Published: (2025)
VEnhancer: Generative Space-Time Enhancement for Video Generation
by: He, Jingwen, et al.
Published: (2024)
GeoVideo: Introducing Geometric Regularization into Video Generation Model
by: Bai, Yunpeng, et al.
Published: (2025)
CoS: Chain-of-Shot Prompting for Long Video Understanding
by: Hu, Jian, et al.
Published: (2025)
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
by: Wu, Jianzong, et al.
Published: (2024)