:: Library Catalog

Bibliographic Details
Main Authors:	Si, Chenyang; Fan, Weichen; Lv, Zhengyao; Huang, Ziqi; Qiao, Yu; Liu, Ziwei
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2501.08994

Similar Items

Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
by: Lv, Zhengyao, et al.
Published: (2025)

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
by: Lv, Zhengyao, et al.
Published: (2024)

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
by: Lv, Zhengyao, et al.
Published: (2025)

StableWorld: Towards Stable and Consistent Long Interactive Video Generation
by: Yang, Ying, et al.
Published: (2026)

FreeInit: Bridging Initialization Gap in Video Diffusion Models
by: Wu, Tianxing, et al.
Published: (2023)

LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation
by: Gao, Jianxiong, et al.
Published: (2025)

Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation
by: Chen, Gordon, et al.
Published: (2026)

VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
by: Huang, Ziqi, et al.
Published: (2024)

NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
by: Pan, Tianlin, et al.
Published: (2026)

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
by: Fan, Weichen, et al.
Published: (2025)

RealDPO: Real or Not Real, that is the Preference
by: Cheng, Guo, et al.
Published: (2025)

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
by: Zhang, Fan, et al.
Published: (2024)

LongVie 2: Multimodal Controllable Ultra-Long Video World Model
by: Gao, Jianxiong, et al.
Published: (2025)

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
by: Zheng, Dian, et al.
Published: (2025)

Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT
by: Liu, Dongyang, et al.
Published: (2025)

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
by: Fan, Weichen, et al.
Published: (2025)

DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation
by: Yang, Ying, et al.
Published: (2025)

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
by: Cao, Yukang, et al.
Published: (2025)

DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution
by: Lv, Zhengyao, et al.
Published: (2026)

VersusQ: Pairwise Margin Reasoning for Generalizable Video Quality Assessment
by: Meng, Shibei, et al.
Published: (2026)

RepNet-VSR: Reparameterizable Architecture for High-Fidelity Video Super-Resolution
by: Wu, Biao, et al.
Published: (2025)

Latte: Latent Diffusion Transformer for Video Generation
by: Ma, Xin, et al.
Published: (2024)

Rethinking Reward Signals in Video GRPO: When Scores Become Targets
by: Li, Rui, et al.
Published: (2025)

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
by: Gu, Zekai, et al.
Published: (2025)

V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
by: Cheng, Zixu, et al.
Published: (2025)

HoLa: B-Rep Generation using a Holistic Latent Representation
by: Liu, Yilin, et al.
Published: (2025)

MR. Video: "MapReduce" is the Principle for Long Video Understanding
by: Pang, Ziqi, et al.
Published: (2025)

DenoiseRep: Denoising Model for Representation Learning
by: Xu, Zhengrui, et al.
Published: (2024)

Demystifying Video Reasoning
by: Wang, Ruisi, et al.
Published: (2026)

Stencil: Subject-Driven Generation with Context Guidance
by: Chen, Gordon, et al.
Published: (2025)

STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models
by: Fan, Linfeng, et al.
Published: (2026)

CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
by: Qiu, Haonan, et al.
Published: (2025)

Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization
by: Ju, Hao, et al.
Published: (2024)

Lighting-grounded Video Generation with Renderer-based Agent Reasoning
by: Cai, Ziqi, et al.
Published: (2026)

Cut2Next: Generating Next Shot via In-Context Tuning
by: He, Jingwen, et al.
Published: (2025)

CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models
by: Fan, Weichen, et al.
Published: (2025)

VEnhancer: Generative Space-Time Enhancement for Video Generation
by: He, Jingwen, et al.
Published: (2024)

GeoVideo: Introducing Geometric Regularization into Video Generation Model
by: Bai, Yunpeng, et al.
Published: (2025)

CoS: Chain-of-Shot Prompting for Long Video Understanding
by: Hu, Jian, et al.
Published: (2025)

Towards Language-Driven Video Inpainting via Multimodal Large Language Models
by: Wu, Jianzong, et al.
Published: (2024)