:: Library Catalog

Bibliographic Details
Main Authors:	Liu, Mingxin; Ma, Shuran; Meng, Shibei; Zhao, Xiangyu; Zhang, Zicheng; Zhang, Shaofeng; Zhong, Zhihang; Chen, Peixian; Cao, Haoyu; Sun, Xing; Duan, Haodong; Yang, Xue
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.05986

Similar Items

Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events
by: Liu, Xiaolin, et al.
Published: (2026)

Streaming Video Instruction Tuning
by: Xia, Jiaer, et al.
Published: (2025)

DreamWorld: Unified World Modeling in Video Generation
by: Tan, Boming, et al.
Published: (2026)

An Empirical Study on How Video-LLMs Answer Video Questions
by: Gou, Chenhui, et al.
Published: (2025)

VersusQ: Pairwise Margin Reasoning for Generalizable Video Quality Assessment
by: Meng, Shibei, et al.
Published: (2026)

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
by: Fang, Xinyu, et al.
Published: (2024)

LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
by: Duan, Zicheng, et al.
Published: (2026)

RISE: Self-Improving Robot Policy with Compositional World Model
by: Yang, Jiazhi, et al.
Published: (2026)

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
by: Zhang, Xiangdong, et al.
Published: (2025)

Fast Encoding and Decoding for Implicit Video Representation
by: Chen, Hao, et al.
Published: (2024)

Velocity Disambiguation for Video Frame Interpolation
by: Zhong, Zhihang, et al.
Published: (2023)

iVideoGPT: Interactive VideoGPTs are Scalable World Models
by: Wu, Jialong, et al.
Published: (2024)

Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss
by: Zhang, Xinyu, et al.
Published: (2025)

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
by: Li, Yifei, et al.
Published: (2025)

RISE: Rule-Driven SQL Dialect Translation via Query Reduction
by: Xie, Xudong, et al.
Published: (2026)

SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
by: Tan, Shanwen, et al.
Published: (2026)

VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification
by: Meng, Jiahao, et al.
Published: (2026)

Let Your Video Listen to Your Music!
by: Zhang, Xinyu, et al.
Published: (2025)

Unified Video Action Model
by: Li, Shuang, et al.
Published: (2025)

LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
by: Wang, Jiarui, et al.
Published: (2025)

RISE-T2V: Rephrasing and Injecting Semantics with LLM for Expansive Text-to-Video Generation
by: Zhang, Xiangjun, et al.
Published: (2025)

GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing
by: Liu, Mingxin, et al.
Published: (2026)

Graph2Video: Leveraging Video Models to Model Dynamic Graph Evolution
by: Liu, Hua, et al.
Published: (2026)

VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)

VideoRoPE: What Makes for Good Video Rotary Position Embedding?
by: Wei, Xilin, et al.
Published: (2025)

Dreamitate: Real-World Visuomotor Policy Learning via Video Generation
by: Liang, Junbang, et al.
Published: (2024)

Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints
by: Meng, Hao, et al.
Published: (2026)

GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation
by: Liu, Xin, et al.
Published: (2024)

Geometry-Aware Implicit Memory for Video World Models
by: Wei, Zhengxuan, et al.
Published: (2026)

Pathwise Test-Time Correction for Autoregressive Long Video Generation
by: Xiang, Xunzhi, et al.
Published: (2026)

Transformer-based EEG Decoding: A Survey
by: Zhang, Haodong, et al.
Published: (2025)

Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
by: Guo, Zhuoning, et al.
Published: (2025)

GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs
by: Zhu, Xiaorong, et al.
Published: (2025)

Redundancy Principles for MLLMs Benchmarks
by: Zhang, Zicheng, et al.
Published: (2025)

Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech
by: Niu, Xinlei, et al.
Published: (2025)

Neural Video Compression with Domain Transfer
by: Zhang, Tiange, et al.
Published: (2026)

LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation
by: Song, Quanjian, et al.
Published: (2025)

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
by: Huang, Haojian, et al.
Published: (2025)

SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information
by: Wang, Feng, et al.
Published: (2024)

Fast Autoregressive Video Generation with Diagonal Decoding
by: Ye, Yang, et al.
Published: (2025)