| Main Authors: | Liu, Mingxin, Ma, Shuran, Meng, Shibei, Zhao, Xiangyu, Zhang, Zicheng, Zhang, Shaofeng, Zhong, Zhihang, Chen, Peixian, Cao, Haoyu, Sun, Xing, Duan, Haodong, Yang, Xue |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.05986 |
Similar Items
Moment-Video: Diagnosing Temporal Fidelity of Video MLLMs on Momentary Visual Events
by: Liu, Xiaolin, et al.
Published: (2026)
Streaming Video Instruction Tuning
by: Xia, Jiaer, et al.
Published: (2025)
DreamWorld: Unified World Modeling in Video Generation
by: Tan, Boming, et al.
Published: (2026)
An Empirical Study on How Video-LLMs Answer Video Questions
by: Gou, Chenhui, et al.
Published: (2025)
VersusQ: Pairwise Margin Reasoning for Generalizable Video Quality Assessment
by: Meng, Shibei, et al.
Published: (2026)
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding
by: Fang, Xinyu, et al.
Published: (2024)
LiveWorld: Simulating Out-of-Sight Dynamics in Generative Video World Models
by: Duan, Zicheng, et al.
Published: (2026)
RISE: Self-Improving Robot Policy with Compositional World Model
by: Yang, Jiazhi, et al.
Published: (2026)
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models
by: Zhang, Xiangdong, et al.
Published: (2025)
Fast Encoding and Decoding for Implicit Video Representation
by: Chen, Hao, et al.
Published: (2024)
Velocity Disambiguation for Video Frame Interpolation
by: Zhong, Zhihang, et al.
Published: (2023)
iVideoGPT: Interactive VideoGPTs are Scalable World Models
by: Wu, Jialong, et al.
Published: (2024)
Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss
by: Zhang, Xinyu, et al.
Published: (2025)
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
by: Li, Yifei, et al.
Published: (2025)
RISE: Rule-Driven SQL Dialect Translation via Query Reduction
by: Xie, Xudong, et al.
Published: (2026)
SWIFT: Prompt-Adaptive Memory for Efficient Interactive Long Video Generation
by: Tan, Shanwen, et al.
Published: (2026)
VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification
by: Meng, Jiahao, et al.
Published: (2026)
Let Your Video Listen to Your Music!
by: Zhang, Xinyu, et al.
Published: (2025)
Unified Video Action Model
by: Li, Shuang, et al.
Published: (2025)
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
by: Wang, Jiarui, et al.
Published: (2025)
RISE-T2V: Rephrasing and Injecting Semantics with LLM for Expansive Text-to-Video Generation
by: Zhang, Xiangjun, et al.
Published: (2025)
GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing
by: Liu, Mingxin, et al.
Published: (2026)
Graph2Video: Leveraging Video Models to Model Dynamic Graph Evolution
by: Liu, Hua, et al.
Published: (2026)
VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models
by: Li, Yunhao, et al.
Published: (2026)
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
by: Wei, Xilin, et al.
Published: (2025)
Dreamitate: Real-World Visuomotor Policy Learning via Video Generation
by: Liang, Junbang, et al.
Published: (2024)
Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints
by: Meng, Hao, et al.
Published: (2026)
GMFlow: Global Motion-Guided Recurrent Flow for 6D Object Pose Estimation
by: Liu, Xin, et al.
Published: (2024)
Geometry-Aware Implicit Memory for Video World Models
by: Wei, Zhengxuan, et al.
Published: (2026)
Pathwise Test-Time Correction for Autoregressive Long Video Generation
by: Xiang, Xunzhi, et al.
Published: (2026)
Transformer-based EEG Decoding: A Survey
by: Zhang, Haodong, et al.
Published: (2025)
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum
by: Guo, Zhuoning, et al.
Published: (2025)
GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs
by: Zhu, Xiaorong, et al.
Published: (2025)
Redundancy Principles for MLLMs Benchmarks
by: Zhang, Zicheng, et al.
Published: (2025)
Beyond Video-to-SFX: Video to Audio Synthesis with Environmentally Aware Speech
by: Niu, Xinlei, et al.
Published: (2025)
Neural Video Compression with Domain Transfer
by: Zhang, Tiange, et al.
Published: (2026)
LightMotion: A Light and Tuning-free Method for Simulating Camera Motion in Video Generation
by: Song, Quanjian, et al.
Published: (2025)
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
by: Huang, Haojian, et al.
Published: (2025)
SSNVC: Single Stream Neural Video Compression with Implicit Temporal Information
by: Wang, Feng, et al.
Published: (2024)
Fast Autoregressive Video Generation with Diagonal Decoding
by: Ye, Yang, et al.
Published: (2025)