| Main Authors: | Kong, Fanqi, Zu, Weiqin, Chen, Xinyu, Yang, Yaodong, Zhu, Song-Chun, Feng, Xue |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2506.05425 |
Similar Items
Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench
by: Hu, Lanxiang, et al.
Published: (2025)
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
by: Li, Yunxin, et al.
Published: (2024)
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
by: Deng, Andong, et al.
Published: (2025)
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
by: Tu, Chongjun, et al.
Published: (2025)
ERGeoBench: A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models
by: Xue, Kaiwen, et al.
Published: (2026)
VEU-Bench: Towards Comprehensive Understanding of Video Editing
by: Li, Bozheng, et al.
Published: (2025)
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
by: Chen, Houlun, et al.
Published: (2024)
MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence
by: Lin, Jingli, et al.
Published: (2025)
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
by: Feng, Weixi, et al.
Published: (2024)
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
by: Cai, Mu, et al.
Published: (2024)
Video-Bench: Human-Aligned Video Generation Benchmark
by: Han, Hui, et al.
Published: (2025)
EgoPro-Bench: Benchmarking Personalized Proactive Interaction in Egocentric Video Streams
by: Ran, Dongchuan, et al.
Published: (2026)
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
by: Zhao, Yiming, et al.
Published: (2025)
Perception, Understanding and Reasoning, A Multimodal Benchmark for Video Fake News Detection
by: Yakun, Cui, et al.
Published: (2025)
UR-Bench: A Benchmark for Multi-Hop Reasoning over Ultra-High-Resolution Images
by: Li, Siqi, et al.
Published: (2025)
HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
by: Cai, Yuxuan, et al.
Published: (2025)
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
by: Pan, Yaning, et al.
Published: (2025)
CT-Bench: A Benchmark for Multimodal Lesion Understanding in Computed Tomography
by: Zhu, Qingqing, et al.
Published: (2026)
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
by: Lin, Junming, et al.
Published: (2024)
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
by: Wang, Andong, et al.
Published: (2024)
Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding
by: Chen, Houlun, et al.
Published: (2026)
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
by: Zhu, Fengbin, et al.
Published: (2024)
VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis
by: Park, Jinho, et al.
Published: (2026)
VideoRewardBench: Comprehensive Evaluation of Multimodal Reward Models for Video Understanding
by: Zhang, Zhihong, et al.
Published: (2025)
UI2V-Bench: An Understanding-based Image-to-video Generation Benchmark
by: Zhang, Ailing, et al.
Published: (2025)
LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding
by: Wang, Xiaodong, et al.
Published: (2026)
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
by: Luo, Yang, et al.
Published: (2025)
CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning
by: Gan, Rui, et al.
Published: (2026)
AirCopBench: A Benchmark for Multi-drone Collaborative Embodied Perception and Reasoning
by: Zha, Jirong, et al.
Published: (2025)
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding
by: Lozano, Alejandro, et al.
Published: (2024)
SeqBench: Benchmarking Sequential Narrative Generation in Text-to-Video Models
by: Tang, Zhengxu, et al.
Published: (2025)
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
by: Xiong, Haomiao, et al.
Published: (2025)
Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs
by: Zhang, Zicheng, et al.
Published: (2024)
LRR-Bench: Left, Right or Rotate? Vision-Language models Still Struggle With Spatial Understanding Tasks
by: Kong, Fei, et al.
Published: (2025)
Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration
by: Zhou, Yue, et al.
Published: (2025)
CompareBench: A Benchmark for Visual Comparison Reasoning in Vision-Language Models
by: Cai, Jie, et al.
Published: (2025)
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
by: Xun, Shuhang, et al.
Published: (2025)
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis
by: Wei, Jianhui, et al.
Published: (2025)
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
by: Zhao, Baining, et al.
Published: (2025)
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
by: Deng, Andong, et al.
Published: (2024)