Library Catalog

Bibliographic Details
Main Authors:	Li, Yun; Zhang, Yiming; Lin, Tao; Liu, Xiangrui; Cai, Wenxiao; Liu, Zheng; Zhao, Bo
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2503.23765

Similar Items

SpatialBot: Precise Spatial Understanding with Vision Language Models
by: Cai, Wenxiao, et al.
Published: (2024)

RefereeBench: Are Video MLLMs Ready to be Multi-Sport Referees
by: Xu, Yichen, et al.
Published: (2026)

ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
by: Yang, Liu, et al.
Published: (2025)

RoadBench: Benchmarking MLLMs on Fine-Grained Spatial Understanding and Reasoning under Urban Road Scenarios
by: Zhang, Jun, et al.
Published: (2025)

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding
by: Lin, Junming, et al.
Published: (2024)

PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
by: Zhang, Zixin, et al.
Published: (2025)

Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
by: Liu, Xiangrui, et al.
Published: (2025)

Spatial Preference Rewarding for MLLMs Spatial Understanding
by: Qiu, Han, et al.
Published: (2025)

OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
by: Lin, Jingli, et al.
Published: (2025)

EventBench: Towards Comprehensive Benchmarking of Event-based MLLMs
by: Liu, Shaoyu, et al.
Published: (2025)

EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
by: Yuan, Yuqian, et al.
Published: (2025)

TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning
by: Wu, Tao, et al.
Published: (2025)

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding
by: Sun, Peiwen, et al.
Published: (2026)

Unhackable Temporal Rewarding for Scalable Video MLLMs
by: Yu, En, et al.
Published: (2025)

SPARROW: Learning Spatial Precision and Temporal Referential Consistency in Pixel-Grounded Video MLLMs
by: Alansari, Mohamad, et al.
Published: (2026)

VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification
by: Meng, Jiahao, et al.
Published: (2026)

E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs
by: Liu, Xianjie, et al.
Published: (2026)

TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos
by: Liu, Xiangrui, et al.
Published: (2025)

SpatialViz-Bench: A Cognitively-Grounded Benchmark for Diagnosing Spatial Visualization in MLLMs
by: Wang, Siting, et al.
Published: (2025)

Real-World Scene Recovery for Scattering-Degraded Images Using Spatial and Frequency Priors
by: Liu, Yun, et al.
Published: (2025)

3D Spatial Understanding in MLLMs: Disambiguation and Evaluation
by: Chang, Chun-Peng, et al.
Published: (2024)

VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
by: Liu, Yuanxin, et al.
Published: (2025)

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
by: Ouyang, Kun, et al.
Published: (2024)

Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding
by: Lin, Tao, et al.
Published: (2025)

IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting
by: Zhang, Tao, et al.
Published: (2025)

From Indoor to Open World: Revealing the Spatial Reasoning Gap in MLLMs
by: Wu, Mingrui, et al.
Published: (2025)

Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization
by: Xu, Yuanze, et al.
Published: (2025)

SpatialTree: How Spatial Abilities Branch Out in MLLMs
by: Xiao, Yuxi, et al.
Published: (2025)

Improving the Reasoning of Multi-Image Grounding in MLLMs via Reinforcement Learning
by: Zhang, Bob, et al.
Published: (2025)

On the Generalization Capacities of MLLMs for Spatial Intelligence
by: Zhang, Gongjie, et al.
Published: (2026)

Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
by: Liu, Yexin, et al.
Published: (2024)

VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction
by: Wang, Hao, et al.
Published: (2025)

SpaceMind++: Toward Allocentric Cognitive Maps for Spatially Grounded Video MLLMs
by: Gu, Bo, et al.
Published: (2026)

FunBench: Benchmarking Fundus Reading Skills of MLLMs
by: Wei, Qijie, et al.
Published: (2025)

SpaceR: Reinforcing MLLMs in Video Spatial Reasoning
by: Ouyang, Kun, et al.
Published: (2025)

Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs
by: Anand, Dhruv, et al.
Published: (2025)

Universal Skeleton Understanding via Differentiable Rendering and MLLMs
by: Wang, Ziyi, et al.
Published: (2026)

Video-XL-2: Towards Very Long-Video Understanding Through Task-Aware KV Sparsification
by: Qin, Minghao, et al.
Published: (2025)

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
by: Jia, Xiaojun, et al.
Published: (2025)

Visual Jigsaw Post-Training Improves MLLMs
by: Wu, Penghao, et al.
Published: (2025)