:: Library Catalog

Bibliographic Details
Main Authors:	Jiang, Yifan; Wang, Yueying; Zhao, Rui; Parag, Toufiq; Chen, Zhimin; Liao, Zhenyu; Unnikrishnan, Jayakrishnan
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2511.11113

Similar Items

VeRVE: Versatile Retrieval for Videos via Unified Embeddings
by: Halbe, Shaunak, et al.
Published: (2026)

Modality Agnostic Efficient Long Range Encoder
by: Parag, Toufiq, et al.
Published: (2025)

Perception, Understanding and Reasoning, A Multimodal Benchmark for Video Fake News Detection
by: Yakun, Cui, et al.
Published: (2025)

Reasoning-Guided Grounding: Elevating Video Anomaly Detection through Multimodal Large Language Models
by: Agarwal, Sakshi, et al.
Published: (2026)

Improved Visual-Spatial Reasoning via R1-Zero-Like Training
by: Liao, Zhenyi, et al.
Published: (2025)

Semantic-Geometric Dual Compression: Training-Free Visual Token Reduction for Ultra-High-Resolution Remote Sensing Understanding
by: Li, Yueying, et al.
Published: (2026)

Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench
by: Hu, Lanxiang, et al.
Published: (2025)

AD-MIR: Bridging the Gap from Perception to Persuasion in Advertising Video Understanding via Structured Reasoning
by: Xu, Binxiao, et al.
Published: (2026)

Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding
by: Chen, Houlun, et al.
Published: (2026)

Personalized Video Summarization by Multimodal Video Understanding
by: Chen, Brian, et al.
Published: (2024)

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
by: Fei, Hao, et al.
Published: (2024)

Abductive Ego-View Accident Video Understanding for Safe Driving Perception
by: Fang, Jianwu, et al.
Published: (2024)

DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning
by: Gao, Yifeng, et al.
Published: (2025)

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
by: Deng, Andong, et al.
Published: (2024)

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
by: Tang, Jiaqi, et al.
Published: (2025)

A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning
by: Jiang, Siyang, et al.
Published: (2025)

CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning
by: Gan, Rui, et al.
Published: (2026)

SIV-Bench: A Video Benchmark for Social Interaction Understanding and Reasoning
by: Kong, Fanqi, et al.
Published: (2025)

Universal Visuo-Tactile Video Understanding for Embodied Interaction
by: Xie, Yifan, et al.
Published: (2025)

Audio-centric Video Understanding Benchmark without Text Shortcut
by: Yang, Yudong, et al.
Published: (2025)

EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports
by: Ma, Jianzhe, et al.
Published: (2026)

R3G: A Reasoning--Retrieval--Reranking Framework for Vision-Centric Answer Generation
by: Chen, Zhuohong, et al.
Published: (2026)

Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
by: Zhao, Bingchen, et al.
Published: (2024)

Automated Segmentation of Ischemic Stroke Lesions in Non-Contrast Computed Tomography Images for Enhanced Treatment and Prognosis
by: Musah, Toufiq, et al.
Published: (2024)

VideoVista: A Versatile Benchmark for Video Understanding and Reasoning
by: Li, Yunxin, et al.
Published: (2024)

Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding
by: Zheng, Yikai, et al.
Published: (2026)

Place-it-R1: Unlocking Environment-aware Reasoning Potential of MLLM for Video Object Insertion
by: Gu, Bohai, et al.
Published: (2026)

Enhancing Long Video Understanding via Hierarchical Event-Based Memory
by: Cheng, Dingxin, et al.
Published: (2024)

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
by: Schneider, Benjamin, et al.
Published: (2025)

GeoEyes: On-Demand Visual Focusing for Evidence-Grounded Understanding of Ultra-High-Resolution Remote Sensing Imagery
by: Wang, Fengxiang, et al.
Published: (2026)

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting
by: Lee, Daeun, et al.
Published: (2026)

H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding
by: Wu, Qi, et al.
Published: (2025)

VideoPrism: A Foundational Visual Encoder for Video Understanding
by: Zhao, Long, et al.
Published: (2024)

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
by: Tong, Chengzhuo, et al.
Published: (2026)

Perception-R1: Advancing Multimodal Reasoning Capabilities of MLLMs via Visual Perception Reward
by: Xiao, Tong, et al.
Published: (2025)

Vamos: Versatile Action Models for Video Understanding
by: Wang, Shijie, et al.
Published: (2023)

RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph
by: Malik, Sameer, et al.
Published: (2025)

Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
by: Zhang, Fanrui, et al.
Published: (2025)

TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning
by: Pan, Junwen, et al.
Published: (2025)

R^3-VQA: "Read the Room" by Video Social Reasoning
by: Niu, Lixing, et al.
Published: (2025)