| Main Authors: | Gao, Hong, Wu, Jingyu, Xu, Xiangkai, Xie, Kangni, Zhang, Yunchen, Zhong, Bin, Gao, Xurui, Zhang, Min-Ling |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.16937 |
Similar Items
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
by: Yao, Jiali, et al.
Published: (2025)
RGBT-Ground Benchmark: Visual Grounding Beyond RGB in Complex Real-World Scenarios
by: Zhao, Tianyi, et al.
Published: (2025)
Bridging Time and Space: Decoupled Spatio-Temporal Alignment for Video Grounding
by: Tu, Xuezhen, et al.
Published: (2026)
DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios
by: Gao, Zeyu, et al.
Published: (2025)
Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios
by: Huang, Shijue, et al.
Published: (2024)
Context-Guided Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2024)
MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description
by: Mo, Jiawei, et al.
Published: (2024)
Towards Long-Form Spatio-Temporal Video Grounding
by: Gu, Xin, et al.
Published: (2026)
AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
by: Hou, Yunhao, et al.
Published: (2025)
ToolOmni: Enabling Open-World Tool Use via Agentic learning with Proactive Retrieval and Grounded Execution
by: Huang, Shouzheng, et al.
Published: (2026)
Detector-Empowered Video Large Language Model for Efficient Spatio-Temporal Grounding
by: Gao, Shida, et al.
Published: (2025)
Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning
by: Sugandhika, Chinthani, et al.
Published: (2025)
RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation
by: Zhang, Ruoxuan, et al.
Published: (2025)
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
by: Wang, Jiankang, et al.
Published: (2025)
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
by: Wasim, Syed Talal, et al.
Published: (2023)
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
by: Meng, Jinxiang, et al.
Published: (2026)
MolGround: A Benchmark for Molecular Grounding
by: Wu, Jiaxin, et al.
Published: (2025)
OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer
by: Peng, Haosong, et al.
Published: (2025)
CaST-Bench: Benchmarking Causal Chain-Grounded Spatio-Temporal Reasoning for Video Question Answering
by: Zhang, Mingfang, et al.
Published: (2026)
Agentic Spatio-Temporal Grounding via Collaborative Reasoning
by: Zhao, Heng, et al.
Published: (2026)
VideoMolmo: Spatio-Temporal Grounding Meets Pointing
by: Ahmad, Ghazi Shazan, et al.
Published: (2025)
OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding
by: Zheng, Minghang, et al.
Published: (2026)
OmniACBench: A Benchmark for Evaluating Context-Grounded Acoustic Control in Omni-Modal Models
by: Kim, Seunghee, et al.
Published: (2026)
Beyond Referring Expressions: Scenario Comprehension Visual Grounding
by: He, Ruozhen, et al.
Published: (2026)
SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving
by: Li, Jingyu, et al.
Published: (2026)
Paladin-mini: A Compact and Efficient Grounding Model Excelling in Real-World Scenarios
by: Ivry, Dror, et al.
Published: (2025)
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
by: Zhuang, Weijun, et al.
Published: (2025)
ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios
by: Pan, Changzai, et al.
Published: (2026)
OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning
by: Kim, Seunghee, et al.
Published: (2025)
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
by: Liu, Xiaoyang, et al.
Published: (2024)
ToG-Bench: Task-Oriented Spatio-Temporal Grounding in Egocentric Videos
by: Xu, Qi'ao, et al.
Published: (2025)
WorldCoder-Bench: Benchmarking Physically Grounded 3D World Synthesis
by: Lu, Shuo, et al.
Published: (2026)
A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset
by: Jian, Muwei, et al.
Published: (2024)
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
by: Garg, Aaryan, et al.
Published: (2025)
Thinking With Bounding Boxes: Enhancing Spatio-Temporal Video Grounding via Reinforcement Fine-Tuning
by: Gu, Xin, et al.
Published: (2025)
OmniCompliance-100K: A Multi-Domain, Rule-Grounded, Real-World Safety Compliance Dataset
by: Hu, Wenbin, et al.
Published: (2026)
Grounding World Simulation Models in a Real-World Metropolis
by: Seo, Junyoung, et al.
Published: (2026)
Towards Robust Sensor-Fusion Ground SLAM: A Comprehensive Benchmark and A Resilient Framework
by: Zhang, Deteng, et al.
Published: (2025)
MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
by: Li, Zhang, et al.
Published: (2026)
ShoppingBench: A Real-World Intent-Grounded Shopping Benchmark for LLM-based Agents
by: Wang, Jiangyuan, et al.
Published: (2025)