Library Catalog

Bibliographic Details
Main Authors:	Popov, Maxim; Kurkova, Regina; Iumanov, Mikhail; Mahmoud, Jaafar; Kolyubin, Sergey
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language; Robotics
Online Access:	https://arxiv.org/abs/2503.10331

Similar Items

OSMa-Bench++: Toward Open-Ended Benchmarking of Semantic Mapping for Manipulation with Prompt-Generated Synthetic Scenes
by: Kurkova, Regina, et al.
Published: (2026)

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
by: Nasser, Zaid, et al.
Published: (2026)

KM-ViPE: Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM
by: Nasser, Zaid, et al.
Published: (2025)

AgentGrounder: Zero-Shot 3D Visual Pointcloud Grounding using Multimodal Language Models
by: Huynh, Cuong, et al.
Published: (2026)

R5DGS: Semantic-Aware 4D Gaussian Splatting with Rigid Body Constraints for Efficient Dynamic Scene Reconstruction
by: Gridusov, Denis, et al.
Published: (2026)

EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
by: Schäfer, Finn Rasmus, et al.
Published: (2026)

SO-Bench: A Structural Output Evaluation of Multimodal LLMs
by: Feng, Di, et al.
Published: (2025)

GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization
by: Sidorov, Gennady, et al.
Published: (2024)

RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation
by: Li, Huiqiong, et al.
Published: (2026)

EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
by: Chen, Yi, et al.
Published: (2023)

Open-Vocabulary Online Semantic Mapping for SLAM
by: Martins, Tomas Berriel, et al.
Published: (2024)

Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation
by: Li, Kailing, et al.
Published: (2026)

Virtual Community: An Open World for Humans, Robots, and Society
by: Zhou, Qinhong, et al.
Published: (2025)

Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
by: Guillen-Perez, Antonio
Published: (2025)

DISC: Dense Integrated Semantic Context for Large-Scale Open-Set Semantic Mapping
by: Igelbrink, Felix, et al.
Published: (2026)

Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling
by: Yashima, Daichi, et al.
Published: (2024)

IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
by: Lu, Xiaoya, et al.
Published: (2025)

LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping
by: Singh, Kurran, et al.
Published: (2024)

DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)

DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions
by: Korekata, Ryosuke, et al.
Published: (2024)

Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models
by: Wang, Tianyu, et al.
Published: (2024)

DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes
by: Jiang, Jiajun, et al.
Published: (2025)

Semantic-Aware Guided Drone Exploration for Language-Conditioned 3D Indoor Mapping
by: Vegesna, Nitin, et al.
Published: (2026)

Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery
by: Ma, Boyi, et al.
Published: (2025)

LINGO-Space: Language-Conditioned Incremental Grounding for Space
by: Kim, Dohyun, et al.
Published: (2024)

Ensemble-Based Event Camera Place Recognition Under Varying Illumination
by: Joseph, Therese, et al.
Published: (2025)

Break Out the Silverware -- Semantic Understanding of Stored Household Items
by: Levi-Richter, Michaela, et al.
Published: (2025)

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
by: Salzmann, Tim, et al.
Published: (2024)

Context-based Motion Retrieval using Open Vocabulary Methods for Autonomous Driving
by: Englmeier, Stefan, et al.
Published: (2025)

Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM
by: Nanwani, Laksh, et al.
Published: (2024)

GraphEQA: Using 3D Semantic Scene Graphs for Real-time Embodied Question Answering
by: Saxena, Saumya, et al.
Published: (2024)

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
by: Zhang, Shiduo, et al.
Published: (2024)

CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving
by: Khairy, Sherif, et al.
Published: (2026)

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
by: Chow, Wei, et al.
Published: (2025)

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop
by: Hong, Yining, et al.
Published: (2026)

Vision based Crop Row Navigation under Varying Field Conditions in Arable Fields
by: de Silva, Rajitha, et al.
Published: (2022)

Open-Set Semantic Uncertainty Aware Metric-Semantic Graph Matching
by: Singh, Kurran, et al.
Published: (2024)

OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
by: Kong, Lingdong, et al.
Published: (2024)

ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
by: Wang, Qineng, et al.
Published: (2025)

Monocular Localization with Semantics Map for Autonomous Vehicles
by: Wan, Jixiang, et al.
Published: (2024)