| Main Authors: | Popov, Maxim, Kurkova, Regina, Iumanov, Mikhail, Mahmoud, Jaafar, Kolyubin, Sergey |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.10331 |
Similar Items
OSMa-Bench++: Toward Open-Ended Benchmarking of Semantic Mapping for Manipulation with Prompt-Generated Synthetic Scenes
by: Kurkova, Regina, et al.
Published: (2026)
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
by: Nasser, Zaid, et al.
Published: (2026)
KM-ViPE: Online Tightly Coupled Vision-Language-Geometry Fusion for Open-Vocabulary Semantic SLAM
by: Nasser, Zaid, et al.
Published: (2025)
AgentGrounder: Zero-Shot 3D Visual Pointcloud Grounding using Multimodal Language Models
by: Huynh, Cuong, et al.
Published: (2026)
R5DGS: Semantic-Aware 4D Gaussian Splatting with Rigid Body Constraints for Efficient Dynamic Scene Reconstruction
by: Gridusov, Denis, et al.
Published: (2026)
EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving
by: Schäfer, Finn Rasmus, et al.
Published: (2026)
SO-Bench: A Structural Output Evaluation of Multimodal LLMs
by: Feng, Di, et al.
Published: (2025)
GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization
by: Sidorov, Gennady, et al.
Published: (2024)
RoboTrustBench: Benchmarking the Trustworthiness of Video World Models for Robotic Manipulation
by: Li, Huiqiong, et al.
Published: (2026)
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
by: Chen, Yi, et al.
Published: (2023)
Open-Vocabulary Online Semantic Mapping for SLAM
by: Martins, Tomas Berriel, et al.
Published: (2024)
Bridging the 2D-3D Gap: A Hierarchical Semantic-Geometric Map for Vision Language Navigation
by: Li, Kailing, et al.
Published: (2026)
Virtual Community: An Open World for Humans, Robots, and Society
by: Zhou, Qinhong, et al.
Published: (2025)
Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
by: Guillen-Perez, Antonio
Published: (2025)
DISC: Dense Integrated Semantic Context for Large-Scale Open-Set Semantic Mapping
by: Igelbrink, Felix, et al.
Published: (2026)
Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling
by: Yashima, Daichi, et al.
Published: (2024)
IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
by: Lu, Xiaoya, et al.
Published: (2025)
LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping
by: Singh, Kurran, et al.
Published: (2024)
DivScene: Towards Open-Vocabulary Object Navigation with Large Vision Language Models in Diverse Scenes
by: Wang, Zhaowei, et al.
Published: (2024)
DM2RM: Dual-Mode Multimodal Ranking for Target Objects and Receptacles Based on Open-Vocabulary Instructions
by: Korekata, Ryosuke, et al.
Published: (2024)
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models
by: Wang, Tianyu, et al.
Published: (2024)
DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes
by: Jiang, Jiajun, et al.
Published: (2025)
Semantic-Aware Guided Drone Exploration for Language-Conditioned 3D Indoor Mapping
by: Vegesna, Nitin, et al.
Published: (2026)
Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery
by: Ma, Boyi, et al.
Published: (2025)
LINGO-Space: Language-Conditioned Incremental Grounding for Space
by: Kim, Dohyun, et al.
Published: (2024)
Ensemble-Based Event Camera Place Recognition Under Varying Illumination
by: Joseph, Therese, et al.
Published: (2025)
Break Out the Silverware -- Semantic Understanding of Stored Household Items
by: Levi-Richter, Michaela, et al.
Published: (2025)
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
by: Salzmann, Tim, et al.
Published: (2024)
Context-based Motion Retrieval using Open Vocabulary Methods for Autonomous Driving
by: Englmeier, Stefan, et al.
Published: (2025)
Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM
by: Nanwani, Laksh, et al.
Published: (2024)
GraphEQA: Using 3D Semantic Scene Graphs for Real-time Embodied Question Answering
by: Saxena, Saumya, et al.
Published: (2024)
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
by: Zhang, Shiduo, et al.
Published: (2024)
CADENet: Condition-Adaptive Asynchronous Dual-Stream Enhancement Network for Adverse Weather Perception in Autonomous Driving
by: Khairy, Sherif, et al.
Published: (2026)
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
by: Chow, Wei, et al.
Published: (2025)
ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop
by: Hong, Yining, et al.
Published: (2026)
Vision based Crop Row Navigation under Varying Field Conditions in Arable Fields
by: de Silva, Rajitha, et al.
Published: (2022)
Open-Set Semantic Uncertainty Aware Metric-Semantic Graph Matching
by: Singh, Kurran, et al.
Published: (2024)
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
by: Kong, Lingdong, et al.
Published: (2024)
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
by: Wang, Qineng, et al.
Published: (2025)
Monocular Localization with Semantics Map for Autonomous Vehicles
by: Wan, Jixiang, et al.
Published: (2024)