| Main Authors: | Sakano, Kristy; Harrington, Kalonji; Xu, Mumu |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Robotics |
| Online Access: | https://arxiv.org/abs/2605.04327 |
| _version_ | 1866917463218716672 |
|---|---|
| author | Sakano, Kristy; Harrington, Kalonji; Xu, Mumu |
| author_facet | Sakano, Kristy; Harrington, Kalonji; Xu, Mumu |
| contents | We propose an architecture for integrating high-level, human-provided safety rules and operator-aligned semantic preferences into autonomous robot navigation in unstructured outdoor environments. In our approach, natural-language rules are translated into Signal Temporal Logic (STL) specifications that guide planning and navigation at runtime. Persistent, environment-centric rules and terrain preferences are grounded into a 2D cost map, while temporally dynamic requirements are expressed as STL specifications that are monitored at runtime. We hypothesize that Vision-Language Models (VLMs) can provide zero-shot scene understanding, enabling a mapping between human instructions, semantic features, and environmental constraints. Within this framework, we construct an illustrative navigation model that is designed to satisfy a set of STL-encoded specifications and soft operator preferences through formal satisfaction metrics embedded into environmental properties and runtime monitoring. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2605_04327 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | From Language to Logic: A Theoretical Architecture for VLM-Grounded Safe Navigation |
| topic | Robotics |
| url | https://arxiv.org/abs/2605.04327 |
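
The abstract describes two mechanisms: persistent terrain preferences grounded into a 2D cost map, and temporally dynamic requirements expressed as STL specifications monitored at runtime. The sketch below illustrates both ideas in a generic way. It is not the authors' implementation. All function names, terrain labels, costs, and thresholds are hypothetical, and the VLM step is replaced by a hand-written label grid. The monitors use the standard quantitative (robustness) semantics of STL for "always" and "eventually" over a finite sampled trace.

```python
from typing import Dict, List, Sequence


def robustness_always_ge(signal: Sequence[float], threshold: float) -> float:
    """Robustness of G(signal >= threshold) over a finite trace.

    Standard STL quantitative semantics: the minimum margin over time.
    A positive value means the property holds at every sample.
    """
    return min(s - threshold for s in signal)


def robustness_eventually_le(signal: Sequence[float], threshold: float) -> float:
    """Robustness of F(signal <= threshold) over a finite trace.

    Standard STL quantitative semantics: the maximum margin over time.
    A positive value means the property holds at some sample.
    """
    return max(threshold - s for s in signal)


def terrain_cost_map(
    labels: List[List[str]],
    preference_cost: Dict[str, float],
    default_cost: float = 1.0,
) -> List[List[float]]:
    """Turn a grid of semantic terrain labels into a 2D cost map.

    The labels stand in for VLM output (an assumption of this sketch);
    preference_cost encodes hypothetical operator preferences.
    """
    return [
        [preference_cost.get(cell, default_cost) for cell in row]
        for row in labels
    ]


# Hypothetical rule: "always stay at least 1.0 m from obstacles".
clearance_m = [2.0, 1.5, 1.25, 1.75]
safety_margin = robustness_always_ge(clearance_m, threshold=1.0)

# Hypothetical rule: "eventually get within 0.5 m of the goal".
goal_distance_m = [5.0, 3.0, 1.0, 0.25]
reach_margin = robustness_eventually_le(goal_distance_m, threshold=0.5)

# Hypothetical operator preference: prefer road, tolerate grass, avoid mud.
labels = [["grass", "mud"], ["road", "grass"]]
costs = terrain_cost_map(labels, {"road": 1.0, "grass": 2.0, "mud": 10.0})

print(safety_margin, reach_margin, costs)
```

A planner could then search the cost map for a low-cost path while a runtime monitor re-evaluates the robustness values as new samples arrive, treating a negative margin as a violation. How the paper actually couples the two is not stated in this record.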