Bibliographic Details
Main Authors: Sakano, Kristy; Harrington, Kalonji; Xu, Mumu
Format: Preprint
Published: 2026
Subjects: Robotics
Online Access: https://arxiv.org/abs/2605.04327
_version_ 1866917463218716672
author Sakano, Kristy
Harrington, Kalonji
Xu, Mumu
author_facet Sakano, Kristy
Harrington, Kalonji
Xu, Mumu
contents We propose an architecture for integrating high-level, human-provided safety rules and operator-aligned semantic preferences into autonomous robot navigation in unstructured outdoor environments. In our approach, natural-language rules are translated into Signal Temporal Logic (STL) specifications that guide planning and navigation at runtime. Persistent, environment-centric rules and terrain preferences are grounded into a 2D cost map, while temporally dynamic requirements are expressed as STL specifications and monitored at runtime. We hypothesize that Vision-Language Models (VLMs) can provide zero-shot scene understanding, enabling a mapping between human instructions, semantic features, and environmental constraints. Within this framework, we construct an illustrative navigation model designed to satisfy a set of STL-encoded specifications and soft operator preferences through formal satisfaction metrics embedded into environmental properties and runtime monitoring.
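The runtime-monitoring idea in the abstract can be illustrated with a minimal sketch of STL robustness over a finite, sampled trace. This is not the authors' implementation: the function names, the two example rules ("always stay farther than d_safe from a hazard" and "eventually come within goal_radius of the goal"), and all numeric values are assumptions chosen for illustration. It uses the standard quantitative semantics, in which "always" takes the minimum margin over the trace, "eventually" takes the maximum, and conjunction takes the minimum of its operands.

```python
from typing import Sequence


def always_above(signal: Sequence[float], threshold: float) -> float:
    """Robustness of G(signal > threshold): the worst-case margin over the trace."""
    return min(s - threshold for s in signal)


def eventually_below(signal: Sequence[float], threshold: float) -> float:
    """Robustness of F(signal < threshold): the best-case margin over the trace."""
    return max(threshold - s for s in signal)


def monitor(
    hazard_dist: Sequence[float],
    goal_dist: Sequence[float],
    d_safe: float = 1.0,       # hypothetical safety distance (metres)
    goal_radius: float = 0.5,  # hypothetical goal tolerance (metres)
) -> float:
    """Robustness of G(hazard_dist > d_safe) AND F(goal_dist < goal_radius).

    A positive value means the trace satisfies both rules with that margin;
    a negative value means at least one rule is violated.
    """
    rho_safe = always_above(hazard_dist, d_safe)
    rho_goal = eventually_below(goal_dist, goal_radius)
    return min(rho_safe, rho_goal)  # conjunction = minimum of the operands


if __name__ == "__main__":
    # Hypothetical sampled distances along one trajectory.
    hazard = [3.0, 2.0, 1.5, 2.5]
    goal = [4.0, 2.5, 1.0, 0.25]
    print(monitor(hazard, goal))
```

The sign of the returned value gives a pass/fail verdict, while its magnitude indicates how close the trajectory came to violating a rule, which is the kind of satisfaction metric a planner could trade off against soft terrain preferences.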
format Preprint
id arxiv_https___arxiv_org_abs_2605_04327
institution arXiv
publishDate 2026
record_format arxiv
title From Language to Logic: A Theoretical Architecture for VLM-Grounded Safe Navigation
topic Robotics
url https://arxiv.org/abs/2605.04327