Bibliographic Details
Main Authors: Christensen, Kim Alexander, Tufte, Andreas Gudahl, Gusev, Alexey, Sinha, Rohan, Ganai, Milan, Alsos, Ole Andreas, Pavone, Marco, Steinert, Martin
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2512.24470
Abstract: The draft IMO MASS Code requires autonomous and remotely supervised maritime vessels to detect departures from their operational design domain, enter a predefined fallback that notifies the operator, permit immediate human override, and avoid changing the voyage plan without approval. Meeting these obligations in the alert-to-takeover gap calls for a short-horizon, human-overridable fallback maneuver. Classical maritime autonomy stacks struggle when the correct action depends on meaning (e.g., diver-down flag means people in the water, fire close by means hazard). We argue (i) that vision-language models (VLMs) provide semantic awareness for such out-of-distribution situations, and (ii) that a fast-slow anomaly pipeline with a short-horizon, human-overridable fallback maneuver makes this practical in the handover window. We introduce Semantic Lookout, a camera-only, candidate-constrained VLM fallback maneuver selector that selects one cautious action (or station-keeping) from water-valid, world-anchored trajectories under continuous human authority. On 40 harbor scenes we measure per-call scene understanding and latency, alignment with human consensus (model majority-of-three voting), short-horizon risk-relief on fire hazard scenes, and an on-water alert → fallback maneuver → operator handover. Sub-10 s models retain most of the awareness of slower state-of-the-art models. The fallback maneuver selector outperforms geometry-only baselines and increases standoff distance on fire scenes. A field run verifies end-to-end operation. These results support VLMs as semantic fallback maneuver selectors compatible with the draft IMO MASS Code, within practical latency budgets, and motivate future work on domain-adapted, hybrid autonomy that pairs foundation-model semantics with multi-sensor bird's-eye-view perception and short-horizon replanning. Website: kimachristensen.github.io/bridge_policy
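The abstract describes a candidate-constrained selector that picks one cautious maneuver (or station-keeping) from water-valid trajectories under continuous human authority, with the model's answer taken by majority-of-three voting. The following is a minimal Python sketch of that selection logic, not the authors' implementation. The `Candidate` type, the `query_vlm` callable standing in for a VLM call, the `station_keep` label, and the rule that out-of-set answers and three-way splits default to station-keeping are all hypothetical assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

STATION_KEEP = "station_keep"  # hypothetical label for holding position


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate fallback maneuver (a world-anchored trajectory)."""
    name: str
    water_valid: bool  # assumed precomputed: trajectory stays in navigable water


def majority_of_three(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by at least two of three votes, else None."""
    if len(votes) != 3:
        raise ValueError("expected exactly three votes")
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else None


def select_fallback(
    candidates: Sequence[Candidate],
    query_vlm: Callable[[Sequence[str]], str],
    operator_override: Optional[str] = None,
) -> str:
    """Pick one cautious maneuver from the constrained candidate set.

    query_vlm is a hypothetical stand-in for a VLM call that receives the
    allowed labels and should return one of them.
    """
    # Human authority first: an operator command always wins.
    if operator_override is not None:
        return operator_override
    # Constrain the choice to water-valid candidates plus station-keeping.
    allowed = [c.name for c in candidates if c.water_valid] + [STATION_KEEP]
    votes: List[str] = []
    for _ in range(3):
        answer = query_vlm(allowed)
        # Assumption: any answer outside the allowed set counts as station-keeping.
        votes.append(answer if answer in allowed else STATION_KEEP)
    winner = majority_of_three(votes)
    # Assumption: with no majority, fall back to the most conservative option.
    return winner if winner is not None else STATION_KEEP


if __name__ == "__main__":
    cands = [
        Candidate("turn_port", True),
        Candidate("turn_starboard", False),  # not water-valid, so never allowed
        Candidate("slow_ahead", True),
    ]
    replies = iter(["turn_port", "turn_starboard", "turn_port"])
    print(select_fallback(cands, lambda allowed: next(replies)))  # prints "turn_port"
```

In this sketch, constraining the model to a precomputed set of water-valid options means that even a wrong or hallucinated answer can only resolve to a pre-vetted maneuver or to holding position, and the operator override is checked before any model call.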
Institution: arXiv
Title: Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models
Subjects: Robotics, Artificial Intelligence