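Staff View: Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models :: Library Catalog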

Bibliographic Details
Main Authors:	Christensen, Kim Alexander; Tufte, Andreas Gudahl; Gusev, Alexey; Sinha, Rohan; Ganai, Milan; Alsos, Ole Andreas; Pavone, Marco; Steinert, Martin
Format:	Preprint
Published:	2025
Subjects:	Robotics; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.24470

_version_	1866911695463514112
author	Christensen, Kim Alexander; Tufte, Andreas Gudahl; Gusev, Alexey; Sinha, Rohan; Ganai, Milan; Alsos, Ole Andreas; Pavone, Marco; Steinert, Martin
author_facet	Christensen, Kim Alexander; Tufte, Andreas Gudahl; Gusev, Alexey; Sinha, Rohan; Ganai, Milan; Alsos, Ole Andreas; Pavone, Marco; Steinert, Martin
contents	The draft IMO MASS Code requires autonomous and remotely supervised maritime vessels to detect departures from their operational design domain, enter a predefined fallback that notifies the operator, permit immediate human override, and avoid changing the voyage plan without approval. Meeting these obligations in the alert-to-takeover gap calls for a short-horizon, human-overridable fallback maneuver. Classical maritime autonomy stacks struggle when the correct action depends on meaning (e.g., diver-down flag means people in the water, fire close by means hazard). We argue (i) that vision-language models (VLMs) provide semantic awareness for such out-of-distribution situations, and (ii) that a fast-slow anomaly pipeline with a short-horizon, human-overridable fallback maneuver makes this practical in the handover window. We introduce Semantic Lookout, a camera-only, candidate-constrained VLM fallback maneuver selector that selects one cautious action (or station-keeping) from water-valid, world-anchored trajectories under continuous human authority. On 40 harbor scenes we measure per-call scene understanding and latency, alignment with human consensus (model majority-of-three voting), short-horizon risk-relief on fire hazard scenes, and an on-water alert->fallback maneuver->operator handover. Sub-10 s models retain most of the awareness of slower state-of-the-art models. The fallback maneuver selector outperforms geometry-only baselines and increases standoff distance on fire scenes. A field run verifies end-to-end operation. These results support VLMs as semantic fallback maneuver selectors compatible with the draft IMO MASS Code, within practical latency budgets, and motivate future work on domain-adapted, hybrid autonomy that pairs foundation-model semantics with multi-sensor bird's-eye-view perception and short-horizon replanning. Website: kimachristensen.github.io/bridge_policy
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_24470
institution	arXiv
publishDate	2025
record_format	arxiv
title	Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models
topic	Robotics; Artificial Intelligence
url	https://arxiv.org/abs/2512.24470

Similar Items