Bibliographic Details
Main Authors: Yuan, Mingfeng, Zhang, Hao, Mohammadi, Mahan, Li, Runhao, Shan, Jinjun, Waslander, Steven L.
Format: Preprint
Published: 2026
Subjects: Robotics; Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.09255
_version_ 1866910019694362624
author Yuan, Mingfeng
Zhang, Hao
Mohammadi, Mahan
Li, Runhao
Shan, Jinjun
Waslander, Steven L.
contents Mobile robots are often deployed over long durations in diverse, open, dynamic scenes, including indoor settings such as warehouses and manufacturing facilities, and outdoor settings such as agricultural and roadway operations. A core challenge is to build a scalable long-horizon memory that supports an agentic workflow for planning, retrieval, and reasoning over open-ended instructions at variable granularity, while producing precise, actionable answers for navigation. We present STaR, an agentic reasoning framework that (i) constructs a task-agnostic, multimodal long-term memory that generalizes to unseen queries while preserving fine-grained environmental semantics (object attributes, spatial relations, and dynamic events), and (ii) introduces a Scalable Task-Conditioned Retrieval algorithm based on the Information Bottleneck principle to extract from long-term memory a compact, non-redundant, information-rich set of candidate memories for contextual reasoning. We evaluate STaR on NaVQA (mixed indoor/outdoor campus scenes) and WH-VQA, a customized warehouse benchmark built with Isaac Sim that contains many visually similar objects and emphasizes contextual reasoning. Across the two datasets, STaR consistently outperforms strong baselines, achieving higher success rates and markedly lower spatial error. We further deploy STaR on a real Husky wheeled robot in both indoor and outdoor environments, demonstrating robust long-horizon reasoning, scalability, and practical utility. Project Website: https://trailab.github.io/STaR-website/
format Preprint
id arxiv_https___arxiv_org_abs_2602_09255
institution arXiv
publishDate 2026
record_format arxiv
title STaR: Scalable Task-Conditioned Retrieval for Long-Horizon Multimodal Robot Memory
topic Robotics
Artificial Intelligence
url https://arxiv.org/abs/2602.09255