Bibliographic Details
Main Authors:	Wu, Kui; Guo, Beiyu; Chen, Hao; Xu, ShuHang; Li, Yuling; Zeng, Yongdan; Li, Zhoujun; Wang, Yizhou; Zhong, Fangwei
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2606.01848

_version_	1866914622158667776
author	Wu, Kui; Guo, Beiyu; Chen, Hao; Xu, ShuHang; Li, Yuling; Zeng, Yongdan; Li, Zhoujun; Wang, Yizhou; Zhong, Fangwei
author_facet	Wu, Kui; Guo, Beiyu; Chen, Hao; Xu, ShuHang; Li, Yuling; Zeng, Yongdan; Li, Zhoujun; Wang, Yizhou; Zhong, Fangwei
contents	Search-and-rescue (SAR) requires embodied agents to explore unfamiliar environments under multimodal uncertainty, perform multi-stage interactions, and retrieve spatial memory over long horizons. Existing benchmarks typically evaluate these capabilities in isolation, leaving it unclear how failures compound when the capabilities must be composed in realistic workflows. We introduce RescueBench, a photo-realistic diagnostic benchmark that instantiates SAR as a four-stage pipeline: multimodal exploration, target rescue, memory-guided return, and final handoff. By combining sequential task composition with stage-level evaluation, RescueBench enables analysis of how exploration and memory failures propagate through embodied rescue workflows. It contains five progressive difficulty levels that vary in environmental complexity, clue ambiguity, and spatial hierarchy, along with an automatic episode generation and annotation pipeline for scalable evaluation and training. We evaluate seven baselines, an oracle reference, and human players, and find that no baseline completes the full task at the highest difficulty level. Stage-level diagnosis identifies autonomous exploration as the dominant failure mode and spatial memory as a second, independent bottleneck, suggesting that these limitations are not resolved by current topological visual-language navigation or map-based methods. Code is available at https://github.com/wukui-muc/RescueBench
format	Preprint
id	arxiv_https___arxiv_org_abs_2606_01848
institution	arXiv
publishDate	2026
record_format	arxiv
title	RescueBench: Can Embodied Agents Save Lives in the Wild?
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2606.01848