Staff View: Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations :: Library Catalog

Bibliographic Details
Main Authors:	Sitdhipol, Supawich; Sukprasongdee, Waritwong; Chuangsuwanich, Ekapol; Tse, Rina
Format:	Preprint
Published:	2025
Subjects:	Robotics; Computation and Language; Information Theory; Machine Learning; Systems and Control
Online Access:	https://arxiv.org/abs/2507.19947

_version_	1866916870582435840
author	Sitdhipol, Supawich; Sukprasongdee, Waritwong; Chuangsuwanich, Ekapol; Tse, Rina
author_facet	Sitdhipol, Supawich; Sukprasongdee, Waritwong; Chuangsuwanich, Ekapol; Tse, Rina
contents	Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_19947
institution	arXiv
publishDate	2025
record_format	arxiv
title	Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations
topic	Robotics; Computation and Language; Information Theory; Machine Learning; Systems and Control
url	https://arxiv.org/abs/2507.19947
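
Illustrative note on the abstract: the record describes fusing a grounded likelihood from human language with robot sensor measurements, and scoring the result by Negative Log-Likelihood (NLL). The sketch below shows that idea on a toy grid. It is not the paper's FP-LGN model. The five-cell map, the likelihood values, and the function names `normalize`, `fuse`, and `nll` are all hypothetical and chosen only for illustration.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have positive total mass")
    return [w / total for w in weights]


def fuse(prior, *likelihoods):
    """Discrete Bayes update: posterior is proportional to prior times each likelihood."""
    posterior = list(prior)
    for likelihood in likelihoods:
        posterior = [p * l for p, l in zip(posterior, likelihood)]
    return normalize(posterior)


def nll(probability):
    """Negative log-likelihood of the probability assigned to the true cell."""
    return -math.log(probability)


# Hypothetical 1-D map with five cells; the target is assumed to be in cell 3.
true_cell = 3
prior = normalize([1, 1, 1, 1, 1])

# Made-up stand-in for a grounded spatial-language likelihood
# (for example, a human saying the target is "near" a landmark at cell 3).
human_likelihood = [0.05, 0.10, 0.25, 0.40, 0.20]

# Made-up sensor likelihood: the robot scanned cells 0 and 1 and saw nothing,
# so those cells become unlikely while unscanned cells stay uninformative.
sensor_likelihood = [0.1, 0.1, 1.0, 1.0, 1.0]

human_only = fuse(prior, human_likelihood)
fused = fuse(prior, human_likelihood, sensor_likelihood)

print("NLL, human only:", nll(human_only[true_cell]))
print("NLL, fused:     ", nll(fused[true_cell]))
```

In this toy setting the fused posterior puts more mass on the true cell than the human observation alone, so its NLL is lower. A learned model such as the one in the record would replace the hand-written `human_likelihood` list.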

Similar Items