Bibliographic Details
Main Authors: Yun, Guhnoo, Yoo, Juhan, Kim, Kijung, Kim, Dong Hwan
Format: Preprint
Published: 2026
Online Access: https://arxiv.org/abs/2605.10087
author Yun, Guhnoo
Yoo, Juhan
Kim, Kijung
Kim, Dong Hwan
contents This paper describes a keyword-free initiation of interaction (IoI) detection framework for human-robot interaction (HRI), based on audio and vision sensor fusion in a domestic environment. In the proposed framework, the robot has its own audio and vision sensors and can also use an external vision sensor for stable human detection and tracking. When the user starts to speak while looking at the robot, the robot localizes the user by combining its sound source localization with human tracking information. The robot then detects the IoI if it perceives that the speaker's face is turned toward the robot. If the user does not speak directly, the robot can still detect the IoI when he or she looks at the robot for longer than a predefined period of time. A state transition model for the proposed IoI detection framework is designed and verified through experiments with a mobile robot. To integrate the model into a robot architecture, all components are implemented and integrated in the Robot Operating System (ROS) environment.
format Preprint
id arxiv_https___arxiv_org_abs_2605_10087
institution arXiv
publishDate 2026
record_format arxiv
title Initiation of Interaction Detection Framework using a Nonverbal Cue for Human-Robot Interaction
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2605.10087
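The abstract describes two routes to an IoI decision: speech from the tracked person while their face is turned toward the robot, or gaze held on the robot for longer than a predefined period. The following is a minimal sketch of such a state transition model, not the authors' implementation. The state names, the `Observation` fields, the `IoIDetector` class, and the 2-second default for `gaze_hold_s` are all hypothetical, and the sensor inputs are assumed to arrive already fused into booleans.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # no person is being tracked
    TRACKING = auto()  # a person is tracked, but no IoI cue yet
    IOI = auto()       # initiation of interaction detected


@dataclass
class Observation:
    # All three fields are assumed outputs of upstream perception modules.
    person_tracked: bool      # human tracker reports a person
    speech_from_person: bool  # sound source direction matches the tracked person
    facing_robot: bool        # the person's face is turned toward the robot


class IoIDetector:
    """Hypothetical state machine for keyword-free IoI detection."""

    def __init__(self, gaze_hold_s: float = 2.0):
        self.gaze_hold_s = gaze_hold_s  # assumed gaze-duration threshold (seconds)
        self.state = State.IDLE
        self.gaze_time = 0.0            # continuous time the person has faced the robot

    def step(self, obs: Observation, dt: float) -> State:
        # Losing the person resets everything.
        if not obs.person_tracked:
            self.state = State.IDLE
            self.gaze_time = 0.0
            return self.state

        # Accumulate gaze time only while the face stays turned toward the robot.
        self.gaze_time = self.gaze_time + dt if obs.facing_robot else 0.0

        spoke_while_facing = obs.speech_from_person and obs.facing_robot
        gazed_long_enough = self.gaze_time > self.gaze_hold_s

        if spoke_while_facing or gazed_long_enough:
            self.state = State.IOI
        elif self.state is not State.IOI:
            # Design choice in this sketch: IOI latches until the person is lost.
            self.state = State.TRACKING
        return self.state
```

In this sketch, speech that does not coincide with a robot-facing face leaves the detector in `TRACKING`, which mirrors the abstract's requirement that the speaker be looking at the robot. Whether an IoI state should latch or time out is not stated in the abstract, so the latching behavior here is an assumption.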