Bibliographic Details
Main Authors: Inoue, Koji; Okafuji, Yuki; Baba, Jun; Ohira, Yoshiki; Hyodo, Katsuya; Kawahara, Tatsuya
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2503.06241
author Inoue, Koji
Okafuji, Yuki
Baba, Jun
Ohira, Yoshiki
Hyodo, Katsuya
Kawahara, Tatsuya
contents Turn-taking is a crucial aspect of human-robot interaction, directly influencing conversational fluidity and user engagement. While previous research has explored turn-taking models in controlled environments, their robustness in real-world settings remains underexplored. In this study, we propose a noise-robust voice activity projection (VAP) model, based on a Transformer architecture, to enhance real-time turn-taking in dialogue robots. To evaluate the effectiveness of the proposed system, we conducted a field experiment in a shopping mall, comparing the VAP system with a conventional cloud-based speech recognition system. Our analysis covered both subjective user evaluations and objective behavioral analysis. The results showed that the proposed system significantly reduced response latency, leading to a more natural conversation where both the robot and users responded faster. The subjective evaluations suggested that faster responses contribute to a better interaction experience.
format Preprint
id arxiv_https___arxiv_org_abs_2503_06241
institution arXiv
publishDate 2025
record_format arxiv
title A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment
topic Robotics
Computation and Language
Sound
url https://arxiv.org/abs/2503.06241