| Main Authors: | Inoue, Koji; Okafuji, Yuki; Baba, Jun; Ohira, Yoshiki; Hyodo, Katsuya; Kawahara, Tatsuya |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Robotics; Computation and Language; Sound |
| Online Access: | https://arxiv.org/abs/2503.06241 |
| _version_ | 1866909687627120640 |
|---|---|
| author | Inoue, Koji; Okafuji, Yuki; Baba, Jun; Ohira, Yoshiki; Hyodo, Katsuya; Kawahara, Tatsuya |
| author_facet | Inoue, Koji; Okafuji, Yuki; Baba, Jun; Ohira, Yoshiki; Hyodo, Katsuya; Kawahara, Tatsuya |
| contents | Turn-taking is a crucial aspect of human-robot interaction, directly influencing conversational fluidity and user engagement. While previous research has explored turn-taking models in controlled environments, their robustness in real-world settings remains underexplored. In this study, we propose a noise-robust voice activity projection (VAP) model, based on a Transformer architecture, to enhance real-time turn-taking in dialogue robots. To evaluate the effectiveness of the proposed system, we conducted a field experiment in a shopping mall, comparing the VAP system with a conventional cloud-based speech recognition system. Our analysis covered both subjective user evaluations and objective behavioral analysis. The results showed that the proposed system significantly reduced response latency, leading to a more natural conversation where both the robot and users responded faster. The subjective evaluations suggested that faster responses contribute to a better interaction experience. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2503_06241 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment |
| topic | Robotics; Computation and Language; Sound |
| url | https://arxiv.org/abs/2503.06241 |