Staff View: Touch and Tell: Multimodal Decoding of Human Emotions and Social Gestures for Robots :: Library Catalog

Bibliographic Details
Main Authors:	Ren, Qiaoqiao; Proesmans, Remko; Hou, Yuanbo; wyffels, Francis; Belpaeme, Tony
Format:	Preprint
Published:	2024
Subjects:	Robotics; Machine Learning
Online Access:	https://arxiv.org/abs/2412.03300

_version_	1866913986029551616
author	Ren, Qiaoqiao; Proesmans, Remko; Hou, Yuanbo; wyffels, Francis; Belpaeme, Tony
author_facet	Ren, Qiaoqiao; Proesmans, Remko; Hou, Yuanbo; wyffels, Francis; Belpaeme, Tony
contents	Human emotions are complex and can be conveyed through nuanced touch gestures. Previous research has primarily focused on how humans recognize emotions through touch or on identifying key features of emotional expression for robots. However, there is a gap in understanding how reliably these emotions and gestures can be communicated to robots via touch and interpreted using data-driven methods. This study investigates the consistency and distinguishability of emotional and gestural expressions through touch and sound. To this end, we integrated a custom piezoresistive pressure sensor as well as a microphone on a social robot. Twenty-eight participants first conveyed ten different emotions to the robot using spontaneous touch gestures, and then performed six predefined social touch gestures. Our findings reveal statistically significant consistency in both emotion and gesture expression among participants. However, some emotions exhibited low intraclass correlation values, and certain emotions with similar levels of arousal or valence did not show significant differences in their conveyance. To investigate emotion and social gesture decoding within affective human-robot tactile interaction, we developed single-modality models and multimodal models integrating tactile and auditory features. A support vector machine (SVM) model trained on multimodal features achieved the highest accuracy for classifying ten emotions, reaching 40%. For gesture classification, a Convolutional Neural Network-Long Short-Term Memory Network (CNN-LSTM) achieved 90.74% accuracy. Our results demonstrate that even though the unimodal models have the potential to decode emotions and touch gestures, the multimodal integration of touch and sound significantly outperforms unimodal approaches, enhancing the decoding of both emotions and gestures.
format	Preprint
id	arxiv_https___arxiv_org_abs_2412_03300
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	Touch and Tell: Multimodal Decoding of Human Emotions and Social Gestures for Robots Ren, Qiaoqiao Proesmans, Remko Hou, Yuanbo wyffels, Francis Belpaeme, Tony Robotics Machine Learning Human emotions are complex and can be conveyed through nuanced touch gestures. Previous research has primarily focused on how humans recognize emotions through touch or on identifying key features of emotional expression for robots. However, there is a gap in understanding how reliably these emotions and gestures can be communicated to robots via touch and interpreted using data-driven methods. This study investigates the consistency and distinguishability of emotional and gestural expressions through touch and sound. To this end, we integrated a custom piezoresistive pressure sensor as well as a microphone on a social robot. Twenty-eight participants first conveyed ten different emotions to the robot using spontaneous touch gestures, and then performed six predefined social touch gestures. Our findings reveal statistically significant consistency in both emotion and gesture expression among participants. However, some emotions exhibited low intraclass correlation values, and certain emotions with similar levels of arousal or valence did not show significant differences in their conveyance. To investigate emotion and social gesture decoding within affective human-robot tactile interaction, we developed single-modality models and multimodal models integrating tactile and auditory features. A support vector machine (SVM) model trained on multimodal features achieved the highest accuracy for classifying ten emotions, reaching 40%. For gesture classification, a Convolutional Neural Network-Long Short-Term Memory Network (CNN-LSTM) achieved 90.74% accuracy. Our results demonstrate that even though the unimodal models have the potential to decode emotions and touch gestures, the multimodal integration of touch and sound significantly outperforms unimodal approaches, enhancing the decoding of both emotions and gestures.
title	Touch and Tell: Multimodal Decoding of Human Emotions and Social Gestures for Robots
topic	Robotics; Machine Learning
url	https://arxiv.org/abs/2412.03300

Similar Items