Bibliographic Details
Main Authors: Ren, Qiaoqiao; Proesmans, Remko; Hou, Yuanbo; wyffels, Francis; Belpaeme, Tony
Format: Preprint
Published: 2024
Subjects: Robotics, Machine Learning
Online Access: https://arxiv.org/abs/2412.03300
author Ren, Qiaoqiao
Proesmans, Remko
Hou, Yuanbo
wyffels, Francis
Belpaeme, Tony
contents Human emotions are complex and can be conveyed through nuanced touch gestures. Previous research has primarily focused on how humans recognize emotions through touch or on identifying key features of emotional expression for robots. However, there is a gap in understanding how reliably these emotions and gestures can be communicated to robots via touch and interpreted using data-driven methods. This study investigates the consistency and distinguishability of emotional and gestural expressions through touch and sound. To this end, we integrated a custom piezoresistive pressure sensor and a microphone on a social robot. Twenty-eight participants first conveyed ten different emotions to the robot using spontaneous touch gestures, then performed six predefined social touch gestures. Our findings reveal statistically significant consistency in both emotion and gesture expression among participants. However, some emotions exhibited low intraclass correlation values, and certain emotions with similar levels of arousal or valence did not show significant differences in their conveyance. To investigate emotion and social gesture decoding within affective human-robot tactile interaction, we developed single-modality models and multimodal models integrating tactile and auditory features. A support vector machine (SVM) model trained on multimodal features achieved the highest accuracy for classifying ten emotions, reaching 40%. For gesture classification, a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) network achieved 90.74% accuracy. Our results demonstrate that although the unimodal models have the potential to decode emotions and touch gestures, the multimodal integration of touch and sound significantly outperforms unimodal approaches, enhancing the decoding of both emotions and gestures.
format Preprint
id arxiv_https___arxiv_org_abs_2412_03300
institution arXiv
publishDate 2024
record_format arxiv
title Touch and Tell: Multimodal Decoding of Human Emotions and Social Gestures for Robots
topic Robotics
Machine Learning
url https://arxiv.org/abs/2412.03300