Bibliographic Details
Main Authors: Deng, Andong; Yang, Taojiannan; Chen, Chen; Chen, Qian; Neely, Leslie; Oyama, Sakiko
Format: Preprint
Published: 2022
Subjects: Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2211.09310
author Deng, Andong
Yang, Taojiannan
Chen, Chen
Chen, Qian
Neely, Leslie
Oyama, Sakiko
contents Correctly recognizing the behaviors of children with Autism Spectrum Disorder (ASD) is vital for diagnosing autism and for timely early intervention. However, the observations that parents of autistic children record during treatment may not be accurate or objective. Automatic recognition systems based on computer vision and machine learning, in particular deep learning, can largely alleviate this issue. Existing human action recognition models now achieve strong performance on challenging activity datasets, e.g. daily and sports activities. However, problem behaviors in children with ASD differ markedly from these general activities, and recognizing them via computer vision is less studied. In this paper, we first evaluate a strong action recognition baseline, Video Swin Transformer, on two autism behavior datasets (SSBD and ESBD) and show that it achieves high accuracy and outperforms previous methods by a large margin, demonstrating the feasibility of vision-based problem behavior recognition. Moreover, we propose language-assisted training to further improve action recognition performance. Specifically, we develop a two-branch multimodal deep learning framework that incorporates the "freely available" language description of each type of problem behavior. Experimental results show that the additional language supervision brings a clear performance gain on the autism problem behavior recognition task compared with using video information alone (a 3.49% improvement on ESBD and 1.46% on SSBD).
format Preprint
id arxiv_https___arxiv_org_abs_2211_09310
institution arXiv
publishDate 2022
record_format arxiv
title Language-Assisted Deep Learning for Autistic Behaviors Recognition
topic Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2211.09310