Staff View: StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data :: Library Catalog

Bibliographic Details
Main Authors:	Pellegrain, Victor; Tami, Myriam; Batteux, Michel; Hudelot, Céline
Format:	Preprint
Published:	2021
Subjects:	Machine Learning; Computation and Language; Multimedia
Online Access:	https://arxiv.org/abs/2110.08021

_version_	1866911781404803072
author	Pellegrain, Victor; Tami, Myriam; Batteux, Michel; Hudelot, Céline
author_facet	Pellegrain, Victor; Tami, Myriam; Batteux, Michel; Hudelot, Céline
contents	The increasing complexity of Industry 4.0 systems brings new challenges for predictive maintenance tasks such as fault detection and diagnosis. A realistic setting involves multi-source data streams from different modalities, such as sensor measurement time series, machine images, and textual maintenance reports. These heterogeneous multimodal streams also differ in their acquisition frequency, may embed temporally unaligned information, and can be arbitrarily long, depending on the system and task considered. Whereas multimodal fusion has been widely studied in a static setting, to the best of our knowledge no previous work considers arbitrarily long multimodal streams together with related tasks such as prediction across time. Thus, in this paper, we first formalize heterogeneous multimodal learning in a streaming setting as a new paradigm. To tackle this challenge, we propose StreaMulT, a Streaming Multimodal Transformer that relies on cross-modal attention and a memory bank to process arbitrarily long input sequences at training time and to run in a streaming way at inference. StreaMulT improves the state-of-the-art metrics on the CMU-MOSEI dataset for the Multimodal Sentiment Analysis task, while being able to handle much longer inputs than other multimodal models. The experiments ultimately highlight the importance of the textual embedding layer, calling into question recent improvements on Multimodal Sentiment Analysis benchmarks.
format	Preprint
id	arxiv_https___arxiv_org_abs_2110_08021
institution	arXiv
publishDate	2021
record_format	arxiv
title	StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data
topic	Machine Learning; Computation and Language; Multimedia
url	https://arxiv.org/abs/2110.08021
