Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autor principal:	Niimi, Junichiro
Formato:	Preprint
Publicado:	2024
Materias:	Computational Engineering, Finance, and Science
Acceso en línea:	https://arxiv.org/abs/2405.07435
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866916243710148608
author	Niimi, Junichiro
author_facet	Niimi, Junichiro
contents	Today, the acquisition of various behavioral log data has enabled deeper understanding of customer preferences and future behaviors in the marketing field. In particular, multimodal deep learning has achieved highly accurate predictions by combining multiple types of data. Many of these studies utilize with feature fusion to construct multimodal models, which combines extracted representations from each modality. However, since feature fusion treats information from each modality equally, it is difficult to perform flexible analysis such as the attention mechanism that has been used extensively in recent years. Therefore, this study proposes a context-aware multimodal deep learning model that combines Bidirectional Encoder Representations from Transformers (BERT) and cross-attention Transformer, which dynamically changes the attention of deep-contextualized word representations based on background information such as consumer demographic and lifestyle variables. We conduct a comprehensive analysis and demonstrate the effectiveness of our model by comparing it with six reference models in three categories using behavioral logs stored on an online platform. In addition, we present an efficient multimodal learning method by comparing the learning efficiency depending on the optimizers and the prediction accuracy depending on the number of tokens in the text data.
format	Preprint
id	arxiv_https___arxiv_org_abs_2405_07435
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	An Efficient Multimodal Learning Framework to Comprehend Consumer Preferences Using BERT and Cross-Attention Niimi, Junichiro Computational Engineering, Finance, and Science Today, the acquisition of various behavioral log data has enabled deeper understanding of customer preferences and future behaviors in the marketing field. In particular, multimodal deep learning has achieved highly accurate predictions by combining multiple types of data. Many of these studies utilize with feature fusion to construct multimodal models, which combines extracted representations from each modality. However, since feature fusion treats information from each modality equally, it is difficult to perform flexible analysis such as the attention mechanism that has been used extensively in recent years. Therefore, this study proposes a context-aware multimodal deep learning model that combines Bidirectional Encoder Representations from Transformers (BERT) and cross-attention Transformer, which dynamically changes the attention of deep-contextualized word representations based on background information such as consumer demographic and lifestyle variables. We conduct a comprehensive analysis and demonstrate the effectiveness of our model by comparing it with six reference models in three categories using behavioral logs stored on an online platform. In addition, we present an efficient multimodal learning method by comparing the learning efficiency depending on the optimizers and the prediction accuracy depending on the number of tokens in the text data.
title	An Efficient Multimodal Learning Framework to Comprehend Consumer Preferences Using BERT and Cross-Attention
topic	Computational Engineering, Finance, and Science
url	https://arxiv.org/abs/2405.07435

Ejemplares similares