Staff View: Leveraging Textual-Cues for Enhancing Multimodal Sentiment Analysis by Object Recognition :: Library Catalog

Bibliographic Details
Main Authors:	Biswas, Sumana; Young, Karen; Griffith, Josephine
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.00360

_version_	1866918316522602496
author	Biswas, Sumana; Young, Karen; Griffith, Josephine
author_facet	Biswas, Sumana; Young, Karen; Griffith, Josephine
contents	Multimodal sentiment analysis, which includes both image and text data, presents several challenges due to the dissimilarities in the modalities of text and image, the ambiguity of sentiment, and the complexities of contextual meaning. In this work, we experiment with finding the sentiments of image and text data, individually and in combination, on two datasets. Part of the approach introduces the novel 'Textual-Cues for Enhancing Multimodal Sentiment Analysis' (TEMSA) based on object recognition methods to address the difficulties in multimodal sentiment analysis. Specifically, we extract the names of all objects detected in an image and combine them with associated text; we call this combination of text and image data TEMS. Our results demonstrate that only TEMS improves the results when considering all the object names for the overall sentiment of multimodal data compared to individual analysis. This research contributes to advancing multimodal sentiment analysis and offers insights into the efficacy of TEMSA in combining image and text data for multimodal sentiment analysis.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_00360
institution	arXiv
publishDate	2026
record_format	arxiv
title	Leveraging Textual-Cues for Enhancing Multimodal Sentiment Analysis by Object Recognition
topic	Machine Learning
url	https://arxiv.org/abs/2602.00360
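
Note on the contents field: the abstract describes TEMS as the associated text combined with the names of all objects detected in the image. A minimal sketch of that combination step might look like the following. The function name build_tems, the space separator, and the lowercasing are assumptions made for illustration; the record does not say how the authors join the two inputs or which object detector they use, so the detector output is passed in here as a plain list of labels.

```python
from typing import Iterable


def build_tems(text: str, object_names: Iterable[str], sep: str = " ") -> str:
    """Join the associated text with every detected object name.

    Hypothetical helper: the separator and the lowercasing are
    illustrative choices, not taken from the paper.
    """
    # Keep all names, including repeats, since the abstract refers to
    # "all objects detected"; only empty or blank labels are dropped.
    names = [n.strip().lower() for n in object_names if n and n.strip()]
    return sep.join([text.strip(), *names]).strip()


# Labels as an object detector might return them (assumed input).
detected = ["person", "dog", "surfboard", "dog"]
print(build_tems("Best day at the beach!", detected))
```

The combined string could then be passed to any text sentiment classifier in place of the original text alone.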

Similar Items