Staff View: Multimodal ML: Quantifying the Improvement of Calorie Estimation Through Image-Text Pairs :: Library Catalog

Bibliographic Details
Main Author:	Narang, Arya
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2511.11705

_version_	1866912710272221184
author	Narang, Arya
author_facet	Narang, Arya
contents	This paper determines the extent to which short textual inputs (in this case, names of dishes) can improve calorie estimation compared to an image-only baseline model, and whether any improvements are statistically significant. It uses the TensorFlow library and the Nutrition5k dataset (curated by Google) to train both an image-only CNN and a multimodal CNN that accepts both text and an image as input. With the multimodal model, the mean absolute error (MAE) of the calorie estimates fell by 1.06 kcal, from 84.76 kcal to 83.70 kcal (a 1.25% improvement).
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_11705
institution	arXiv
publishDate	2025
record_format	arxiv
title	Multimodal ML: Quantifying the Improvement of Calorie Estimation Through Image-Text Pairs
topic	Machine Learning; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2511.11705
