Bibliographic Details
Main Author: Narang, Arya
Format: Preprint
Published: 2025
Subjects: Machine Learning; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2511.11705
_version_ 1866912710272221184
author Narang, Arya
author_facet Narang, Arya
contents This paper determines the extent to which short textual inputs (in this case, names of dishes) can improve calorie estimation compared to an image-only baseline model, and whether any improvements are statistically significant. It uses the TensorFlow library and the Nutrition5k dataset (curated by Google) to train both an image-only CNN and a multimodal CNN that accepts both text and an image as input. With the multimodal model, the mean absolute error (MAE) of the calorie estimates fell by 1.06 kcal, from 84.76 kcal to 83.70 kcal (a 1.25% improvement).
format Preprint
id arxiv_https___arxiv_org_abs_2511_11705
institution arXiv
publishDate 2025
record_format arxiv
title Multimodal ML: Quantifying the Improvement of Calorie Estimation Through Image-Text Pairs
topic Machine Learning
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2511.11705