Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Li, Shichen; Shao, Chenhui
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2504.15599

_version_	1866915253265104896
author	Li, Shichen; Shao, Chenhui
author_facet	Li, Shichen; Shao, Chenhui
contents	Food drying is essential for food production, extending shelf life, and reducing transportation costs. Accurate real-time forecasting of drying readiness is crucial for minimizing energy consumption, improving productivity, and ensuring product quality. However, this remains challenging due to the dynamic nature of drying, limited data availability, and the lack of effective predictive analytical methods. To address this gap, we propose an end-to-end multi-modal data fusion framework that integrates in-situ video data with process parameters for real-time food drying readiness forecasting. Our approach leverages a new encoder-decoder architecture with modality-specific encoders and a transformer-based decoder to effectively extract features while preserving the unique structure of each modality. We apply our approach to sugar cookie drying, where time-to-ready is predicted at each timestamp. Experimental results demonstrate that our model achieves an average prediction error of only 15 seconds, outperforming state-of-the-art data fusion methods by 65.69% and a video-only model by 11.30%. Additionally, our model balances prediction accuracy, model size, and computational efficiency, making it well-suited for heterogeneous industrial datasets. The proposed model is extensible to various other industrial modality fusion tasks for online decision-making.
format	Preprint
id	arxiv_https___arxiv_org_abs_2504_15599
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	Multi-Modal Fusion of In-Situ Video Data and Process Parameters for Online Forecasting of Cookie Drying Readiness Li, Shichen Shao, Chenhui Computer Vision and Pattern Recognition Machine Learning Food drying is essential for food production, extending shelf life, and reducing transportation costs. Accurate real-time forecasting of drying readiness is crucial for minimizing energy consumption, improving productivity, and ensuring product quality. However, this remains challenging due to the dynamic nature of drying, limited data availability, and the lack of effective predictive analytical methods. To address this gap, we propose an end-to-end multi-modal data fusion framework that integrates in-situ video data with process parameters for real-time food drying readiness forecasting. Our approach leverages a new encoder-decoder architecture with modality-specific encoders and a transformer-based decoder to effectively extract features while preserving the unique structure of each modality. We apply our approach to sugar cookie drying, where time-to-ready is predicted at each timestamp. Experimental results demonstrate that our model achieves an average prediction error of only 15 seconds, outperforming state-of-the-art data fusion methods by 65.69% and a video-only model by 11.30%. Additionally, our model balances prediction accuracy, model size, and computational efficiency, making it well-suited for heterogeneous industrial datasets. The proposed model is extensible to various other industrial modality fusion tasks for online decision-making.
title	Multi-Modal Fusion of In-Situ Video Data and Process Parameters for Online Forecasting of Cookie Drying Readiness
topic	Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2504.15599

Similar Items