Staff View: Classifying Shelf Life Quality of Pineapples by Combining Audio and Visual Features :: Library Catalog

Bibliographic Details
Main Authors:	Jiang, Yi-Lu; Chang, Wen-Chang; Wang, Ching-Lin; Hsu, Kung-Liang; Chiu, Chih-Yi
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Multimedia; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2505.11020

_version_	1866918022161104896
author	Jiang, Yi-Lu; Chang, Wen-Chang; Wang, Ching-Lin; Hsu, Kung-Liang; Chiu, Chih-Yi
author_facet	Jiang, Yi-Lu; Chang, Wen-Chang; Wang, Ching-Lin; Hsu, Kung-Liang; Chiu, Chih-Yi
contents	Determining the shelf life quality of pineapples using non-destructive methods is a crucial step to reduce waste and increase income. In this paper, a multimodal and multiview classification model was constructed to classify pineapples into four quality levels based on audio and visual characteristics. For research purposes, we compiled and released the PQC500 dataset consisting of 500 pineapples with two modalities: one was tapping pineapples to record sounds by multiple microphones and the other was taking pictures by multiple cameras at different locations, providing multimodal and multi-view audiovisual features. We modified the contrastive audiovisual masked autoencoder to train the cross-modal-based classification model by abundant combinations of audio and visual pairs. In addition, we proposed to sample a compact size of training data for efficient computation. The experiments were evaluated under various data and model configurations, and the results demonstrated that the proposed cross-modal model trained using audio-major sampling can yield 84% accuracy, outperforming the unimodal models of only audio and only visual by 6% and 18%, respectively.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_11020
institution	arXiv
publishDate	2025
record_format	arxiv
title	Classifying Shelf Life Quality of Pineapples by Combining Audio and Visual Features
topic	Computer Vision and Pattern Recognition; Multimedia; Sound; Audio and Speech Processing
url	https://arxiv.org/abs/2505.11020
