Staff View: Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces :: Library Catalog

Bibliographic Details
Main Authors:	Chopin, Jeremy; Dahyot, Rozenn
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2410.13421

_version_	1866909353359966208
author	Chopin, Jeremy; Dahyot, Rozenn
author_facet	Chopin, Jeremy; Dahyot, Rozenn
contents	Data embeddings from CLIP and ImageBind provide powerful features for the analysis of multimedia and/or multimodal data. We assess their performance for classification using a Gaussian Mixture Model (GMM) based layer as an alternative to the standard Softmax layer. GMM-based classifiers have recently been shown to give interesting performance as part of deep learning pipelines trained end-to-end. Our first contribution is to investigate GMM-based classification performance on the CLIP and ImageBind embedded spaces. Our second contribution is a GMM-based classifier with a lower parameter count than previously proposed ones. We find that, on the tested embedded spaces, one Gaussian component in the GMM is often enough to capture each class, and we hypothesize that this may be due to the contrastive loss used to train these embedded spaces, which naturally concentrates the features of each class together. We also observe that ImageBind often provides better performance than CLIP for classification of image datasets, even when these embedded spaces are compressed using PCA.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_13421
institution	arXiv
publishDate	2024
record_format	arxiv
title	Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2410.13421

Similar Items