Vista Equipo: :: Library Catalog

Guardado en:

Detalles Bibliográficos
Autores principales:	Min, Yue, Wang, Shaobo, Li, Jiaze, Niu, Tianle, Fan, Junxin, Miao, Yongliang, Yang, Lijin, Zhang, Linfeng
Formato:	Preprint
Publicado:	2025
Materias:	Computer Vision and Pattern Recognition Artificial Intelligence
Acceso en línea:	https://arxiv.org/abs/2511.08263
Etiquetas:	Agregar Etiqueta Sin Etiquetas, Sea el primero en etiquetar este registro!

_version_	1866911259493924864
author	Min, Yue Wang, Shaobo Li, Jiaze Niu, Tianle Fan, Junxin Miao, Yongliang Yang, Lijin Zhang, Linfeng
author_facet	Min, Yue Wang, Shaobo Li, Jiaze Niu, Tianle Fan, Junxin Miao, Yongliang Yang, Lijin Zhang, Linfeng
contents	Data condensation techniques aim to synthesize a compact dataset from a larger one to enable efficient model training, yet while successful in unimodal settings, they often fail in multimodal scenarios where preserving intricate inter-modal dependencies is crucial. To address this, we introduce ImageBindDC, a novel data condensation framework operating within the unified feature space of ImageBind. Our approach moves beyond conventional distribution-matching by employing a powerful Characteristic Function (CF) loss, which operates in the Fourier domain to facilitate a more precise statistical alignment via exact infinite moment matching. We design our objective to enforce three critical levels of distributional consistency: (i) uni-modal alignment, which matches the statistical properties of synthetic and real data within each modality; (ii) cross-modal alignment, which preserves pairwise semantics by matching the distributions of hybrid real-synthetic data pairs; and (iii) joint-modal alignment, which captures the complete multivariate data structure by aligning the joint distribution of real data pairs with their synthetic counterparts. Extensive experiments highlight the effectiveness of ImageBindDC: on the NYU-v2 dataset, a model trained on just 5 condensed datapoints per class achieves lossless performance comparable to one trained on the full dataset, achieving a new state-of-the-art with an 8.2\% absolute improvement over the previous best method and more than 4$\times$ less condensation time.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_08263
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation Min, Yue Wang, Shaobo Li, Jiaze Niu, Tianle Fan, Junxin Miao, Yongliang Yang, Lijin Zhang, Linfeng Computer Vision and Pattern Recognition Artificial Intelligence Data condensation techniques aim to synthesize a compact dataset from a larger one to enable efficient model training, yet while successful in unimodal settings, they often fail in multimodal scenarios where preserving intricate inter-modal dependencies is crucial. To address this, we introduce ImageBindDC, a novel data condensation framework operating within the unified feature space of ImageBind. Our approach moves beyond conventional distribution-matching by employing a powerful Characteristic Function (CF) loss, which operates in the Fourier domain to facilitate a more precise statistical alignment via exact infinite moment matching. We design our objective to enforce three critical levels of distributional consistency: (i) uni-modal alignment, which matches the statistical properties of synthetic and real data within each modality; (ii) cross-modal alignment, which preserves pairwise semantics by matching the distributions of hybrid real-synthetic data pairs; and (iii) joint-modal alignment, which captures the complete multivariate data structure by aligning the joint distribution of real data pairs with their synthetic counterparts. Extensive experiments highlight the effectiveness of ImageBindDC: on the NYU-v2 dataset, a model trained on just 5 condensed datapoints per class achieves lossless performance comparable to one trained on the full dataset, achieving a new state-of-the-art with an 8.2\% absolute improvement over the previous best method and more than 4$\times$ less condensation time.
title	ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation
topic	Computer Vision and Pattern Recognition Artificial Intelligence
url	https://arxiv.org/abs/2511.08263

Ejemplares similares