Bibliographic Details
Main authors: Min, Yue, Wang, Shaobo, Li, Jiaze, Niu, Tianle, Fan, Junxin, Miao, Yongliang, Yang, Lijin, Zhang, Linfeng
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online access: https://arxiv.org/abs/2511.08263
_version_ 1866911259493924864
author Min, Yue
Wang, Shaobo
Li, Jiaze
Niu, Tianle
Fan, Junxin
Miao, Yongliang
Yang, Lijin
Zhang, Linfeng
contents Data condensation techniques aim to synthesize a compact dataset from a larger one to enable efficient model training. While successful in unimodal settings, they often fail in multimodal scenarios, where preserving intricate inter-modal dependencies is crucial. To address this, we introduce ImageBindDC, a novel data condensation framework operating within the unified feature space of ImageBind. Our approach moves beyond conventional distribution matching by employing a Characteristic Function (CF) loss, which operates in the Fourier domain to facilitate a more precise statistical alignment via exact infinite moment matching. We design our objective to enforce three levels of distributional consistency: (i) uni-modal alignment, which matches the statistical properties of synthetic and real data within each modality; (ii) cross-modal alignment, which preserves pairwise semantics by matching the distributions of hybrid real-synthetic data pairs; and (iii) joint-modal alignment, which captures the complete multivariate data structure by aligning the joint distribution of real data pairs with their synthetic counterparts. Extensive experiments highlight the effectiveness of ImageBindDC: on the NYU-v2 dataset, a model trained on just 5 condensed datapoints per class performs comparably to one trained on the full dataset (effectively lossless), setting a new state of the art with an 8.2% absolute improvement over the previous best method and more than 4× less condensation time.
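The abstract describes the CF loss only in words, so the following is a minimal sketch of the general idea rather than the authors' implementation. It compares empirical characteristic functions of real and synthetic feature sets at randomly sampled frequencies, once per modality (uni-modal) and once on concatenated pairs (joint-modal). The function names, the Gaussian frequency sampling, the squared-modulus distance, and the use of plain NumPy arrays in place of ImageBind embeddings are all assumptions made for illustration. The cross-modal term is left out because the abstract does not say how the hybrid real-synthetic pairs are formed.

```python
import numpy as np


def empirical_cf(x, freqs):
    """Empirical characteristic function of samples x (n, d) at freqs (k, d)."""
    phase = x @ freqs.T                      # (n, k): inner products <t_j, x_i>
    return np.exp(1j * phase).mean(axis=0)   # (k,): sample mean of exp(i <t_j, x>)


def cf_discrepancy(real, synth, freqs):
    """Mean squared modulus of the gap between two empirical CFs."""
    gap = empirical_cf(real, freqs) - empirical_cf(synth, freqs)
    return float(np.mean(np.abs(gap) ** 2))


def alignment_terms(real_a, real_b, syn_a, syn_b, rng, num_freqs=256):
    """Uni-modal and joint-modal CF terms for two paired modalities a and b.

    Hypothetical helper: row i of real_a is paired with row i of real_b,
    and likewise for the synthetic sets.
    """
    d_a, d_b = real_a.shape[1], real_b.shape[1]
    # Assumption: frequencies are drawn from a standard normal distribution.
    t_a = rng.normal(size=(num_freqs, d_a))
    t_b = rng.normal(size=(num_freqs, d_b))
    t_ab = rng.normal(size=(num_freqs, d_a + d_b))
    # Joint samples are formed by concatenating paired features.
    real_ab = np.concatenate([real_a, real_b], axis=1)
    syn_ab = np.concatenate([syn_a, syn_b], axis=1)
    return {
        "uni_a": cf_discrepancy(real_a, syn_a, t_a),
        "uni_b": cf_discrepancy(real_b, syn_b, t_b),
        "joint": cf_discrepancy(real_ab, syn_ab, t_ab),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(500, 3))             # stand-in for one modality
    b = a + 0.1 * rng.normal(size=(500, 3))   # second modality, correlated with a
    shuffled_b = b[rng.permutation(len(b))]   # same marginal, pairing destroyed
    print(alignment_terms(a, b, a, shuffled_b, rng))
```

In the usage above, shuffling one modality leaves both per-modality distributions unchanged, so the uni-modal terms stay near zero while the joint term becomes clearly positive. This illustrates why a joint-level term is needed in addition to per-modality matching when inter-modal dependencies matter.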
format Preprint
id arxiv_https___arxiv_org_abs_2511_08263
institution arXiv
publishDate 2025
record_format arxiv
title ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2511.08263