Bibliographic Details
Main Authors: Letellier, Guillaume; Srivastava, Siddharth; Jurie, Frédéric; Sharma, Gaurav
Format: Preprint
Published: 2025
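Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence; Machine Learning; Neural and Evolutionary Computing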
Online Access: https://arxiv.org/abs/2511.20721
_version_ 1866915892116324352
author Letellier, Guillaume
Srivastava, Siddharth
Jurie, Frédéric
Sharma, Gaurav
contents Foundation models pre-trained with self-supervised learning (SSL) on large-scale datasets have become powerful general-purpose feature extractors. However, their immense size and computational cost make them prohibitive for deployment on edge devices such as robots and AR/VR headsets. Existing compression techniques like standard knowledge distillation create efficient 'specialist' models but sacrifice the crucial, downstream-agnostic generality that makes foundation models so valuable. In this paper, we introduce Foundation Model Distillation (FMD), a new paradigm for compressing large SSL models into compact, efficient, and faithful proxies that retain their general-purpose representational power. We present Foundry, the first implementation of FMD for 3D point clouds. Foundry trains a student to learn a compressed set of SuperTokens that reconstruct the teacher's token-level representations, capturing a compact basis of its latent space. A single distilled model maintains strong transferability across diverse downstream tasks (classification, part segmentation, and few-shot scenarios), approaching full foundation-model performance while using significantly fewer tokens and FLOPs, which makes such models more practical for deployment on resource-constrained hardware.
format Preprint
id arxiv_https___arxiv_org_abs_2511_20721
institution arXiv
publishDate 2025
record_format arxiv
title Foundry: Distilling 3D Foundation Models for the Edge
topic Computer Vision and Pattern Recognition
Artificial Intelligence
Machine Learning
Neural and Evolutionary Computing
url https://arxiv.org/abs/2511.20721
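The abstract describes the core idea only at a high level: a student compresses its tokens into a small set of SuperTokens, which are then used to reconstruct the teacher's token-level representations. The following is a minimal sketch of that compress-then-reconstruct objective, not the authors' implementation. The attention-style pooling, the mean-squared-error loss, and every name here (`attend`, `fmd_loss`, `super_queries`, `position_queries`) are assumptions introduced for illustration; the record does not specify the actual architecture or loss.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, values):
    """For each query, return a softmax-weighted average of `values`."""
    dim = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, v) for v in values])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim)])
    return out


def mse(pred, target):
    """Mean squared error between two equally shaped lists of vectors."""
    count = len(target) * len(target[0])
    return sum((p - t) ** 2
               for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / count


def fmd_loss(student_tokens, teacher_tokens, super_queries, position_queries):
    """Hypothetical objective: N student tokens -> K SuperTokens -> N tokens."""
    # Compress: each of the K (assumed learnable) queries pools all N tokens.
    super_tokens = attend(super_queries, student_tokens)
    # Reconstruct: each of the N positions reads from the K SuperTokens.
    recon = attend(position_queries, super_tokens)
    # Compare the reconstruction with the teacher's token-level features.
    return mse(recon, teacher_tokens), super_tokens, recon


random.seed(0)
N, K, D = 16, 4, 8  # tokens, SuperTokens, feature width (illustrative sizes)


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


student_tokens = rand_matrix(N, D)
teacher_tokens = rand_matrix(N, D)   # stand-in for frozen teacher outputs
super_queries = rand_matrix(K, D)    # would be trained in a real system
position_queries = rand_matrix(N, D)

loss, super_tokens, recon = fmd_loss(
    student_tokens, teacher_tokens, super_queries, position_queries)
print(f"{len(super_tokens)} SuperTokens, reconstruction loss {loss:.4f}")
```

In this sketch the downstream cost scales with K rather than N, which is the sense in which fewer tokens would translate into fewer FLOPs. A trained system would minimize the loss by gradient descent over the student and query parameters, which the example leaves out.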