Staff View: SALMUBench: A Benchmark for Sensitive Association-Level Multimodal Unlearning :: Library Catalog

Bibliographic Details
Main Authors:	Selvas-Sala, Cai; Kang, Lei; Gomez, Lluis
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Machine Learning
Online Access:	https://arxiv.org/abs/2603.26316

_version_	1866915894956916736
author	Selvas-Sala, Cai; Kang, Lei; Gomez, Lluis
author_facet	Selvas-Sala, Cai; Kang, Lei; Gomez, Lluis
contents	As multimodal models like CLIP become integral to downstream systems, the need to remove sensitive information is critical. However, machine unlearning for contrastively-trained encoders remains underexplored, and existing evaluations fail to diagnose fine-grained, association-level forgetting. We introduce SALMUBench (Sensitive Association-Level Multimodal Unlearning), a benchmark built upon a synthetic dataset of 60K persona-attribute associations and two foundational models: a Compromised model polluted with this data, and a Clean model without it. To isolate unlearning effects, both are trained from scratch on the same 400M-pair retain base, with the Compromised model additionally trained on the sensitive set. We propose a novel evaluation protocol with structured holdout sets (holdout identity, holdout association) to precisely measure unlearning efficacy and collateral damage. Our benchmark reveals that while utility-efficient deletion is feasible, current methods exhibit distinct failure modes: they either fail to forget effectively or over-generalize by erasing more than intended. SALMUBench sets a new standard for comprehensive unlearning evaluation, and we publicly release our dataset, models, evaluation scripts, and leaderboards to foster future research.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_26316
institution	arXiv
publishDate	2026
record_format	arxiv
title	SALMUBench: A Benchmark for Sensitive Association-Level Multimodal Unlearning
topic	Computer Vision and Pattern Recognition; Machine Learning
url	https://arxiv.org/abs/2603.26316

Similar Items