| Main Authors: | Serrà, Joan; Goswami, Dipam; Morreale, Fabio; Liao, Wei-Hsiang; Mitsufuji, Yuki |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Machine Learning; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2605.17938 |
| _version_ | 1866914577347772416 |
|---|---|
| author | Serrà, Joan; Goswami, Dipam; Morreale, Fabio; Liao, Wei-Hsiang; Mitsufuji, Yuki |
| contents | Training data attribution (TDA) should enable generative model interpretability and foster a variety of related downstream tasks. Nonetheless, current TDA approaches lack reliability and robustness, preventing their adoption in real-world setups. In this paper, we take a decisive step towards more reliable and robust TDA for diffusion models. We propose to perform TDA with mirrored unlearning and noise-consistent skew (MUCS). The idea is to fine-tune a second model with bounded mirrored gradient ascent, and to measure the normalized skew of this model with respect to the original one using consistent noise samples. We show that, while being conceptually simple and generic, MUCS systematically outperforms existing methods on three different datasets by a large margin. We additionally study the effect that core design choices have on final performance, and analyze novel aspects regarding the overlap of influential instances across generated items and the potential of ensembling TDA approaches. We believe that our findings may have broader implications for more general unlearning setups, as well as for tasks requiring the comparison of diffusion losses. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2605_17938 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew |
| topic | Machine Learning; Artificial Intelligence |
| url | https://arxiv.org/abs/2605.17938 |