Bibliographic Details
Main Authors: Yeats, Eric; Darwin, Cameron; Ortega, Eduardo; Liu, Frank; Li, Hai
Format: Preprint
Published: 2024
Subjects: Machine Learning; Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2404.10588
_version_ 1866917642273554432
author Yeats, Eric
Darwin, Cameron
Ortega, Eduardo
Liu, Frank
Li, Hai
contents We leverage diffusion models to study the robustness-performance tradeoff of robust classifiers. Our approach introduces a simple, pretrained diffusion method to generate low-norm counterfactual examples (CEs): semantically altered data which results in different true class membership. We report that the confidence and accuracy of robust models on their clean training data are associated with the proximity of the data to their CEs. Moreover, robust models perform very poorly when evaluated on the CEs directly, as they become increasingly invariant to the low-norm, semantic changes brought by CEs. The results indicate a significant overlap between non-robust and semantic features, countering the common assumption that non-robust features are not interpretable.
format Preprint
id arxiv_https___arxiv_org_abs_2404_10588
institution arXiv
publishDate 2024
record_format arxiv
title Do Counterfactual Examples Complicate Adversarial Training?
topic Machine Learning
Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2404.10588