Staff View: Do Counterfactual Examples Complicate Adversarial Training? :: Library Catalog

Bibliographic Details
Main Authors:	Yeats, Eric; Darwin, Cameron; Ortega, Eduardo; Liu, Frank; Li, Hai
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2404.10588

_version_	1866917642273554432
author	Yeats, Eric; Darwin, Cameron; Ortega, Eduardo; Liu, Frank; Li, Hai
contents	We leverage diffusion models to study the robustness-performance tradeoff of robust classifiers. Our approach introduces a simple, pretrained diffusion method to generate low-norm counterfactual examples (CEs): semantically altered data which results in different true class membership. We report that the confidence and accuracy of robust models on their clean training data are associated with the proximity of the data to their CEs. Moreover, robust models perform very poorly when evaluated on the CEs directly, as they become increasingly invariant to the low-norm, semantic changes brought by CEs. The results indicate a significant overlap between non-robust and semantic features, countering the common assumption that non-robust features are not interpretable.
format	Preprint
id	arxiv_https___arxiv_org_abs_2404_10588
institution	arXiv
publishDate	2024
record_format	arxiv
title	Do Counterfactual Examples Complicate Adversarial Training?
topic	Machine Learning; Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2404.10588
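The abstract reports that a robust model's confidence on clean training data is associated with how close each example lies to its counterfactual example (CE). The sketch below shows one possible way to quantify such an association. It assumes the CEs have already been generated (the diffusion step is not shown), measures proximity as the L2 norm of the difference between an example and its CE, and uses Pearson correlation as the association measure. The function names and toy data are hypothetical illustrations. They are not the authors' code or their exact metric.

```python
import math
from typing import Sequence

# Hypothetical helpers for illustration only; not taken from the paper.

def l2_distance(x: Sequence[float], x_ce: Sequence[float]) -> float:
    """L2 norm of the change that turns example x into its counterfactual x_ce."""
    if len(x) != len(x_ce):
        raise ValueError("x and x_ce must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_ce)))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for zero variance")
    return cov / math.sqrt(var_x * var_y)


def proximity_confidence_association(
    samples: Sequence[Sequence[float]],
    counterfactuals: Sequence[Sequence[float]],
    confidences: Sequence[float],
) -> float:
    """Correlate each example's distance to its (assumed precomputed) CE
    with the model's confidence on that example."""
    distances = [l2_distance(x, c) for x, c in zip(samples, counterfactuals)]
    return pearson(distances, confidences)


if __name__ == "__main__":
    # Toy data: three flattened "images" and their hypothetical CEs.
    xs = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    ces = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]  # distances 1, 2, 3
    conf = [0.5, 0.7, 0.9]                      # confidence rises with distance
    print(proximity_confidence_association(xs, ces, conf))
```

A positive coefficient in this toy setup means the model is more confident on examples whose CEs lie farther away. Pearson correlation only captures linear association; a rank correlation such as Spearman's would be a reasonable alternative if only the ordering matters.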
