Bibliographic Details
Main Authors: Weißl, Oliver; Abdellatif, Amr; Chen, Xingcheng; Merabishvili, Giorgi; Riccio, Vincenzo; Kacianka, Severin; Stocco, Andrea
Format: Preprint
Published: 2024
Subjects: Software Engineering, Machine Learning
Online Access: https://arxiv.org/abs/2408.06258
author Weißl, Oliver
Abdellatif, Amr
Chen, Xingcheng
Merabishvili, Giorgi
Riccio, Vincenzo
Kacianka, Severin
Stocco, Andrea
contents Evaluating the behavioral boundaries of deep learning (DL) systems is crucial for understanding their reliability across diverse, unseen inputs. Existing solutions fall short as they rely on untargeted random, model- or latent-based perturbations, due to difficulties in generating controlled input variations. In this work, we introduce Mimicry, a novel black-box test generator for fine-grained, targeted exploration of DL system boundaries. Mimicry performs boundary testing by leveraging the probabilistic nature of DL outputs to identify promising directions for exploration. It uses style-based GANs to disentangle input representations into content and style components, enabling controlled feature mixing to approximate the decision boundary. We evaluated Mimicry's effectiveness in generating boundary inputs for five widely used DL image classification systems of increasing complexity, comparing it to two baseline approaches. Our results show that Mimicry consistently identifies inputs closer to the decision boundary. It generates semantically meaningful boundary test cases that reveal new functional (mis)behaviors, while the baselines produce mainly corrupted or invalid inputs. Thanks to its enhanced control over latent space manipulations, Mimicry remains effective as dataset complexity increases, maintaining competitive diversity and higher validity rates, confirmed by human assessors.
format Preprint
id arxiv_https___arxiv_org_abs_2408_06258
institution arXiv
publishDate 2024
record_format arxiv
title Targeted Deep Learning System Boundary Testing
topic Software Engineering
Machine Learning
url https://arxiv.org/abs/2408.06258
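The abstract describes boundary testing in terms of two general ideas: reading a classifier's output probabilities to judge how close an input is to a decision boundary, and mixing features of two inputs to move toward that boundary. The Python snippet below is a minimal sketch of those general ideas under stated assumptions, not Mimicry's implementation. A toy linear softmax classifier stands in for the DL system, a plain linear blend of two latent vectors (the hypothetical `mix` helper) stands in for StyleGAN style mixing, and bisection on the mixing weight is an assumed search strategy. Closeness to the boundary is measured here by the gap between the two largest class probabilities.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - np.max(z))
    return e / e.sum()


def top2_margin(probs):
    # Gap between the two largest class probabilities;
    # a value near 0 means the input is close to a decision boundary.
    p = np.sort(probs)[::-1]
    return p[0] - p[1]


def mix(content, style, alpha):
    # Hypothetical stand-in for style mixing (assumption, not StyleGAN):
    # a linear blend of two latent vectors controlled by alpha in [0, 1].
    return (1.0 - alpha) * content + alpha * style


def find_boundary(classify, content, style, tol=1e-6, max_iter=100):
    # Assumed search strategy: bisection on the mixing weight alpha.
    # Requires the two endpoints to be predicted as different classes.
    start_label = np.argmax(classify(content))
    if np.argmax(classify(style)) == start_label:
        raise ValueError("endpoints must be predicted as different classes")
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if np.argmax(classify(mix(content, style, mid))) == start_label:
            lo = mid  # still on the starting side of the boundary
        else:
            hi = mid  # label has flipped; boundary lies below mid
        if hi - lo < tol:
            break
    alpha = (lo + hi) / 2
    x = mix(content, style, alpha)
    return alpha, x, top2_margin(classify(x))


# Toy 3-class linear classifier over 2-D "latent" vectors (illustrative only).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])


def classify(v):
    return softmax(W @ v)


alpha, x, margin = find_boundary(classify, np.array([2.0, 0.0]), np.array([0.0, 2.0]))
print(f"alpha={alpha:.6f}, margin={margin:.2e}")
```

Bisection suffices here only because the predicted label flips once along the blend in this toy setup; with a real generator and classifier the label may change several times along a path, so the sketch is an illustration of the margin-guided idea rather than a practical test generator.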