Bibliographic Details
Main Authors: Zehavi, Irad, Nitzan, Roee, Shamir, Adi
Format: Preprint
Published: 2023
Subjects: Cryptography and Security, Machine Learning
Online Access: https://arxiv.org/abs/2301.03118
_version_ 1866913387089231872
author Zehavi, Irad
Nitzan, Roee
Shamir, Adi
author_facet Zehavi, Irad
Nitzan, Roee
Shamir, Adi
contents In this paper, we describe how to plant novel types of backdoors in any facial recognition model based on the popular architecture of deep Siamese neural networks. These backdoors force the system to err only on natural images of specific persons who are preselected by the attacker, without controlling their appearance or inserting any triggers. For example, we show how such a backdoored system can classify any two images of a particular person as different people, or any two images of a particular pair of persons as the same person, with almost no effect on the correctness of its decisions for other persons. Surprisingly, we show that both types of backdoors can be implemented by applying linear transformations to the model's last weight matrix, with no additional training or optimization, using only images of the backdoor identities. A unique property of our attack is that multiple backdoors can be independently installed in the same model by multiple attackers, who may not be aware of each other's existence, with almost no interference. We have experimentally verified the attacks on a state-of-the-art (SOTA) facial recognition system. When we tried to individually anonymize ten celebrities, the network failed to recognize two of their images as being the same person $97.02\%$ to $98.31\%$ of the time. When we tried to make the network confuse the extremely different-looking Morgan Freeman and Scarlett Johansson, for example, their images were declared to be the same person $98.47\%$ of the time. For each type of backdoor, we sequentially installed multiple backdoors with minimal effect on each other's performance (for example, anonymizing all ten celebrities on the same model reduced the success rate for each celebrity by no more than $1.01\%$). In all of our experiments, the benign accuracy of the network on other persons barely degraded (in most cases, it degraded by less than $0.05\%$).
format Preprint
id arxiv_https___arxiv_org_abs_2301_03118
institution arXiv
publishDate 2023
record_format arxiv
title Facial Misrecognition Systems: Simple Weight Manipulations Force DNNs to Err Only on Specific Persons
topic Cryptography and Security
Machine Learning
url https://arxiv.org/abs/2301.03118
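
The abstract states that the backdoors are installed by applying a linear transformation to the model's last weight matrix, without retraining. This record does not give the authors' construction, so the following is only a toy sketch of one plausible way such a transformation could behave: it assumes the last layer is a plain matrix `W`, that similarity is measured by cosine similarity, and that the transformation is a projection that removes the target person's mean feature direction. All names (`project_out`, `toy_inputs`, `W_backdoored`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mean_pairwise_cosine(feats):
    """Average cosine similarity over all distinct pairs of rows."""
    n = len(feats)
    sims = [cosine(feats[i], feats[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(sims))


def project_out(direction):
    """Matrix that projects onto the hyperplane orthogonal to `direction`."""
    d = direction / np.linalg.norm(direction)
    return np.eye(d.size) - np.outer(d, d)


rng = np.random.default_rng(0)
dim_in, dim_out, n_images = 64, 32, 20

# Assumption: a random matrix stands in for the model's last weight matrix.
W = rng.standard_normal((dim_out, dim_in))
W_pinv = np.linalg.pinv(W)  # used only to synthesize toy inputs


def toy_inputs(centroid):
    """Synthetic penultimate activations whose features cluster around `centroid`."""
    feats = centroid + 0.1 * rng.standard_normal((n_images, dim_out))
    return feats @ W_pinv.T  # rows x_i such that W @ x_i equals feats[i]


target_x = toy_inputs(rng.standard_normal(dim_out))  # the backdoor identity
other_x = toy_inputs(rng.standard_normal(dim_out))   # an unrelated person

# Use only images of the target to estimate the direction to remove,
# then edit the last weight matrix with a single matrix product.
target_centroid = (target_x @ W.T).mean(axis=0)
W_backdoored = project_out(target_centroid) @ W

before_target = mean_pairwise_cosine(target_x @ W.T)
after_target = mean_pairwise_cosine(target_x @ W_backdoored.T)
before_other = mean_pairwise_cosine(other_x @ W.T)
after_other = mean_pairwise_cosine(other_x @ W_backdoored.T)

print("target:", before_target, "->", after_target)
print("other: ", before_other, "->", after_other)
```

In this toy setting, pairs of images of the target stop looking alike once their shared direction is removed, while pairs of images of the other person remain similar, because a random centroid in 32 dimensions loses only a small part of its norm under the projection. Real embeddings are not random, so the figures reported in the abstract cannot be inferred from this sketch.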