AUROR: Defending Against Poisoning Attacks in Collaborative Deep Learning Systems

Annual Computer Security Applications Conference (ACSAC)

Deep learning in a collaborative setting is emerging as a cornerstone
of many upcoming applications, wherein untrusted users collaborate
to generate more accurate models. From the security perspective,
this opens collaborative deep learning to poisoning attacks,
wherein adversarial users deliberately alter their inputs to
mis-train the model. These attacks are known for machine learning
systems in general, but their impact on new deep learning systems
is not well-established.
We investigate the setting of indirect collaborative deep learning,
a practical form of deep learning in which users submit masked
features rather than raw data. Indirect collaborative deep learning
is preferred over the direct approach because it distributes the cost
of computation and can be made privacy-preserving. In this paper, we
study the susceptibility of collaborative deep learning systems to
adversarial poisoning attacks. Specifically, we obtain the following
empirical results on two popular datasets: handwritten digits
(MNIST) and traffic signs (GTSRB) used in self-driving cars. For
collaborative deep learning systems, we demonstrate that the attacks
achieve a 99% success rate in misclassifying specific target data
while poisoning only 10% of the entire training dataset.
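The abstract does not spell out the attack construction, so the
following is only a minimal sketch of one common targeted poisoning
strategy (label flipping) applied to a fraction of the training data;
the class choices, poisoning fraction, and helper names are
illustrative assumptions rather than the paper's exact attack.

```python
# Minimal sketch of a targeted poisoning (label-flipping) attack in a
# collaborative training setup. Source/target classes, the poisoning
# fraction, and function names are illustrative assumptions.
import numpy as np

def poison_labels(labels, source_class=1, target_class=7,
                  poison_fraction=0.10, rng=None):
    """Relabel a fraction of the training set so that `source_class`
    examples are marked as `target_class`, biasing the jointly trained
    model toward the targeted misclassification."""
    rng = rng or np.random.default_rng(0)
    labels = labels.copy()
    candidates = np.flatnonzero(labels == source_class)
    n_poison = int(poison_fraction * labels.size)
    chosen = rng.choice(candidates,
                        size=min(n_poison, candidates.size),
                        replace=False)
    labels[chosen] = target_class
    return labels

# Example: an MNIST-sized label vector with digits 0-9.
clean = np.random.default_rng(1).integers(0, 10, size=60_000)
poisoned = poison_labels(clean)
print("flipped labels:", int(np.sum(poisoned != clean)))
```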
As a defense, we propose AUROR, a system that detects malicious
users and generates an accurate model. Under the deployed defense,
accuracy on these practical datasets is nearly unchanged in the
absence of attacks, and the accuracy of a model trained using AUROR
drops by only 3% even when 30% of all users are adversarial. AUROR
also provides a strong guarantee against evasion: if an attacker
tries to evade detection, the effectiveness of its attack remains
bounded.
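The core detection idea is to separate anomalous users from honest
ones based on the masked features they upload. The sketch below shows
one simplified way to do this, clustering per-user submissions for a
single feature into two groups and flagging the minority group when
the groups are well separated; the feature selection step, the
distance threshold, and all names here are assumptions for
illustration, not AUROR's exact algorithm.

```python
# Simplified sketch: cluster per-user masked-feature submissions into
# two groups and flag the smaller group when the groups are clearly
# separated. Threshold and parameter choices are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def flag_suspicious_users(user_feature_values, separation_threshold=1.0):
    """user_feature_values: shape (n_users,), one masked-feature value
    per user. Returns indices of users in the minority cluster if the
    two cluster centers are far apart, else an empty list."""
    X = np.asarray(user_feature_values, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    centers = km.cluster_centers_.ravel()
    if abs(centers[0] - centers[1]) < separation_threshold:
        return []                      # clusters overlap: flag no one
    counts = np.bincount(km.labels_, minlength=2)
    minority = int(np.argmin(counts))  # smaller cluster is anomalous
    return np.flatnonzero(km.labels_ == minority).tolist()

# Example: 30 users, the last 9 (30%) submit skewed values.
rng = np.random.default_rng(2)
values = np.concatenate([rng.normal(0.0, 0.1, 21),
                         rng.normal(3.0, 0.1, 9)])
print(flag_suspicious_users(values))
```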