Secure Adversarial Learning

SAD: Saliency-based Defenses Against Adversarial Examples

Abstract
With the rise in popularity of machine and deep learning models, there is an increased focus on their vulnerability to malicious inputs. These adversarial examples drift model predictions away from the original intent of the network and are a growing concern in practical security. To combat these attacks, neural networks can leverage traditional image processing approaches or state-of-the-art defensive models to reduce perturbations in the data. Defenses that reduce noise globally are effective against adversarial attacks; however, their lossy nature often distorts important data within the image. In this work, we propose a visual saliency-based approach to cleaning data affected by an adversarial attack. Our model leverages the salient regions of an adversarial image to provide a targeted countermeasure while comparatively reducing loss within the cleaned images. We measure the accuracy of our model by evaluating the effectiveness of state-of-the-art saliency methods prior to attack, under attack, and after application of cleaning methods. We demonstrate the effectiveness of our proposed approach in comparison with related defenses and against established adversarial attack methods, across two saliency datasets. Our targeted approach shows significant improvements in a range of standard statistical and distance saliency metrics, in comparison with both traditional and state-of-the-art approaches.
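To give a sense of what a saliency-targeted countermeasure looks like in practice, the sketch below uses a saliency map to gate a denoising filter so that only salient pixels are cleaned, leaving the rest of the image untouched. This is a minimal illustration rather than the SAD pipeline: the spectral-residual saliency estimate, the median filter, and the sal_thresh / filter_size parameters are stand-in assumptions for the deep saliency models and cleaning methods evaluated in the paper.

```python
# Illustrative sketch of saliency-targeted cleaning (not the authors' code).
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter, uniform_filter

def spectral_residual_saliency(gray):
    """Classic spectral-residual saliency (Hou & Zhang, 2007), used here as a
    lightweight stand-in for the deep saliency models studied in the paper."""
    f = np.fft.fft2(gray)
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    residual = log_amp - uniform_filter(log_amp, size=3)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = gaussian_filter(sal, sigma=3)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

def targeted_clean(adv_img, sal_thresh=0.5, filter_size=3):
    """Denoise only the salient region of a (possibly adversarial) image,
    limiting the information loss of a global filter."""
    gray = adv_img.mean(axis=2)
    mask = spectral_residual_saliency(gray) > sal_thresh   # salient pixels
    cleaned = adv_img.copy()
    for c in range(adv_img.shape[2]):
        filtered = median_filter(adv_img[..., c], size=filter_size)
        cleaned[..., c] = np.where(mask, filtered, adv_img[..., c])
    return cleaned

# Example usage on a random array standing in for an adversarial image.
adv = np.random.rand(224, 224, 3).astype(np.float32)
out = targeted_clean(adv)
```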

Paper
SAD: Saliency-based Defenses Against Adversarial Examples
Richard Tran, David Patrick, Michael Geyer, Amanda Fernandez
paper | Bibtex

On the Salience of Adversarial Examples

Abstract
Adversarial examples are beginning to evolve as rapidly as the deep learning models they are designed to attack. These intentionally manipulated inputs attempt to mislead the targeted model while maintaining the appearance of innocuous input data. Global countermeasures against these attacks tend to be lossy to the original data, or ineffective in removing the perturbations. Localized approaches have proven effective; however, it is difficult to identify the affected areas in the data in order to apply a targeted cleaning algorithm. For image data, visual saliency estimation models identify important features in an image and provide a targeting mechanism for countering adversarial examples. In this work, we examine the effectiveness of state-of-the-art saliency models on complex scenes, in their original and perturbed forms. Across a thorough range of common metrics, we compare performance on clean image data with adversarial examples to demonstrate the vulnerability of deep learning-based saliency models to adversarial examples.
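As a concrete illustration of this kind of comparison, the sketch below perturbs an input with the fast gradient sign method (FGSM) and measures how much a saliency estimate shifts using the Pearson correlation coefficient (CC), one of the standard saliency agreement metrics. The untrained ResNet-18 stand-in, the input-gradient saliency proxy, and the eps setting are assumptions made for illustration; the paper evaluates dedicated deep saliency models across a broader set of metrics.

```python
# Illustrative sketch: how an attack shifts a saliency map (not the paper's evaluation code).
import torch
import torch.nn.functional as F
import torchvision.models as models

def fgsm(model, x, y, eps=8 / 255):
    """Fast Gradient Sign Method: one signed-gradient step on the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def grad_saliency(model, x):
    """Simple input-gradient saliency, a stand-in for a deep saliency model."""
    x = x.clone().detach().requires_grad_(True)
    model(x).max(dim=1).values.sum().backward()
    return x.grad.abs().mean(dim=1)          # (N, H, W) saliency maps

def pearson_cc(a, b):
    """Pearson correlation coefficient between two saliency maps."""
    a, b = a.flatten(), b.flatten()
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (a.norm() * b.norm() + 1e-8)

model = models.resnet18().eval()              # untrained stand-in classifier
x = torch.rand(1, 3, 224, 224)                # stand-in for a clean image
y = model(x).argmax(dim=1)
x_adv = fgsm(model, x, y)

cc = pearson_cc(grad_saliency(model, x), grad_saliency(model, x_adv))
print(f"CC between clean and adversarial saliency maps: {cc.item():.3f}")
```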

Publication
On the Salience of Adversarial Examples
Amanda Fernandez
ISVC 2019 paper | Bibtex