# Adversarial Features

**Intent**

**Motivation**

**Sketch**

<Diagram>

**Discussion**

What are Adversarial Features?

Where do they come from?

What problems do they create?

How can we fix them?

What areas need further research?

**Known Uses**

**Related Patterns**

<Diagram>

**References**

http://www.kdnuggets.com/2015/07/deep-learning-adversarial-examples-misconceptions.html

http://arxiv.org/abs/1511.07528 The Limitations of Deep Learning in Adversarial Settings

The detection of adversarial samples remains an open problem. Interestingly, the universal approximation theorem formulated by Hornik et al. states one hidden layer is sufficient to represent arbitrarily accurately a function [21]. Thus, one can intuitively conceive that improving the training phase is key to resisting adversarial samples.

[21] K. Hornik, M. Stinchcombe, et al. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

http://karpathy.github.io/2015/03/30/breaking-convnets/

High regularization gives smoother templates, but at some point starts to work worse. However, it is more resistant to fooling. (The fooling images look noticeably different from their originals.) Low regularization gives noisier templates but seems to work better than all-smooth templates. It is less resistant to fooling. Intuitively, it seems that higher regularization leads to smaller weights, which means that one must change the image more dramatically to change the score by some amount. It’s not immediately obvious if and how this conclusion translates to deeper models.
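The intuition about smaller weights can be checked with a small calculation. The sketch below is not taken from the post. It assumes a single linear score `w . x` and a perturbation limited to `eps` per input coordinate, in which case the score can move by at most `eps` times the L1 norm of the weights. The function names and the weight vectors are hypothetical.

```python
def max_score_shift(weights, eps):
    """Largest change in the linear score w.x when each input
    coordinate may move by at most eps (reached by eps * sign(w))."""
    return eps * sum(abs(w) for w in weights)


def min_eps_for_shift(weights, target_shift):
    """Smallest per-coordinate budget eps that can move the score by target_shift."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("all-zero weights: the score cannot be changed")
    return target_shift / l1


# Hypothetical weight vectors: the second is the first shrunk by a factor of 4,
# standing in for a more strongly regularized template.
low_reg = [0.8, -1.2, 0.4, -0.6]
high_reg = [w / 4 for w in low_reg]

eps_low = min_eps_for_shift(low_reg, target_shift=1.5)
eps_high = min_eps_for_shift(high_reg, target_shift=1.5)
print(eps_low, eps_high)  # the shrunken weights need a roughly 4x larger perturbation
```

Under these assumptions the required perturbation scales inversely with the weight magnitude, which matches the observation that fooling images for heavily regularized templates look visibly different from their originals.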

One might hope that ConvNets would produce all-diffuse probabilities in regions outside the training data, but there is no part in an ordinary objective (e.g. mean cross-entropy loss) that explicitly enforces this constraint. Indeed, it seems that the class scores in these regions of space are all over the place, and worse, a straightforward attempt to patch this up by introducing a background class and iteratively adding fooling images as a new background class during training is not effective in mitigating the problem.

It seems that to fix this problem we need to change our objectives, our forward functional forms, or even the way we optimize our models. However, as far as I know we haven’t found very good candidates for any of these.

https://arxiv.org/abs/1605.07277 Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

we demonstrated how all of these findings could be used to target online classifiers trained and hosted by Amazon and Google, without any knowledge of the model design or parameters, but instead simply by making label queries for 800 inputs. The attack successfully forces these classifiers to misclassify 96.19% and 88.94% of their inputs.

http://www.iep.utm.edu/lp-argue/

In 1961, J.R. Lucas published “Minds, Machines and Gödel,” in which he formulated a controversial anti-mechanism argument. The argument claims that Gödel’s first incompleteness theorem shows that the human mind is not a Turing machine, that is, a computer.

http://goertzel.org/DeepLearning_v1.pdf Are there Deep Reasons Underlying the Pathologies of Today’s Deep Learning Algorithms?

Proposition 2. For a deep learning hierarchy to avoid the brittleness and random images pathologies (on a corpus generated from an image grammar, or on a corpus of natural images), there would need to be a reasonably straightforward mapping from recognizable activity patterns on the different layers, to elements of a reasonably simple image grammar, so that via looking at the activity patterns on each layer when the network was exposed to a certain image, one could read out the “image grammar decomposition” of the elements of the image. For instance, if one applied the deep learning network to a corpus of images generated from a commonsensical image grammar, then the deep learning system would need to learn an internal state in reaction to an image, from which the image grammar decomposition of the image was easily decipherable.

http://arxiv.org/pdf/1606.05336v1.pdf On the expressive power of deep neural networks

http://arxiv.org/pdf/1511.04599v3.pdf DeepFool: a simple and accurate method to fool deep neural networks

https://arxiv.org/abs/1503.01436v7 Class Probability Estimation via Differential Geometric Regularization

A geometric perspective on overfitting and a regularization approach that exploits the geometry of a robust class probability estimator for classification.

https://arxiv.org/pdf/1511.07528v1.pdf The Limitations of Deep Learning in Adversarial Settings

we describe preliminary work outlining defenses against adversarial samples by defining a predictive measure of distance between a benign input and a target classification.

https://arxiv.org/abs/1607.02533v1 Adversarial examples in the physical world

This paper shows that even in such physical world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera.

http://arxiv.org/abs/1608.08967v1 Robustness of classifiers: from adversarial to random noise

Our bounds confirm and quantify the empirical observations that classifiers satisfying curvature constraints are robust to random noise.

https://arxiv.org/pdf/1610.08401v1.pdf Universal adversarial perturbations

We showed the existence of small universal perturbations that can fool state-of-the-art classifiers on natural images. We proposed an iterative algorithm to generate universal perturbations, and highlighted several properties of such perturbations. In particular, we showed that universal perturbations generalize well across different classification models, resulting in doubly-universal perturbations (image-agnostic, network-agnostic). We further explained the existence of such perturbations with the correlation between different regions of the decision boundary. This provides insights on the geometry of the decision boundaries of deep neural networks, and contributes to a better understanding of such systems.

https://arxiv.org/abs/1612.00334v2 A Theoretical Framework for Robustness of (Deep) Classifiers Under Adversarial Noise

Experimental results show that Siamese training helps multiple DNN models achieve better accuracy compared to previous defense strategies in an adversarial setting. DNN models after Siamese training exhibit better robustness than the state-of-the-art baselines.

https://arxiv.org/abs/1612.01401v1 Learning Adversary-Resistant Deep Neural Networks

https://arxiv.org/abs/1605.07277 Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples

Many machine learning models are vulnerable to adversarial examples: inputs that are specially crafted to cause a machine learning model to produce an incorrect output. Adversarial examples that affect one model often affect another model, even if the two models have different architectures or were trained on different training sets, so long as both models were trained to perform the same task. An attacker may therefore train their own substitute model, craft adversarial examples against the substitute, and transfer them to a victim model, with very little information about the victim. Recent work has further developed a technique that uses the victim model as an oracle to label a synthetic training set for the substitute, so the attacker need not even collect a training set to mount the attack. We extend these recent techniques using reservoir sampling to greatly enhance the efficiency of the training procedure for the substitute model. We introduce new transferability attacks between previously unexplored (substitute, victim) pairs of machine learning model classes, most notably SVMs and decision trees. We demonstrate our attacks on two commercial machine learning classification systems from Amazon (96.19% misclassification rate) and Google (88.94%) using only 800 queries of the victim model, thereby showing that existing machine learning approaches are in general vulnerable to systematic black-box attacks regardless of their structure.
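The substitute-model attack described in this abstract can be illustrated on a toy problem. The sketch below is only an illustration under stated assumptions: the "victim" is a hypothetical hidden linear rule, the substitute is a logistic regression trained by plain gradient descent, and the crafting step is a sign-of-gradient perturbation. It omits the paper's reservoir sampling and synthetic data augmentation, and all names and constants are made up for the example.

```python
import math
import random

rng = random.Random(0)


def victim_label(x):
    """Hypothetical black-box victim: the attacker sees only this 0/1 label."""
    return 1 if 2.0 * x[0] - 1.0 * x[1] > 0 else 0


def sample_point():
    return [rng.uniform(-1, 1), rng.uniform(-1, 1)]


def train_substitute(inputs, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression substitute on oracle-labelled inputs
    with full-batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(inputs)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in zip(inputs, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            gw[0] += (p - y) * x[0]
            gw[1] += (p - y) * x[1]
            gb += p - y
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def sign(v):
    return (v > 0) - (v < 0)


def craft(x, y, w, eps):
    """Sign-of-gradient step against the substitute: push x away from label y."""
    direction = -1 if y == 1 else 1
    return [xi + direction * eps * sign(wi) for xi, wi in zip(x, w)]


# 1. Query the victim for labels (the abstract reports 800 queries; 200 here).
queries = [sample_point() for _ in range(200)]
oracle_labels = [victim_label(x) for x in queries]

# 2. Train the substitute on the oracle's answers only.
w_sub, b_sub = train_substitute(queries, oracle_labels)

# 3. Craft on the substitute, then check whether the victim's label flips.
test_points = [sample_point() for _ in range(200)]
fooled = 0
for x in test_points:
    y = victim_label(x)
    x_adv = craft(x, y, w_sub, eps=0.5)
    fooled += int(victim_label(x_adv) != y)
transfer_rate = fooled / len(test_points)
print(f"victim labels flipped: {transfer_rate:.0%}")
```

The attacker never reads the victim's weights; the perturbation direction comes entirely from the substitute, and it transfers because both models approximate the same decision boundary.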

https://arxiv.org/pdf/1612.00334v3.pdf A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Samples

Empirically we find that the Siamese architecture can intuitively help DNN models approach topological equivalence between the two feature spaces, which in turn effectively improves its robustness against adversarial noise (AN).

http://www.erogol.com/paper-notes-intriguing-properties-neural-networks/

https://openai.com/blog/adversarial-example-research/

https://arxiv.org/abs/1703.05561v1 Fraternal Twins: Unifying Attacks on Machine Learning and Digital Watermarking

we present a unified notation of black-box attacks against machine learning and watermarking that reveals the similarity of both settings.

https://arxiv.org/pdf/1703.09202.pdf Biologically inspired protection of deep networks from adversarial attacks

In summary, we have shown that a simple, biologically inspired strategy for finding highly nonlinear networks operating in a saturated regime provides interesting mechanisms for guarding DNNs against adversarial examples without ever computing them. Not only do we gain improved performance over adversarially trained networks on adversarial examples generated by the fast gradient sign method, but our saturating networks are also relatively robust against iterative, targeted methods including second-order adversaries.
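For readers unfamiliar with the fast gradient sign method mentioned above, it perturbs an input by a fixed step `eps` in the direction of the sign of the loss gradient with respect to that input. The sketch below applies one such step to a hypothetical linear softmax classifier, where the gradient has a closed form. It does not implement the saturating networks from the paper, and the weights and input are invented for illustration.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, b, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=lambda k: z[k])


def fgsm(W, b, x, y, eps):
    """One fast-gradient-sign step for a linear softmax classifier.

    For cross-entropy loss the input gradient is W^T (p - onehot(y)).
    """
    p = softmax(logits(W, b, x))
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    grad = [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    return [xj + eps * ((g > 0) - (g < 0)) for xj, g in zip(x, grad)]


# Hypothetical 3-class, 2-feature model and a correctly classified input.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x, y = [0.6, 0.4], 0

x_adv = fgsm(W, b, x, y, eps=0.3)
print(predict(W, b, x), predict(W, b, x_adv))  # prints: 0 1
```

Each coordinate moves by exactly `eps`, so the perturbation is bounded in the max norm while still changing the predicted class in this toy setting.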

https://arxiv.org/abs/1703.09387 Adversarial Transformation Networks: Learning to Generate Adversarial Examples

https://arxiv.org/abs/1706.03922 Analyzing the Robustness of Nearest Neighbors to Adversarial Examples

https://arxiv.org/abs/1707.03501 NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles

In this paper, we show experiments that suggest that current constructions of physical adversarial examples do not disrupt object detection from a moving platform. Instead, a trained neural network classifies most of the pictures taken from different distances and angles of a perturbed image correctly. We believe this is because the adversarial property of the perturbation is sensitive to the scale at which the perturbed picture is viewed, so (for example) an autonomous car will misclassify a stop sign only from a small range of distances. Our work raises an important question: can one construct examples that are adversarial for many or most viewing conditions? If so, the construction should offer very significant insights into the internal representation of patterns by deep networks. If not, there is a good prospect that adversarial examples can be reduced to a curiosity with little practical impact.

https://github.com/bethgelab/foolbox Foolbox is a Python toolbox to create adversarial examples that fool neural networks. It requires Python, NumPy and SciPy.

https://arxiv.org/pdf/1708.03999v1.pdf ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models

https://arxiv.org/abs/1708.02582v1 Cascade Adversarial Machine Learning Regularized with a Unified Embedding

we propose to utilize embedding space for both classification and low-level (pixel-level) similarity learning to ignore unknown pixel level perturbation.

We proposed adversarial training regularized with a unified embedding for classification and low-level similarity learning by penalizing distance between the clean and their corresponding adversarial embeddings. The networks trained with low-level similarity learning showed higher robustness against one-step and iterative attacks under white box attack.

https://arxiv.org/abs/1707.05373 Houdini: Fooling Deep Structured Prediction Models

Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance measure of the task considered, be it combinatorial and non-decomposable. We successfully apply Houdini to a range of applications such as speech recognition, pose estimation and semantic segmentation. In all cases, the attacks based on Houdini achieve higher success rate than those based on the traditional surrogates used to train the models while using a less perceptible adversarial perturbation.

https://blog.acolyer.org/2017/09/12/universal-adversarial-perturbations/

https://arxiv.org/abs/1710.03337 Standard detectors aren't (currently) fooled by physical adversarial stop signs

In this paper, we show that these physical adversarial stop signs do not fool two standard detectors (YOLO and Faster RCNN) in standard configuration. Evtimov et al.'s construction relies on a crop of the image to the stop sign; this crop is then resized and presented to a classifier. We argue that the cropping and resizing procedure largely eliminates the effects of rescaling and of view angle.

https://openreview.net/pdf?id=S18Su--CW Thermometer Encoding: One Hot Way to Resist Adversarial Examples

https://arxiv.org/abs/1801.02774 Adversarial Spheres

We hypothesize that this counterintuitive behavior is a naturally occurring result of the high dimensional geometry of the data manifold.

https://arxiv.org/pdf/1802.06627.pdf Robustness of Rotation-Equivariant Networks to Adversarial Perturbations

we investigate the robustness to adversarial attacks of new Convolutional Neural Network architectures providing equivariance to rotations. We found that rotation-equivariant networks are significantly less vulnerable to geometric-based attacks than regular networks on the MNIST, CIFAR-10, and ImageNet datasets.

https://github.com/locuslab/convex_adversarial Provably robust neural networks

https://arxiv.org/abs/1711.10402v2 An Adversarial Neuro-Tensorial Approach For Learning Disentangled Representations

We propose the first unsupervised deep learning method (with pseudo-supervision) for disentangling multiple latent factors of variation in face images captured in-the-wild.

https://arxiv.org/abs/1803.06373 Adversarial Logit Pairing

When applied to clean examples and their adversarial counterparts, logit pairing improves accuracy on adversarial examples over vanilla adversarial training; we also find that logit pairing on clean examples only is competitive with adversarial training in terms of accuracy on two datasets.
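The idea of pairing logits can be sketched as an extra penalty added to the training loss. The example below assumes a squared L2 distance between the logits of each clean example and its adversarial counterpart, and a weighting factor `lam`. Both are assumptions made for illustration and should be checked against the paper, and the logit values are invented.

```python
def logit_pairing_penalty(clean_logits, adv_logits):
    """Mean squared L2 distance between paired logit vectors."""
    total = 0.0
    for zc, za in zip(clean_logits, adv_logits):
        total += sum((c - a) ** 2 for c, a in zip(zc, za))
    return total / len(clean_logits)


def paired_loss(task_loss, clean_logits, adv_logits, lam=0.5):
    """Task loss plus a weighted pairing term; lam is an assumed hyperparameter."""
    return task_loss + lam * logit_pairing_penalty(clean_logits, adv_logits)


# Hypothetical logits for two examples and their adversarial counterparts.
clean = [[2.0, -1.0, 0.5], [0.0, 1.0, -1.0]]
adv = [[1.0, -1.0, 1.5], [0.0, 1.0, -1.0]]

penalty = logit_pairing_penalty(clean, adv)
print(penalty, paired_loss(task_loss=0.7, clean_logits=clean, adv_logits=adv))
```

The penalty is zero when the model produces identical logits for a clean input and its perturbed version, so minimizing it encourages predictions that do not move under the perturbation.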

https://github.com/VishaalMK/VectorDefense VectorDefense: Vectorization as a Defense to Adversarial Examples

https://arxiv.org/pdf/1805.04874.pdf GAN Q-learning

In this paper, we propose GAN Q-learning, a novel distributional RL method based on generative adversarial networks (GANs) and analyse its performance in simple tabular environments, as well as OpenAI Gym. We empirically show that our algorithm leverages the flexibility and black-box approach of deep learning models while providing a viable alternative to other state-of-the-art methods.

https://arxiv.org/abs/1805.12152v1 There Is No Free Lunch In Adversarial Robustness (But There Are Unexpected Benefits)

Robust models turn out to have interpretable gradients and feature representations that align unusually well with salient data characteristics. In fact, they yield striking feature interpolations that have thus far been possible to obtain only using generative models such as GANs.

https://arxiv.org/abs/1806.06108v1 Non-Negative Networks Against Adversarial Attacks