## An Adversarial Attack

*Summary of the paper*

*DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks*

by

*Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Pascal Frossard*

Link to the paper: *https://arxiv.org/pdf/1511.04599.pdf*

### Overview

Deep Neural Networks achieve state-of-the-art performance on many tasks but fail badly on slightly perturbed images, when the perturbation is chosen in a meaningful way (and not randomly).

The DeepFool paper makes the following major contributions:

- A simple and accurate method for computing the robustness of different classifiers to adversarial perturbations.
- Experiments showing that:
  - DeepFool computes a closer-to-optimal adversarial perturbation than existing methods.
  - Adversarial training significantly increases robustness.

### DeepFool for binary classifiers

For a linear binary classifier, it can easily be seen that the robustness of the model (`f`) at an input `x_{0}` equals the distance of `x_{0}` to the hyperplane that separates the two classes.

The minimal perturbation that changes the classifier’s decision corresponds to the orthogonal projection of `x_{0}` onto that hyperplane. For an affine classifier `f(x) = w^T x + b`, it is given by:

`r_*(x_{0}) = -(f(x_{0}) / ||w||^2) * w`
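As a quick sanity check of this closed-form projection, the snippet below evaluates it on a toy affine classifier (the values of `w`, `b`, and `x0` are made up for illustration):

```python
# Toy numeric check of the minimal perturbation for an affine binary
# classifier f(x) = w.x + b: r*(x0) = -(f(x0) / ||w||^2) * w
import torch

w = torch.tensor([3.0, 4.0])   # normal vector of the separating hyperplane
b = -5.0
x0 = torch.tensor([4.0, 2.0])

f_x0 = w @ x0 + b                        # classifier output: 15.0
r = -(f_x0 / (w @ w)) * w                # orthogonal projection onto w.x + b = 0
print((w @ (x0 + r) + b).item())         # 0.0 -- x0 + r lies exactly on the hyperplane
print(torch.norm(r).item())              # 3.0 -- the distance of x0 to the hyperplane
```

Note that `||r|| = |f(x_{0})| / ||w||`, i.e. the perturbation norm is exactly the distance from `x_{0}` to the hyperplane.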

Below is the algorithm for DeepFool for binary classifiers:

*Let’s go over the algorithm:*

1. The algorithm takes an input `x` and a classifier `f`.

2. It outputs the minimal perturbation required to misclassify the image.

3. Initialize the adversarial image with the original input, and the loop variable to 1.

4. Loop while the label of the adversarially perturbed image still matches the true label.

5. Calculate the projection of the current image onto the (linearized) decision boundary — this is the minimal perturbation at this step.

6. Add that perturbation to the image and test it.

7–8. Increment the loop variable; end the loop.

9. Return the minimal perturbation.
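The steps above can be sketched in PyTorch. This is a minimal illustrative implementation, not the paper’s code: it assumes `f` is a differentiable function returning a single scalar whose sign gives the class, and the `overshoot` factor (a small constant to push past the boundary) is a common implementation convention:

```python
import torch

def deepfool_binary(f, x, max_iter=50, overshoot=0.02):
    """Return (approximately) the minimal perturbation that flips sign(f(x))."""
    x0 = x.clone().detach()
    orig_sign = torch.sign(f(x0)).item()
    r_total = torch.zeros_like(x0)
    for _ in range(max_iter):
        # Current adversarial candidate, pushed slightly past the boundary
        x_adv = (x0 + (1 + overshoot) * r_total).detach().requires_grad_(True)
        out = f(x_adv)
        if torch.sign(out).item() != orig_sign:
            break                          # label flipped: done
        out.backward()
        w = x_adv.grad                     # gradient of the decision function
        # Orthogonal projection onto the locally linearized boundary
        r_i = -(out.item() / (w.norm() ** 2 + 1e-8)) * w
        r_total = r_total + r_i
    return r_total
```

For a truly linear `f` the loop terminates after a single projection step, matching the closed-form formula above.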

### DeepFool for Multiclass Classifiers

With a multiclass classifier, say the input is `x`; for each class there is a hyperplane (a plane that separates that class from the others), and `x` is classified according to the region of the space in which it lies. All this algorithm does is find the closest hyperplane, project `x` onto it, and push it a bit beyond, thereby misclassifying `x` with (approximately) the minimal perturbation possible. That’s it. (Go through Section 3 of the paper for an in-depth understanding.)

Let’s take a look at the algorithm.

*Let’s quickly walk through each step of the algorithm:*

1. The input is an image `x` and the classifier `f`, which is the model.

2. The output is the perturbation.

3. [Blank]

4. We initialize the perturbed image with the original image, and the loop variable to 1.

5. We start the loop and continue while the perturbed label still equals the original label.

6–9. We consider the `n` classes that had the highest probability after the original class, and for each of them store the difference between its gradient and the original class’s gradient (`w_{k}`), and the difference between the corresponding outputs (`f_{k}`).

10. From the inner loop we keep the `w_{k}` and `f_{k}` of the closest hyperplane for the input `x` (see Fig. 6 for the formula for calculating the closest hyperplane).

11. We calculate the minimal vector that projects `x` onto the closest hyperplane found in step 10.

12. We add this minimal perturbation to the image and check whether it is misclassified.

13–14. Increment the loop variable; end the loop.

15. Return the total perturbation, which is the sum of all the calculated perturbations.

Below is the equation to calculate the closest hyperplane:

`hat{l}(x_{0}) = argmin_{k != hat{k}(x_{0})} |f'_{k}| / ||w'_{k}||`

where

variables starting with `f` are the class outputs, with `f'_{k} = f_{k}(x_{0}) - f_{hat{k}(x_{0})}(x_{0})`,

variables starting with `w` are the gradients, with `w'_{k} = ∇f_{k}(x_{0}) - ∇f_{hat{k}(x_{0})}(x_{0})`,

and where variables with `k` as a subscript are for the classes with the highest probability after the true class, while the subscript `hat{k}(x_{0})` refers to the true class.

Below is the equation to calculate the minimal perturbation (the vector that projects the input onto the closest hyperplane):

`r_*(x_{0}) = (|f'_{hat{l}}| / ||w'_{hat{l}}||^2) * w'_{hat{l}}`
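The multiclass procedure can be sketched in PyTorch as follows. This is an illustrative, simplified version (function and variable names are mine, not the paper’s code); it assumes `f` returns a vector of logits and reuses the `overshoot` convention from common DeepFool implementations:

```python
import torch

def deepfool_multiclass(f, x, num_classes=10, max_iter=50, overshoot=0.02):
    """Return (approximately) the minimal perturbation that changes argmax f(x)."""
    x0 = x.clone().detach()
    orig_label = f(x0).argmax().item()
    # Restrict the search to the top-scoring candidate classes, as in the paper
    candidates = f(x0).argsort(descending=True)[:num_classes].tolist()
    r_total = torch.zeros_like(x0)
    for _ in range(max_iter):
        x_adv = (x0 + (1 + overshoot) * r_total).detach().requires_grad_(True)
        out = f(x_adv)
        if out.argmax().item() != orig_label:
            break                                    # misclassified: done
        grad_orig = torch.autograd.grad(out[orig_label], x_adv, retain_graph=True)[0]
        best = None
        for k in candidates:
            if k == orig_label:
                continue
            grad_k = torch.autograd.grad(out[k], x_adv, retain_graph=True)[0]
            w_k = grad_k - grad_orig                 # w'_k: gradient difference
            f_k = (out[k] - out[orig_label]).item()  # f'_k: output difference
            dist = abs(f_k) / (w_k.norm().item() + 1e-8)
            if best is None or dist < best[0]:       # keep the closest hyperplane
                best = (dist, f_k, w_k)
        _, f_k, w_k = best
        # Minimal vector projecting the image onto the closest hyperplane
        r_i = (abs(f_k) / (w_k.norm().item() ** 2 + 1e-8)) * w_k
        r_total = r_total + r_i
    return r_total
```

For a linear multiclass model the inner loop reproduces the two equations above exactly, and a single projection step suffices.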

#### Adversarial Robustness of a model

The DeepFool algorithm also provides a way to measure the adversarial robustness of a model. It is given by the average, over the test set `D`, of the ratio of the minimal perturbation’s norm to the input’s norm:

`ρ_adv(f) = (1/|D|) * Σ_{x in D} ||r̂(x)|| / ||x||`

```python
# For images with a batch size of 1; accumulated over the test loader,
# where rt is the minimal perturbation returned by DeepFool for input x
num_images = len(tloader)
adversarial_robustness += (1 / num_images) * (torch.norm(rt.flatten()) / torch.norm(x.flatten()))
```
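Wrapped into a complete function, the measurement looks like this (a sketch: the `attack` callable and the loader of single images are placeholders, not the paper’s code):

```python
import torch

def adversarial_robustness(f, attack, tloader):
    """Average ||r(x)|| / ||x|| over a loader of single images,
    where attack(f, x) returns the minimal perturbation for x."""
    total = 0.0
    num_images = len(tloader)
    for x in tloader:
        r = attack(f, x)
        total += (torch.norm(r.flatten()) / torch.norm(x.flatten())).item()
    return total / num_images
```

A smaller value means smaller (relative) perturbations suffice to fool the model, i.e. the model is less robust.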

### Experiments

where, in the paper’s results tables,

[4] is the fast gradient sign method, and

[18] is the attack from the paper titled *Intriguing properties of neural networks* by Szegedy et al.

Graphs in the paper show the effect of adversarial training using adversarial images generated with the Fast Gradient Sign Method and with DeepFool. These graphs demonstrate the importance of training with adversarial examples that are minimally perturbed (and not overly perturbed, as is the case with the Fast Gradient Sign Method).