Attacking a neural network with Fast Gradient Sign Attack using PyTorch

Tram Ho


Hello everyone. Recently, articles about neural networks and other Deep Learning topics have become quite popular, on Viblo in particular and across the tech blog community in general. We have written a lot about the amazing applications of neural networks and about different network architectures, solving problems from image classification to self-driving cars. It is fair to say that AI in general, and Deep Learning in particular, is just a technology: a small part of the larger software engineering picture. Therefore, when studying a software system, in addition to learning how to build it, we should also study methods of attacking such systems and of defending against those attacks. Deep Learning has flourished in the last decade thanks to advances in hardware, with a series of new models born every year that bring its applications closer to real life. Along with that, security also deserves attention, because a common concern when using AI applications is: do we understand what the machine is doing? AI is like a black box to the user, so what guarantees that it makes the right decision in every case?

Therefore, studying methods of attacking neural networks is also a very hot topic. In this article, we will explore a classic attack on neural networks, the Fast Gradient Sign Attack. We will implement it with the PyTorch framework and discuss the topic along the way. OK, let's start.

What is a neural network attack?

Like many other information systems, when there are good, fast, and accurate modeling methods, there will also be methods to sabotage, attack, and deceive our models. These are two extremes that exist in parallel and never stop being hot. There are many types of neural network attacks, depending on the goal as well as on assumptions about the attacker's knowledge of our model. In general, most attackers want to somehow influence the model or its input data to make the model produce false judgments. Based on the attacker's knowledge of the network, attacks can be divided into two types: white-box and black-box. This is similar to the same concepts in software testing.

  • White-box attacker: has full knowledge of the inputs, outputs, network architecture, order of layers, activation functions, and trained weights, along with full access to read and change those parameters. An attacker of this type can gain complete control and make the model behave as they wish.
  • Black-box attacker: knows only the network's inputs and outputs, with no knowledge of the internal architecture of the model or its trained weights. Such attacks focus on changing the input data to trick the AI model. By purpose, attacks are also divided into two main categories: misclassification and source/target misclassification. In misclassification, the attacker only wants the model to be wrong, regardless of what it outputs: if a cat image goes in and the model fails to recognize a cat, the attack has succeeded. Source/target misclassification raises the attack to a new level: for example, feeding in a cat image and forcing the model to identify it as a dog.

In today's article we will learn about the Fast Gradient Sign Attack (FGSM), a white-box attack with the goal of misclassification. With that background knowledge, let's dive into the details in the next sections.

Fast Gradient Sign Attack

This attack was first described in 2015 in the paper by Goodfellow et al., Explaining and Harnessing Adversarial Examples. The paper points out that most neural networks can be deceived by adversarial examples: inputs generated by deliberately adding a small amount of noise to an image so that the network misjudges it. This method of attack is powerful enough to deceive systems built on neural networks, and it is fairly intuitive to explain. Its basic idea is as follows:

  • A conventional model changes its weights through an optimization process using gradient descent, with gradient values calculated in the backpropagation step. This weight update minimizes the loss function.
  • To attack, instead of updating the model's weights to minimize the loss function, we change the input data so as to maximize the loss function.

Consider the classic panda classification example from the paper to see these concepts clearly.

Here x is the input image and y is its label. J(θ, x, y) is the loss function used to train the model with parameters θ. During training, backpropagation computes the gradient of the loss with respect to the input, ∇_x J(θ, x, y). If we change the input x by a very small amount ε in the direction that maximizes the loss, sign(∇_x J(θ, x, y)), the resulting image x' will be misclassified as the class "gibbon", as in the picture.

That is the main idea of the algorithm. Now let's proceed to the code.

Implement the FGSM Attack

Import the necessary libraries

In this article we use PyTorch. As with other computer vision problems in PyTorch, we import a few necessary packages as follows.

Define inputs

To run the attack, we need to define 3 input parameters as follows:
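A sketch of those settings; the checkpoint filename is an assumption standing in for the pretrained weights you download:

```python
import torch

# List of noise levels to try; 0 leaves the test input unchanged
epsilons = [0, .05, .1, .15, .2, .25, .3]
# Path to the downloaded pretrained checkpoint (filename is an assumption)
pretrained_model = "lenet_mnist_model.pth"
# Run on GPU when one is available
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
```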

Here the epsilons parameter is a list of values representing the noise level added to the model's input. A value of 0 corresponds to keeping the test-set input intact; the higher the epsilon value, the greater the perturbation. We will compare how each of these perturbation levels reduces the accuracy of the model. The model is defined for the MNIST dataset, and you can download the pretrained weights here.

Model and data definitions

We use a pre-trained LeNet model for the MNIST dataset.
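A LeNet-style network for 28x28 MNIST digits could look like this (the exact layer sizes are an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# LeNet-style CNN: two conv blocks followed by two fully connected layers
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)   # 20 channels * 4 * 4 spatial = 320
        self.fc2 = nn.Linear(50, 10)    # 10 digit classes

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))              # 28 -> 24 -> 12
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))  # 12 -> 8 -> 4
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)  # log-probabilities, paired with nll_loss
```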

The next thing to do is define the test data using PyTorch's DataLoader module.

Load pretrained model

After defining the model and data, we load the pretrained weights that were downloaded earlier.
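The loading idiom looks like this; since the real checkpoint cannot be bundled here, this sketch saves and reloads a stand-in module in place of the LeNet `Net` and the downloaded `lenet_mnist_model.pth`:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the LeNet model defined earlier
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
torch.save(model.state_dict(), "demo_weights.pth")  # pretend this is the download

# In the real code: model.load_state_dict(torch.load(pretrained_model, map_location=device))
model.load_state_dict(torch.load("demo_weights.pth", map_location=device))
model.eval()  # evaluation mode: dropout is disabled during the attack
```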

The FGSM attack

Next we need to define the attack function, which perturbs the network's input with the corresponding epsilon value. The function fgsm_attack takes three parameters:

  • image: the original input image x
  • epsilon: the value ε controlling the magnitude of the perturbation
  • data_grad: the gradient of the loss function with respect to the input image, ∇_x J(θ, x, y)

The image is changed by fgsm_attack as follows:

perturbed_image = image + epsilon * sign(data_grad) = x + ε · sign(∇_x J(θ, x, y))

And finally, the new value of the input data is clamped back into the range (0, 1).

Details of the function are as follows
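A minimal sketch of fgsm_attack, implementing the formula above:

```python
import torch

def fgsm_attack(image, epsilon, data_grad):
    # Elementwise sign of the gradient of the loss w.r.t. the input
    sign_data_grad = data_grad.sign()
    # Step epsilon in the direction that increases the loss
    perturbed_image = image + epsilon * sign_data_grad
    # Clip back into the valid pixel range (0, 1)
    return torch.clamp(perturbed_image, 0, 1)
```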

Finally we define the test function. Each call to this function runs through all the samples in the MNIST test set and makes predictions with the model under FGSM attack. To perform the attack, the gradient of the loss function with respect to the input is computed for each sample in the test set. This is then passed to the fgsm_attack function above to generate the perturbed input. Details of this function are as follows

Then we run the attack with different epsilon values.

We get the results

Analysis of attack results

We can see that as the epsilon value increases, accuracy decreases accordingly. This follows directly from the attack formula: a larger epsilon means a larger step in the direction that increases the loss, and therefore a larger drop in the model's accuracy. Note that the change in accuracy is not linear even though the epsilon values are evenly spaced. We can see the correlation between accuracy and epsilon with the following code
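A sketch of the plotting code; the accuracy numbers here are placeholders just to make the snippet runnable on its own, since the real values come from the accuracies list collected above:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")      # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

epsilons = [0, .05, .1, .15, .2, .25, .3]
accuracies = [0.98, 0.94, 0.85, 0.68, 0.43, 0.21, 0.08]  # placeholder values

plt.figure(figsize=(5, 5))
plt.plot(epsilons, accuracies, "*-")
plt.yticks(np.arange(0, 1.1, step=0.1))
plt.xticks(np.arange(0, .35, step=0.05))
plt.title("Accuracy vs Epsilon")
plt.xlabel("Epsilon")
plt.ylabel("Accuracy")
plt.savefig("accuracy_vs_epsilon.png")
```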

And here is its output

You might wonder: if adding more noise to the input lowers the model's accuracy, doesn't more noise always make the attack more effective? In fact, the more noise we add, the higher the probability that a human will notice the input has been tampered with. To see this, let's print a few sample images at high epsilon values and look at the difference
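A sketch of the visualization; examples[i] holds up to five (initial prediction, adversarial prediction, image) tuples per epsilon, as collected by the test function above. Random stand-in images are used here so the snippet runs on its own:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Stand-in data in the same shape `test` returns: (init_pred, adv_pred, 28x28 image)
epsilons = [0, .15, .3]
examples = [[(7, 3, np.random.rand(28, 28)) for _ in range(3)] for _ in epsilons]

cnt = 0
plt.figure(figsize=(8, 8))
for i in range(len(epsilons)):
    for j in range(len(examples[i])):
        cnt += 1
        plt.subplot(len(epsilons), len(examples[0]), cnt)
        plt.xticks([], [])
        plt.yticks([], [])
        if j == 0:
            plt.ylabel(f"Eps: {epsilons[i]}", fontsize=14)
        orig, adv, ex = examples[i][j]
        plt.title(f"{orig} -> {adv}")   # original prediction -> adversarial prediction
        plt.imshow(ex, cmap="gray")
plt.tight_layout()
plt.savefig("adversarial_samples.png")
```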

So deceiving the model requires a trade-off: input data that easily fools the model is also easily recognized by humans as having been perturbed.


This article introduced a popular neural network attack, the Fast Gradient Sign Attack, along with some analysis of this attack method. That should give us more perspective on making our neural networks safer. There are actually many other attack methods, which you can explore in the paper Adversarial Attacks and Defences Competition. See you in the next article!


Source : Viblo