The Dropout Technique in Deep Learning

Tram Ho

In this article, I would like to introduce Dropout in neural networks, and then walk through some code to see how Dropout affects the performance of a neural network.

1. Theory

1.1. What is Dropout in a neural network?

According to Wikipedia, the term 'Dropout' refers to ignoring units (both hidden and visible) in a neural network.

Put simply, Dropout randomly ignores units (i.e., network nodes) during training. An omitted unit is not considered during either the forward or the backward pass. Here, p denotes the probability of retaining a node at each training step, so the probability of dropping it is (1 − p).

1.2. Why do we need Dropout?

The question is: why would we deliberately turn off some network nodes during training? The answer: to avoid over-fitting.

If a fully connected layer holds most of the network's parameters, its nodes become too interdependent during training, which limits the power of each individual node and leads to over-fitting.

1.3. Other techniques

If you just want to know what Dropout is, the two theory sections above are enough. In this section, I also introduce a few techniques that have a similar effect to Dropout.

In machine learning, regularization reduces over-fitting by adding a 'penalty' term to the loss function. With this term, the model cannot learn overly strong dependencies between the weights. Anyone familiar with logistic regression will know L1 (Laplacian) and L2 (Gaussian) as the two classic penalty techniques.
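As a quick illustration (a minimal sketch, not code from this article), an L2 penalty just adds the scaled sum of squared weights to the existing cost; `lambd` and the function name are my own naming:

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, weights, lambd, m):
    """Add an L2 penalty (sum of squared weights) to an existing cost.

    weights : list of weight matrices W1, W2, ...
    lambd   : regularization strength
    m       : number of training examples
    """
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy_cost + l2_penalty
```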

  • Training phase: for each hidden layer, each training example, and each iteration, drop each network node at random with probability (1 − p).
  • Test phase: use all activations, but scale each one down by a factor of p, to account for the activations that were dropped during training (see the sketch just after this list).
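A tiny sketch of that classic train/test asymmetry (the function names are mine; note that the practice section below uses the 'inverted' variant, which rescales during training instead):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(a, p):
    """Training: keep each activation with probability p, zero out the rest."""
    mask = rng.random(a.shape) < p
    return a * mask

def dropout_test(a, p):
    """Test: keep every activation but scale by p, so the expected
    magnitude matches what the next layer saw during training."""
    return a * p
```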

1.4. Some comments

  • Dropout forces the network to learn more robust and useful features.
  • It roughly doubles the number of epochs needed to converge; however, the time per epoch is shorter.
  • With H hidden units, each dropped with probability (1 − p), we can sample 2^H possible thinned networks. During the test phase, however, all nodes are used, and each activation is scaled down by a factor of p.

2. Practice

Theory alone can be confusing, so I will code two versions of a model to see how Dropout behaves in practice.

Problem: you are at a football match, and you want to predict where the goalkeeper should kick the ball so that a home player can hit it with their head.

First, I import the necessary libraries:
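The original import block is not shown above; a plausible set for this kind of experiment (numpy for the math, matplotlib for plots, scikit-learn only to generate a stand-in dataset) is:

```python
import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets  # only used to generate a toy 2-D dataset

np.random.seed(1)  # make the runs reproducible
```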

Let's visualize the data a bit:
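Assuming the data lives in a 2×m matrix `train_X` of field positions and a 1×m matrix `train_Y` of labels (my naming; `make_moons` is only a stand-in for the article's dataset), the plot could be produced like this:

```python
# A toy stand-in for the real dataset: two noisy interleaved classes,
# split into a training and a test portion.
X, Y = sklearn.datasets.make_moons(n_samples=300, noise=0.2, random_state=1)
train_X, train_Y = X[:200].T, Y[:200].reshape(1, -1)  # shapes (2, 200), (1, 200)
test_X,  test_Y  = X[200:].T, Y[200:].reshape(1, -1)  # shapes (2, 100), (1, 100)

plt.scatter(train_X[0, :], train_X[1, :],
            c=train_Y.ravel(), s=40, cmap=plt.cm.Spectral)
plt.title("Positions on the field")
plt.show()
```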

We get the following result:

The red dots are the positions where home players headed the ball, and the green dots are the positions where the opposing players did. Our task is to predict the area the goalkeeper should kick the ball into so that a home player can head it. It looks like a single line is enough to separate the two areas.

2.1. Model without regularization
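The article's model code is not reproduced above, so here is a minimal sketch of a small fully connected network trained with plain gradient descent; the layer sizes [2, 20, 3, 1] and all function names are my assumptions:

```python
def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def init_parameters(layer_dims):
    """He-style initialization for a list of layer sizes, e.g. [2, 20, 3, 1]."""
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                * np.sqrt(2.0 / layer_dims[l - 1]))
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

def forward_propagation(X, params):
    """3-layer forward pass: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID."""
    W1, b1, W2, b2, W3, b3 = (params[k] for k in ("W1", "b1", "W2", "b2", "W3", "b3"))
    Z1 = W1 @ X + b1
    A1 = relu(Z1)
    Z2 = W2 @ A1 + b2
    A2 = relu(Z2)
    Z3 = W3 @ A2 + b3
    A3 = sigmoid(Z3)
    cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)
    return A3, cache

def compute_cost(A3, Y):
    """Binary cross-entropy cost."""
    m = Y.shape[1]
    logprobs = -np.log(A3) * Y - np.log(1 - A3) * (1 - Y)
    return float(np.sum(logprobs) / m)

def backward_propagation(X, Y, cache):
    m = X.shape[1]
    (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache
    dZ3 = A3 - Y
    dW3 = dZ3 @ A2.T / m
    db3 = np.sum(dZ3, axis=1, keepdims=True) / m
    dA2 = W3.T @ dZ3
    dZ2 = dA2 * (A2 > 0)                # ReLU gradient
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dA1 = W2.T @ dZ2
    dZ1 = dA1 * (A1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW3": dW3, "db3": db3, "dW2": dW2, "db2": db2, "dW1": dW1, "db1": db1}

def model(X, Y, learning_rate=0.3, num_iterations=30000):
    """Train the 3-layer network with plain batch gradient descent."""
    params = init_parameters([X.shape[0], 20, 3, 1])
    for i in range(num_iterations):
        A3, cache = forward_propagation(X, params)
        cost = compute_cost(A3, Y)
        grads = backward_propagation(X, Y, cache)
        for l in (1, 2, 3):
            params["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
            params["b" + str(l)] -= learning_rate * grads["db" + str(l)]
        if i % 10000 == 0:
            print(f"Cost after iteration {i}: {cost:.4f}")
    return params
```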

Prediction function
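A matching prediction helper (my sketch) that runs the plain forward pass and reports accuracy:

```python
def predict(X, Y, params):
    """Predict 0/1 labels with the trained network and print the accuracy."""
    A3, _ = forward_propagation(X, params)
    predictions = (A3 > 0.5)
    accuracy = float(np.mean(predictions == Y))
    print(f"Accuracy: {accuracy:.2%}")
    return predictions
```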

Let's see the results:
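Roughly, the training-and-evaluation calls would look like this (using my `predict` sketch; the 94%/91% figures quoted next are the article's own, from its dataset):

```python
parameters = model(train_X, train_Y)
print("On the training set:")
predictions_train = predict(train_X, train_Y, parameters)
print("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)
```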

The training accuracy is about 94% and the test accuracy about 91% (quite high). Let's visualize the decision boundary:
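A decision-boundary plot can be produced with a small helper like this (written out in full here rather than relying on whatever plotting utility the original code used):

```python
def plot_decision_boundary(pred_func, X, Y, title="Decision boundary"):
    """Color the plane by the model's prediction and overlay the data points."""
    x_min, x_max = X[0, :].min() - 0.5, X[0, :].max() + 0.5
    y_min, y_max = X[1, :].min() - 0.5, X[1, :].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                         np.linspace(y_min, y_max, 300))
    grid = np.c_[xx.ravel(), yy.ravel()].T        # shape (2, 300*300)
    Z = pred_func(grid).reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.6)
    plt.scatter(X[0, :], X[1, :], c=Y.ravel(), s=40, cmap=plt.cm.Spectral)
    plt.title(title)
    plt.show()

plot_decision_boundary(lambda x: (forward_propagation(x, parameters)[0] > 0.5),
                       train_X, train_Y, title="Model without regularization")
```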

Without regularization, the decision boundary is extremely detailed: the model is over-fitting the training set.

2.2. Regularized model with Dropout

2.2.1. Forward Propagation process
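A sketch of such a forward pass using inverted dropout, building on the earlier model sketch (`keep_prob`, the probability of keeping a node, and the default value 0.86 are my assumptions):

```python
def forward_propagation_with_dropout(X, params, keep_prob=0.86):
    """Forward pass where each hidden activation is kept with probability
    keep_prob and rescaled by 1/keep_prob (inverted dropout)."""
    W1, b1, W2, b2, W3, b3 = (params[k] for k in ("W1", "b1", "W2", "b2", "W3", "b3"))

    Z1 = W1 @ X + b1
    A1 = relu(Z1)
    D1 = (np.random.rand(*A1.shape) < keep_prob)  # dropout mask for layer 1
    A1 = A1 * D1 / keep_prob                      # drop nodes and rescale

    Z2 = W2 @ A1 + b2
    A2 = relu(Z2)
    D2 = (np.random.rand(*A2.shape) < keep_prob)  # dropout mask for layer 2
    A2 = A2 * D2 / keep_prob

    Z3 = W3 @ A2 + b3
    A3 = sigmoid(Z3)                              # no dropout on the output layer

    cache = (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3)
    return A3, cache
```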

2.2.2. Backward Propagation process
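The matching backward pass must reuse the exact masks D1 and D2 cached by the forward pass, so that gradients flow only through the nodes that were kept; again, a sketch:

```python
def backward_propagation_with_dropout(X, Y, cache, keep_prob=0.86):
    """Backward pass that applies the same masks (and 1/keep_prob rescaling)
    as the forward pass."""
    m = X.shape[1]
    (Z1, D1, A1, W1, b1, Z2, D2, A2, W2, b2, Z3, A3, W3, b3) = cache

    dZ3 = A3 - Y
    dW3 = dZ3 @ A2.T / m
    db3 = np.sum(dZ3, axis=1, keepdims=True) / m

    dA2 = W3.T @ dZ3
    dA2 = dA2 * D2 / keep_prob        # same mask and rescaling as forward
    dZ2 = dA2 * (A2 > 0)              # ReLU gradient
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    dA1 = W2.T @ dZ2
    dA1 = dA1 * D1 / keep_prob        # same mask as forward
    dZ1 = dA1 * (A1 > 0)
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW3": dW3, "db3": db3, "dW2": dW2, "db2": db2, "dW1": dW1, "db1": db1}
```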

With these Forward and Backward versions in hand, we substitute them for the two corresponding functions in the model function from the previous section:
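The substitution might look like this; `model_with_dropout` is my name for the modified training loop, reusing `init_parameters` and `compute_cost` from the earlier sketch:

```python
def model_with_dropout(X, Y, learning_rate=0.3, num_iterations=30000, keep_prob=0.86):
    params = init_parameters([X.shape[0], 20, 3, 1])
    for i in range(num_iterations):
        # Dropout versions replace the plain forward/backward passes.
        A3, cache = forward_propagation_with_dropout(X, params, keep_prob)
        cost = compute_cost(A3, Y)
        grads = backward_propagation_with_dropout(X, Y, cache, keep_prob)
        for l in (1, 2, 3):
            params["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
            params["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return params

parameters = model_with_dropout(train_X, train_Y)
```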

Result:

This time the test accuracy rises to 95%, even though the training accuracy drops slightly. Let's visualize:
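Reusing the plotting helper from before; note that prediction still calls the plain `forward_propagation`, since Dropout is switched off at test time:

```python
plot_decision_boundary(lambda x: (forward_propagation(x, parameters)[0] > 0.5),
                       train_X, train_Y, title="Model with Dropout")
```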


The dividing line is no longer overly detailed, so over-fitting is avoided.

2.3. Things to note

  • Do not use Dropout at test time.
  • Apply Dropout in both the forward and backward passes.
  • During training, divide the activation values by keep_prob (inverted dropout), so that the expected value of the activations stays the same despite the dropped nodes.

Source: Medium

Thank you for reading the article.


Source: Viblo