# Pruning models with the TensorFlow API

- Tram Ho

# Abstract

Continuing the series I write to improve my knowledge of ML and DL, in this article I would like to share what I learned about pruning. As usual, the trigger was browsing Towards Data Science on Medium, where I found an article too good not to share with everyone.

Along with the strong development of technology and data, Deep Learning keeps growing stronger, with notable economic achievements, and on some problems its accuracy already surpasses that of humans. Models are also getting bigger and bigger, which comes with higher resource consumption. When you deploy Deep Learning for customers today, you must consider resource consumption alongside accuracy: how do you solve a big problem while staying within the resources you actually have? One solution to this problem is pruning.

# What is Pruning?

In general terms, pruning is a way to make inference more efficient: it produces a model that is smaller, uses less memory, and runs inference faster, with the least possible loss of accuracy compared with the original model.

In decision trees, pruning is a regularization technique used to avoid overfitting: leaf nodes that share a common non-leaf node are removed, and that non-leaf node becomes a leaf whose class is the majority class among all points assigned to it.

![image.png](https://images.viblo.asia/f0eacdc1-2f16-4ee5-a2ba-311def9e201e.png)

The idea of neural network pruning is inspired by the pruning of neural connections in the human brain, where many connections between neurons (axons) break down and die off between childhood and the onset of puberty. Pruning a neural network likewise means removing redundant connections from the network architecture.
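
To make the idea of removing the weakest connections concrete, here is a minimal sketch of magnitude-based pruning on a plain Python list of weights. It is only an illustration under my own assumptions: the function name `prune_by_magnitude` and the sample weights are hypothetical, and this is not how any particular library implements it.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to 0.0.

    `sparsity` is the fraction of entries to zero out (between 0 and 1).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from the smallest to the largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


# Hypothetical weights of one small layer.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, 0.4, -0.1]
pruned = prune_by_magnitude(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0, 0.4, 0.0]
```

Half of the entries are zeroed, and they are exactly the four with the smallest absolute value, so the large weights that carry most of the signal are left untouched.
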
In practice, pruning sets weights whose values are close to zero to exactly zero, which removes the unnecessary connections while affecting inference as little as possible.

There are many different ways to prune a model. You can prune some random weights right from the start of training, or prune at the end of training to simplify the model.

You may wonder why a model should be pruned instead of simply being initialized with fewer parameters from the start. The answer is that you usually want a relatively complex architecture for training, so that the model has enough capacity to cover the data. Hand-tuning layers and shrinking or growing feature sizes is an inefficient job by comparison, while pruning is simple and much more effective.

# Pruning with TensorFlow

## Introduction to tfmot

tfmot (the TensorFlow Model Optimization Toolkit) provides a pruning API whose goal is to remove the weakest weights at the end of each training step. It lets the programmer define a pruning schedule that automatically handles the removal of weights. The schedule used here follows a polynomial decay, and it needs the following parameters:

* Initial sparsity
* Final sparsity
* Start pruning step
* End pruning step
* Exponent of the polynomial decay

At each step, the toolkit removes enough weights that the resulting sparsity is:

$$ S = S_e + (S_0 - S_e) \left( 1 - \frac{t - t_0}{t_e - t_0} \right)^{\alpha} $$

Where

* $S$ is the sparsity
* $S_e$ is the final sparsity
* $S_0$ is the initial sparsity
* $t$ is the current time step
* $t_0$ is the starting time step
* $t_e$ is the ending time step
* $\alpha$ is the exponent (default is 3)

In addition, the other hyperparameters need to be tuned to find the optimal values.
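
The schedule is easier to read with numbers plugged in. Below is a small pure-Python sketch of a polynomial decay schedule in the form $S = S_e + (S_0 - S_e)(1 - \frac{t - t_0}{t_e - t_0})^{\alpha}$, which is how I understand tfmot's `PolynomialDecay` to compute it. The helper name `polynomial_sparsity` is hypothetical, and the clamping outside the pruning window is my assumption; the parameter values are the ones used in the experiment later in this article.

```python
def polynomial_sparsity(step, initial_sparsity, final_sparsity,
                        begin_step, end_step, exponent=3):
    """Target sparsity at `step` for a polynomial decay pruning schedule."""
    # Clamp progress to [0, 1] so the schedule stays flat before
    # begin_step and after end_step (an assumption of this sketch).
    progress = (step - begin_step) / (end_step - begin_step)
    progress = min(max(progress, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** exponent


for step in (1000, 2000, 3000, 4000, 5000):
    s = polynomial_sparsity(step, 0.0, 0.75, 1000, 5000)
    print(f"step {step}: target sparsity {s:.3f}")
```

With an exponent of 3 the sparsity rises quickly at the beginning of the window and flattens out near the end, so most weights are removed early and the model gets the remaining steps to recover.
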
According to the author's advice, it is necessary to prune slowly, a little at a time, so that the model can "adapt" to losing weights. It is just like trimming a tree: cutting a little at a time is fine =))

## Deploy pruning with tfmot with a simple example

To make tfmot easier to visualize and use, I will run a small experiment, both to understand how to use tfmot and to compare training with and without pruning to see how model performance changes. Here I use sklearn to create the dataset and a relatively simple MLP architecture for the comparison.

* Create dataset

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Parameters of the dataset
n_samples = 10000
n_features = 1000
n_informative = 500
noise = 3

# Create the dataset and preprocess it
x, y = make_regression(
    n_samples=n_samples,
    n_features=n_features,
    n_informative=n_informative,
    noise=noise,
)

# Scale every feature and the target into [-1, 1]
x = x / abs(x).max(axis=0)
y = y / abs(y).max()

# Hold out 20% of the samples for validation
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.2, random_state=42
)
```

* Create model

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, ReLU
from tensorflow.keras.models import Sequential

# Simple MLP: two hidden layers of 1024 units and one regression output
model = Sequential()
model.add(Dense(1024, kernel_initializer="he_normal", input_dim=n_features))
model.add(ReLU())
model.add(Dense(1024))
model.add(ReLU())
model.add(Dense(1))
```
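
The size of this network can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per unit. The short sketch below does that arithmetic for the three Dense layers above; the helper name `dense_params` is mine, for illustration only.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


# n_features inputs, then the three Dense layers of the model above.
layer_sizes = [1000, 1024, 1024, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [1025024, 1049600, 1025] 2075649
```

Almost all of the parameters sit in the two large weight matrices, which is why pruning those matrices can shrink the model so much.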

* Model Summary

![image.png](https://images.viblo.asia/b589b950-f7a2-4ca1-8c1f-d287051f7d8a.png)

Even with such a simple network architecture, the total number of parameters already exceeds 2 million, let alone in complex network architectures. So I trained the model both without pruning and with pruning to see whether model performance changes significantly.

* Training the model without Pruning

```python
model.compile(
    loss="mse",
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)

history = model.fit(
    x_train,
    y_train,
    validation_data=(x_val, y_val),
    epochs=200,
    batch_size=1024,
    verbose=1,
)
```

![image.png](https://images.viblo.asia/db256418-b47b-4927-9be8-495ec75e0a33.png)

![image.png](https://images.viblo.asia/3c4e8887-b5c3-4f4f-bc32-dbdb099fab8a.png)

* Training the model using Pruning with the tfmot tool

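```python
import tensorflow_model_optimization as tfmot

initial_sparsity = 0.0
final_sparsity = 0.75
begin_step = 1000
end_step = 5000

# Polynomial decay schedule that raises sparsity from 0% to 75%
pruning_params = {
    "pruning_schedule": tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=initial_sparsity,
        final_sparsity=final_sparsity,
        begin_step=begin_step,
        end_step=end_step,
    )
}

# Wrap the model so that low-magnitude weights are pruned during training
model = tfmot.sparsity.keras.prune_low_magnitude(model, **pruning_params)

# Callback that keeps the pruning step in sync with the training step
pruning_callback = tfmot.sparsity.keras.UpdatePruningStep()
```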
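
Since `begin_step` and `end_step` are counted in training steps (batches), not epochs, it is worth checking how they line up with the length of the run. The sketch below is a rough estimate under my own assumptions: the pruning step advances once per batch, and the sparsity follows the polynomial decay form $S = S_e + (S_0 - S_e)(1 - \frac{t - t_0}{t_e - t_0})^{\alpha}$ with exponent 3. The numbers come from the dataset and training settings used in this article.

```python
import math

n_train = 8000                      # 80% of the 10,000 samples (test_size=0.2)
batch_size, epochs = 1024, 200
begin_step, end_step = 1000, 5000
initial_sparsity, final_sparsity, exponent = 0.0, 0.75, 3

steps_per_epoch = math.ceil(n_train / batch_size)   # batches per epoch
total_steps = steps_per_epoch * epochs              # batches in the whole run

# Fraction of the pruning window covered when training stops, clamped to [0, 1].
progress = (total_steps - begin_step) / (end_step - begin_step)
progress = min(max(progress, 0.0), 1.0)
reached = final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** exponent

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"estimated sparsity at the end of training: {reached:.3f}")
```

By this estimate the run ends well before `end_step`, so the target of 0.75 would not be fully reached with these settings. If you want the full final sparsity, one option is to pick `end_step` no larger than the total number of training steps.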

Here tfmot is used through a callback, in the same way as a learning rate scheduler or early stopping.

```python
# The wrapped model must be compiled again before training
model.compile(
    loss="mse",
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)

history = model.fit(
    x_train,
    y_train,
    validation_data=(x_val, y_val),
    epochs=200,
    batch_size=1024,
    callbacks=[pruning_callback],  # callbacks are passed as a list
    verbose=1,
)
```

![image.png](https://images.viblo.asia/77bac35c-d8ce-4395-80f2-3e0895f748b2.png)

![image.png](https://images.viblo.asia/4cd07635-02c0-472d-8f34-705d3f32983f.png)

There is a difference, but with pruning the accuracy does not drop much and val_loss is still acceptable.

# Conclusion

Personally, I find pruning quite an interesting method because of how useful it is for "lightening" a model. I tried it on a few problems with my team and listened to a few seminars about pruning, and found that the experiments reported by the authors gave surprising results. Alongside hardware development, we also need methods to slim models down so that hardware, budget, and time can keep up =)))

# References

* [**Pruning Deep Neural Networks**](https://towardsdatascience.com/pruning-deep-neural-network-56cae1ec5505)
* [**Model Pruning in Deep Neural Networks Using the TensorFlow API**](https://towardsdatascience.com/model-pruning-in-deep-neural-networks-using-the-tensorflow-api-7cf52bdd32)

**Source:** Viblo