RaLSGAN for the Motorbike Generator problem in the Zalo Challenge


Zalo is organizing an AI contest open to everyone in the industry. One of its three problems is the Motorbike Generator, and its requirements are essentially the same as the Generative Dog Images competition on Kaggle, except that each output image here is 128×128 while the Dog Generator used 64×64. I joined in mainly for fun and to learn. In this article I share my experience with data exploration, image processing, and model training. This is just my own experience from working through the problem, so if anything is wrong, please go easy on me.

Problem statement

Put simply, we use the 10,000 images that Zalo provides to train a model that generates 10,000 images in PNG format; the evaluation metric is FID (Fréchet Inception Distance).

Theory

To read more about the FID evaluation metric, please read here.

I will use RaLSGAN (Relativistic average Least Squares GAN) for this problem. So what is RaLSGAN? TL;DR: it is just a normal GAN network, but with an improved loss function.

Loss Functions

The discriminator's output can go through either a sigmoid or a linear activation function. With a sigmoid we get the probability that an image is real, D(x) = Pr(real). With a linear output we get the logit, C(x). The probability lies in (0, 1), while the logit can be any real number: positive values indicate real images, negative values indicate fake images.

Simple loss

Call x_r a real image and x_f a fake image. Depending on the final activation, the discriminator's output for an input image is either D(x) = Pr(real) or C(x) = logit. The loss function is as follows:
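In its textbook form, written with D(x) = Pr(real), the minimax objective is (standard formulation):

    \min_G \max_D \; \mathbb{E}_{x_r \sim p_{\mathrm{data}}}\!\left[\log D(x_r)\right]
      + \mathbb{E}_{x_f \sim p_G}\!\left[\log\!\left(1 - D(x_f)\right)\right]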

We want the discriminator to output D(x_r) = 1 and D(x_f) = 0, matching the real and fake labels, while the generator wants D(x_f) to end up as close to 1 as possible once training is complete. In a nutshell, we use real photos as training data for the discriminator so that it learns to distinguish real from fake images, and the discriminator then gives feedback to the generator so the generator can improve itself based on that feedback. That is why D(x_r) = 1 and D(x_f) = 0. If you want to learn more about GANs, please read here.

DCGAN Loss

We can clearly see that the basic GAN and DCGAN losses use D(x):
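In the usual binary cross-entropy form (with the non-saturating generator loss that DCGAN implementations typically use), these are:

    L_D = -\,\mathbb{E}_{x_r}\!\left[\log D(x_r)\right] - \mathbb{E}_{x_f}\!\left[\log\!\left(1 - D(x_f)\right)\right]
    \qquad
    L_G = -\,\mathbb{E}_{x_f}\!\left[\log D(x_f)\right]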

RaLSGAN Loss

RaLSGAN uses C(x) = logit:
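With C(x) the logit, the relativistic average least-squares losses are:

    L_D = \mathbb{E}_{x_r}\!\left[\left(C(x_r) - \mathbb{E}_{x_f}[C(x_f)] - 1\right)^2\right]
        + \mathbb{E}_{x_f}\!\left[\left(C(x_f) - \mathbb{E}_{x_r}[C(x_r)] + 1\right)^2\right]

    L_G = \mathbb{E}_{x_f}\!\left[\left(C(x_f) - \mathbb{E}_{x_r}[C(x_r)] - 1\right)^2\right]
        + \mathbb{E}_{x_r}\!\left[\left(C(x_r) - \mathbb{E}_{x_f}[C(x_f)] + 1\right)^2\right]

Intuitively, the discriminator pushes real images to score one unit above the average fake score (and fakes one unit below the average real score), while the generator tries to reverse that.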

Code

We will use PyTorch for this problem. The first change is in the last layer of the discriminator: switch its output from a sigmoid to a raw logit (i.e. simply drop the sigmoid):
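For example, a DCGAN-style discriminator head normally ends in a sigmoid; for RaLSGAN we just leave that layer out and return the raw logit. A minimal sketch (the 512 input channels are only illustrative):

    import torch.nn as nn

    # DCGAN-style head: sigmoid output, D(x) = Pr(real) in (0, 1)
    head_sigmoid = nn.Sequential(
        nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0, bias=False),
        nn.Sigmoid(),
    )

    # RaLSGAN head: drop the sigmoid, output the raw logit C(x)
    head_logit = nn.Sequential(
        nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=0, bias=False),
    )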

Next, we update the losses of G and D:
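A sketch of the corresponding loss computation in PyTorch, following the formulas above (the helper names and training-loop variables such as netD and netG are my own, not the original code):

    import torch

    def ralsgan_d_loss(real_logits, fake_logits):
        # L_D = E[(C(x_r) - E[C(x_f)] - 1)^2] + E[(C(x_f) - E[C(x_r)] + 1)^2]
        return (torch.mean((real_logits - fake_logits.mean() - 1) ** 2)
                + torch.mean((fake_logits - real_logits.mean() + 1) ** 2))

    def ralsgan_g_loss(real_logits, fake_logits):
        # L_G = E[(C(x_f) - E[C(x_r)] - 1)^2] + E[(C(x_r) - E[C(x_f)] + 1)^2]
        return (torch.mean((fake_logits - real_logits.mean() - 1) ** 2)
                + torch.mean((real_logits - fake_logits.mean() + 1) ** 2))

    # Inside the training loop (sketch):
    #   real_logits = netD(real_images)
    #   fake_logits = netD(netG(noise).detach())   # detach for the D step
    #   d_loss = ralsgan_d_loss(real_logits, fake_logits)
    #   ...
    #   fake_logits = netD(netG(noise))            # no detach for the G step
    #   g_loss = ralsgan_g_loss(netD(real_images), fake_logits)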

Data processing

In any DL problem, data is always the most important thing, and the first task is to look at the data set and find its characteristics so we can process it appropriately. Looking at our data set of 10,000 images, we see:

  • Different sizes and formats
  • Many GIF images and corrupted files
  • The data is unbalanced
  • Many photos contain several vehicles or obstructions
  • Most images are in landscape orientation
  • Some vehicles have unusual, "heterogeneous" characteristics compared with the rest of the data set

Handling:

  • Remove junk images and classify the vehicles using YOLOv3 (here); a rough sketch of this filtering step is shown after this list.
  • After removing the junk and classifying the vehicles, filter the data manually, because the set still contains many different vehicle types and incorrectly formatted images.
  • Remove vehicles with redundant details and those that appear only rarely in the data set, i.e. the non-diverse vehicles with "heterogeneous" characteristics.
  • Remove images whose background is too colorful.
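A rough sketch of the first filtering step; detect_objects below is a hypothetical placeholder for whatever YOLOv3 wrapper you use (it is assumed to return the list of class names detected in an image), and the folder names are assumptions as well:

    import glob
    import os
    import shutil

    def detect_objects(path):
        """Hypothetical wrapper around a YOLOv3 detector: returns a list of
        class names detected in the image at `path`."""
        raise NotImplementedError  # plug in your YOLOv3 implementation here

    os.makedirs("motorbike_filtered", exist_ok=True)
    for path in glob.glob("motorbike_raw/*"):
        labels = detect_objects(path)
        # keep only images in which the detector sees exactly one motorbike
        # and nothing else; crowded or off-topic images are discarded
        if labels == ["motorbike"]:
            shutil.copy(path, "motorbike_filtered/")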

Preprocessing

After finishing the steps above we have a good data set; all that is left is to bring the images to 128×128 before feeding them to the network. There are two options for this:

  • Padding: add blank space along the width or height of the image so it becomes square, without distorting the image.
  • Resizing: stretch the image directly to 128×128 along its width or height. I trained with both approaches and compared the FID scores; padding gave better results, which means the network learned more effectively.

Read the image paths:
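For example (the folder name is an assumption):

    import glob

    # collect the paths of all filtered motorbike images
    image_paths = sorted(glob.glob("motorbike_filtered/*"))
    print(len(image_paths), "images found")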

Create a loop and call the padding function on the images:
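A minimal sketch of the padding helper and the loop, using Pillow; it pads the shorter side so the image becomes square (no distortion), then resizes to 128×128. The fill colour and output folder are assumptions:

    import glob
    import os
    from PIL import Image

    def pad_to_square(img, fill=(255, 255, 255)):
        """Pad the shorter side with a constant colour so the image is square."""
        w, h = img.size
        side = max(w, h)
        canvas = Image.new("RGB", (side, side), fill)
        canvas.paste(img, ((side - w) // 2, (side - h) // 2))  # centre the image
        return canvas

    os.makedirs("motorbike_padded", exist_ok=True)
    for path in sorted(glob.glob("motorbike_filtered/*")):
        img = Image.open(path).convert("RGB")
        img = pad_to_square(img).resize((128, 128), Image.LANCZOS)
        out_name = os.path.splitext(os.path.basename(path))[0] + ".png"
        img.save(os.path.join("motorbike_padded", out_name))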

After padding is finished, we apply image augmentation. I used a couple of simple augmentation techniques; I had previously also tried increasing the contrast, but the results were quite poor.
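As an illustration, a simple on-the-fly augmentation pipeline with torchvision could look like this (horizontal flips plus normalisation to [-1, 1] for a Tanh generator; a sketch, not necessarily the exact pipeline used):

    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),   # mirror the bike left/right
        transforms.ToTensor(),                    # to a [0, 1] tensor
        transforms.Normalize([0.5, 0.5, 0.5],     # to [-1, 1], matching Tanh output
                             [0.5, 0.5, 0.5]),
    ])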

Training model

As I said above, RaLSGAN is a normal GAN network, which can be a DCGAN, SGAN, etc., but with a better loss function.

The generator will look like this:
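A DCGAN-style generator for 128×128 output looks roughly like this (a sketch with latent size nz=128 and base width ngf=64, which comes out to about 13 million parameters; not necessarily the exact architecture used):

    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, nz=128, ngf=64):
            super().__init__()
            self.main = nn.Sequential(
                # latent vector z -> 4x4
                nn.ConvTranspose2d(nz, ngf * 16, 4, 1, 0, bias=False),
                nn.BatchNorm2d(ngf * 16),
                nn.ReLU(True),
                # 4x4 -> 8x8
                nn.ConvTranspose2d(ngf * 16, ngf * 8, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf * 8),
                nn.ReLU(True),
                # 8x8 -> 16x16
                nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf * 4),
                nn.ReLU(True),
                # 16x16 -> 32x32
                nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf * 2),
                nn.ReLU(True),
                # 32x32 -> 64x64
                nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ngf),
                nn.ReLU(True),
                # 64x64 -> 128x128, 3 channels, values in [-1, 1]
                nn.ConvTranspose2d(ngf, 3, 4, 2, 1, bias=False),
                nn.Tanh(),
            )

        def forward(self, z):
            # z has shape (batch, nz, 1, 1)
            return self.main(z)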

As we can see, the output of G is 128×128, and the network has more than 13 million parameters.

Discriminator:
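And a matching DCGAN-style discriminator for 128×128 input, again only a sketch, with a linear logit output as discussed above:

    import torch.nn as nn

    class Discriminator(nn.Module):
        def __init__(self, ndf=64):
            super().__init__()
            self.main = nn.Sequential(
                # 128x128 -> 64x64
                nn.Conv2d(3, ndf, 4, 2, 1, bias=False),
                nn.LeakyReLU(0.2, inplace=True),
                # 64x64 -> 32x32
                nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ndf * 2),
                nn.LeakyReLU(0.2, inplace=True),
                # 32x32 -> 16x16
                nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ndf * 4),
                nn.LeakyReLU(0.2, inplace=True),
                # 16x16 -> 8x8
                nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ndf * 8),
                nn.LeakyReLU(0.2, inplace=True),
                # 8x8 -> 4x4
                nn.Conv2d(ndf * 8, ndf * 16, 4, 2, 1, bias=False),
                nn.BatchNorm2d(ndf * 16),
                nn.LeakyReLU(0.2, inplace=True),
                # 4x4 -> 1x1, raw logit C(x), no sigmoid
                nn.Conv2d(ndf * 16, 1, 4, 1, 0, bias=False),
            )

        def forward(self, x):
            return self.main(x).view(-1)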

Output

I tried many configurations to get the best results; training somewhere in the range of 550 to 750 epochs gives a pretty good FID, improving from around 80 down to 62.

Some results:

Reference source

https://www.kaggle.com/c/generative-dog-images/discussion/99485


Source: Viblo