Toonify: Turn a portrait into a cartoon character with StyleGAN

Tram Ho

Hello everyone, and Happy New Year!

A few days ago, while browsing Facebook, I saw that my boss had shared an interesting Reddit post about StyleGAN: link to the post.

In the comment section, the author explains that it uses a technique called model blending to mix two StyleGAN2 models: one trained on the FFHQ dataset to produce lifelike human faces, and a second model fine-tuned from the first on a dataset of Pixar characters. The blended model can then create Pixar-style characters from real human faces. Finally, the author applied First Order Motion to animate the new image according to a sample video.

In this article, I will introduce model blending and use it to create cartoon characters from real images. Let’s get started!

Introduction to GAN and StyleGAN

In this section, I will go over GAN and StyleGAN quickly, without going deep into the theory.

GAN

The Generative Adversarial Network (GAN) is one of the hottest models in deep learning right now, with many applications in the imaging field. A GAN is composed of two competing neural networks called the generator and the discriminator. The generator’s job is to trick the discriminator into believing that the images it produces are real, while the discriminator classifies between real images (from the dataset) and fake images (from the generator).

The discriminator is first trained by showing it a batch of real images from the dataset and a batch of noise-like images from the generator (which has not been trained yet).

Then we switch to training the generator. The generator learns to produce higher-quality images thanks to feedback from the discriminator (whether the images it generates are judged real or fake), until the discriminator can no longer distinguish real from fake. Next we go back to training the discriminator, and the process alternates like this until the generator can produce images very close to those in the dataset. A minimal sketch of this alternating loop follows.
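To make the alternation concrete, here is a minimal, hedged sketch of such a training loop in PyTorch. The tiny fully connected generator/discriminator and the random stand-in "dataset" are my own placeholders, not the StyleGAN architecture; only the alternating discriminator step and generator step follow the description above.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 16, 64
# Placeholder networks, far smaller than any real GAN
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, img_dim))
D = nn.Sequential(nn.Linear(img_dim, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_batch = torch.randn(32, img_dim)   # stand-in for a batch of real images
for step in range(100):
    # 1) Train the discriminator: real images -> label 1, generated images -> label 0
    z = torch.randn(32, latent_dim)
    fake_batch = G(z).detach()
    d_loss = bce(D(real_batch), torch.ones(32, 1)) + bce(D(fake_batch), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train the generator: try to make the discriminator label its fakes as real
    z = torch.randn(32, latent_dim)
    g_loss = bce(D(G(z)), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```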

StyleGAN

The StyleGAN model was introduced by NVIDIA in 2018. StyleGAN introduces a new generator architecture that lets us control the level of detail in the generated image, from coarse features (head shape, hairstyle, ...) down to finer details (eye color, earrings, ...).

StyleGAN also integrates techniques from PGGAN: both the generator and discriminator networks are initially trained on 4×4 images, after which layers are gradually added and the image size increases step by step. Thanks to this technique, training time is significantly shortened and the training process is also more stable.

StyleGAN controls these levels of detail by using an additional mapping network that encodes a vector z (sampled from a standard multidimensional distribution) into a vector w. The vector w is then injected at many different positions in the generator network, and at each position it controls different features.

The early positions (at the 4×4 and 8×8 layers) control coarse features such as head shape, hairstyle, and glasses. The later positions (at the 512×512 and 1024×1024 layers) control facial texture features such as skin color, hair color, eye color, etc.
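As an illustration of this coarse-to-fine control, here is a hedged sketch of style mixing with the TensorFlow StyleGAN2 API, assuming a generator Gs has already been loaded (see the Code section below). The cut-off index 8 is just an example; the 1024×1024 FFHQ generator exposes 18 w entries, two per resolution from 4×4 up to 1024×1024.

```python
import numpy as np

# Assumes `Gs` is a loaded StyleGAN2 generator (see the Code section below).
z_a = np.random.randn(1, Gs.input_shape[1])   # will supply the coarse features
z_b = np.random.randn(1, Gs.input_shape[1])   # will supply the fine features

w_a = Gs.components.mapping.run(z_a, None)    # shape [1, 18, 512]
w_b = Gs.components.mapping.run(z_b, None)

# Keep the first 8 entries (the 4x4..32x32 layers) from A, take the rest from B.
w_mix = w_a.copy()
w_mix[:, 8:] = w_b[:, 8:]

images = Gs.components.synthesis.run(w_mix, randomize_noise=False)
```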

Network blending

As written above, the low-resolution layers of the model control the structural features of the face, while the high-resolution layers control the texture features of the face. By swapping the model weights at different resolutions, we can select and blend features generated by different generator networks, for example a realistic face structure rendered with cartoon textures (or vice versa).

Swapping parameters

The process of blending two generator networks looks like this:

  1. Start with a pretrained StyleGAN model with weights $p_{base}$
  2. Fine-tune that model on the new dataset to obtain the weights $p_{transfer}$
  3. Combine the weights of the original model and the fine-tuned model into the new weights

Here, $r_{swap}$ is the resolution at which the weights of the two models start to be swapped.
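Written out (my notation, in the spirit of the network-blending post in the references), the blended weights at a layer of resolution $r$ are simply:

$$
p_{blend}^{(r)} =
\begin{cases}
p_{base}^{(r)}, & r < r_{swap} \\
p_{transfer}^{(r)}, & r \ge r_{swap}
\end{cases}
$$

The roles of the two models can of course be reversed, depending on which one you want to take the structure from and which one the texture.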

Code

Enough theory, let’s start coding! Anyone who wants to try it out without writing code can use this link, or a newer, paid version here.

First, we need to clone the StyleGAN repo; here we use StyleGAN2.

Next, we download the pretrained weights and load the two models: one released by NVIDIA, trained on the FFHQ dataset, and another that has been fine-tuned on a dataset of cartoon characters.
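Since the original code block isn’t reproduced here, below is a hedged sketch of how the two checkpoints might be loaded with the StyleGAN2 codebase; the .pkl file names are placeholders.

```python
import pickle
import dnnlib.tflib as tflib

tflib.init_tf()

# Placeholder file names: NVIDIA's FFHQ checkpoint and a checkpoint fine-tuned on cartoon faces.
with open('stylegan2-ffhq-config-f.pkl', 'rb') as f:
    _G_ffhq, _D_ffhq, Gs_ffhq = pickle.load(f)

with open('stylegan2-cartoon-finetuned.pkl', 'rb') as f:
    _G_toon, _D_toon, Gs_toon = pickle.load(f)
```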

Let’s take a look at the outputs of the two original models.
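A quick, hedged way to compare them (assuming the Gs_ffhq and Gs_toon objects from the sketch above) is to feed the same latent vector to both generators:

```python
import numpy as np
import PIL.Image
import dnnlib.tflib as tflib

rnd = np.random.RandomState(42)
z = rnd.randn(1, Gs_ffhq.input_shape[1])   # same latent for both models
fmt = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)

for name, Gs in [('ffhq', Gs_ffhq), ('toon', Gs_toon)]:
    images = Gs.run(z, None, truncation_psi=0.5, randomize_noise=False, output_transform=fmt)
    PIL.Image.fromarray(images[0], 'RGB').save(f'sample_{name}.png')
```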

Next comes the code to blend the two models. You can experiment with different swap resolutions to see the results.
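The repository linked in the references ships a blending utility, so the snippet below is only a hedged sketch of the core idea: clone one generator and overwrite the synthesis variables at or above the swap resolution with the other generator’s weights. The variable-name parsing is an assumption about the TensorFlow checkpoint layout (block names such as '64x64'), not a drop-in replacement for the official script.

```python
def blend_models(Gs_low, Gs_high, resolution_swap=32):
    """Take layers below `resolution_swap` from Gs_low and the rest from Gs_high."""
    Gs_blend = Gs_low.clone()
    Gs_blend.copy_vars_from(Gs_low)            # make sure we start from Gs_low's weights

    for name in Gs_blend.vars:
        # Extract the block resolution from names such as 'synthesis/64x64/Conv0_up/weight'.
        res = next((int(tok.split('x')[0]) for tok in name.split('/')
                    if 'x' in tok and tok.split('x')[0].isdigit()), None)
        if res is not None and res >= resolution_swap and name in Gs_high.vars:
            Gs_blend.set_var(name, Gs_high.get_var(name))
    return Gs_blend

# Hypothetical usage: structure from the cartoon model, texture from FFHQ at 32x32 and above.
Gs_blended = blend_models(Gs_toon, Gs_ffhq, resolution_swap=32)
```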

And this is the result. Not bad at all!

Just for fun

In this section I just want to share a few more blended models: Ukiyo-e-style portraits and "surreal" Ukiyo-e paintings.

Painting style

Figure drawings

Style ???

Epilogue

Thank you all for taking the time to read this article, and I wish you all a happy and prosperous New Year!

References

https://arxiv.org/pdf/2010.05334.pdf
https://arxiv.org/abs/1812.04948
https://github.com/justinpinkney/stylegan2
https://github.com/justinpinkney/awesome-pretrained-stylegan2
https://github.com/NVlabs/stylegan
https://www.justinpinkney.com/stylegan-network-blending/


Source: Viblo