The author of Twin-GAN is Jerry Li. He was interested in anime but was never satisfied with his attempts to draw his favorite characters, so he turned to machine learning to transform human portraits into original anime characters.
But first, let’s review the previous attempts at teaching AI how to draw.
Neural Style Transfer
· In this approach, the style of one image is applied to another, as you can see below.
· The important caveat is that the style transfer method relies on a network pre-trained for object recognition.
· Most such networks are trained on real-life objects.
· So, this solution is unlikely to help with anime style unless we create a new dataset manually, and that would cost a lot of money.
Generative Adversarial Network (GAN)
· GANs offer another route into the anime world.
· A GAN consists of a pair of competing neural networks that can mimic any data distribution given enough samples, a sufficiently powerful network, and enough training time; a minimal sketch of this adversarial game appears after this list.
· Below we can see incredibly realistic faces generated using Progressive Growing of GANs (PGGAN).
· Besides generating fairly high-quality images, GANs are also capable of translating one type of image into another.
· However, this approach typically requires paired data (corresponding images from each domain), and unfortunately there is no paired dataset of human and anime portraits.
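To make the “two competing networks” idea concrete, here is a minimal, self-contained sketch of the adversarial game. It is written in PyTorch purely for illustration (TwinGAN itself is a TensorFlow project), and the toy generator here only learns to mimic a 1-D Gaussian:

```python
# Minimal sketch of the adversarial game behind a GAN (illustration only,
# not TwinGAN's code). The generator learns to mimic a 1-D Gaussian from noise.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0        # "real" samples from the target distribution
    fake = generator(torch.randn(64, 8))         # samples produced from random noise

    # Discriminator step: learn to tell real from fake.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: learn to fool the discriminator.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

Given enough capacity and training time, the same game scales from this toy example up to faces, which is exactly what PGGAN demonstrates.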
CycleGAN
· So, before creating Twin-GAN, Jerry Li tried to use CycleGAN to translate human portraits into anime characters.
· He took 200K images from the CelebA dataset as human portraits and around 30K anime figures from the Getchu website. After two days of training he got the results depicted below.
· The results are not bad, but they reveal some limitations of CycleGAN; we won’t go too deep into those limitations here. The sketch below shows the cycle-consistency idea that lets CycleGAN learn from unpaired data.
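CycleGAN gets around the missing paired data by training two translators at once and penalizing them whenever a round trip fails to reproduce the original image. Below is a hedged, illustrative PyTorch sketch of that extra loss term (not Jerry Li’s code); `G` and `F` stand for the human→anime and anime→human generators:

```python
# Sketch of CycleGAN's cycle-consistency term (illustrative, not the original code).
# G maps human -> anime, F maps anime -> human; both are also trained with the
# usual adversarial losses, which are omitted here.
import torch.nn.functional as F_loss

def cycle_consistency_loss(G, F, human_batch, anime_batch, weight=10.0):
    """L1 penalty that keeps the two translators (approximately) inverse to each other."""
    forward_cycle = F_loss.l1_loss(F(G(human_batch)), human_batch)    # human -> anime -> human
    backward_cycle = F_loss.l1_loss(G(F(anime_batch)), anime_batch)   # anime -> human -> anime
    return weight * (forward_cycle + backward_cycle)
```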
Twin-GAN Model
· To solve the issues of the previous models, the Twin-GAN architecture was finally created.
· PGGAN was chosen as the generator. This network normally takes a high-dimensional vector as its input, but in our case the input is an image.
· The researcher therefore used an encoder whose structure is symmetric to PGGAN to encode the image into the high-dimensional vector.
· In order to keep the details of the input image, he used a UNet structure to connect the convolutional layers in the encoder with the corresponding layers in the generator.
The input and output fall into the following three categories:
1. Human Portrait->Encoder->High Dimensional Vector->PGGAN Generator + human-specific-batch-norm->Human Portrait
2. Anime Portrait->Encoder->High Dimensional Vector->PGGAN Generator + anime-specific-batch-norm->Anime Portrait
3. Human Portrait->Encoder->High Dimensional Vector->PGGAN Generator + anime-specific-batch-norm->Anime Portrait
· The idea behind this structure is that letting the human and anime portraits share the same network helps the network realize that, although they look different, both human and anime portraits describe a face. This is crucial to image translation; a simplified sketch of this weight-sharing idea follows after this list.
· Here are the results of translating human portraits into anime characters using Twin-GAN.
· Twin-GAN can turn a human portrait into an original anime character, a cat face, or any character supplied by the user, and the algorithm performs quite well on these tasks.
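The three routes above can be pictured as one shared encoder and generator in which only the batch-norm layers know which domain is being produced. Below is a heavily simplified, hedged PyTorch sketch of that weight-sharing idea; it is illustrative only and far smaller than the actual PGGAN-based TensorFlow implementation:

```python
# Illustrative sketch of Twin-GAN's weight sharing (not the author's TensorFlow code).
# One encoder and one generator are shared by both domains; only the batch-norm
# layers are domain-specific, selected by a `domain` flag ("human" or "anime").
import torch
import torch.nn as nn

class DomainBatchNorm(nn.Module):
    """One BatchNorm per domain; every other weight in the model is shared."""
    def __init__(self, channels):
        super().__init__()
        self.norms = nn.ModuleDict({"human": nn.BatchNorm2d(channels),
                                    "anime": nn.BatchNorm2d(channels)})

    def forward(self, x, domain):
        return self.norms[domain](x)

class TwinTranslator(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        # Encoder whose structure mirrors the decoder, as described above.
        self.enc1 = nn.Conv2d(3, ch, 4, stride=2, padding=1)
        self.enc2 = nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1)
        # Shared decoder/generator with domain-specific batch norm.
        self.dec2 = nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1)
        self.bn2 = DomainBatchNorm(ch)
        self.dec1 = nn.ConvTranspose2d(ch * 2, 3, 4, stride=2, padding=1)  # ch*2: UNet skip

    def forward(self, image, domain):
        e1 = torch.relu(self.enc1(image))
        e2 = torch.relu(self.enc2(e1))                    # "high-dimensional" latent
        d2 = torch.relu(self.bn2(self.dec2(e2), domain))
        out = self.dec1(torch.cat([d2, e1], dim=1))       # UNet skip keeps input detail
        return torch.tanh(out)

# Same shared weights, different batch-norm branch per output domain:
model = TwinTranslator()
photo = torch.randn(1, 3, 64, 64)                     # stand-in for a human portrait
reconstructed_human = model(photo, domain="human")    # route 1
anime_version = model(photo, domain="anime")          # route 3
```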
Limitations of Twin-GAN
· It is prone to some errors, such as mistaking the background color for hair color, and ignoring or misrecognizing important features.
· For anime character generation, another problem is the lack of a well-balanced dataset.
· Most of the anime faces collected by the researcher are female, so the network tends to translate male human portraits into female anime characters, as in the image below.
Now let’s walk through the code:
Code
Step 1: First, clone the TwinGAN repository.
Then install the other dependencies.
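Assuming a Colab/Jupyter-style notebook (the `!` prefix runs shell commands), this step might look like the cell below; the repository URL points at the public TwinGAN project and the dependency list is an assumption, not copied from the original post:

```python
# Clone the public TwinGAN repository (assumed URL).
!git clone https://github.com/jerryli27/TwinGAN.git

# TwinGAN is a TensorFlow project; Pillow is used later to display the results.
!pip install tensorflow pillow
```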
Step 2: Download and extract the frozen graph of the pre-trained model.
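The pre-trained frozen graph is distributed as an archive linked from the TwinGAN README; the URL below is only a placeholder, so substitute the real download link:

```python
# Placeholder URL -- replace it with the pre-trained model link from the README.
!wget -O pretrained.zip "https://example.com/twingan_pretrained_256.zip"
!unzip -o pretrained.zip -d pretrained/
```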
Step 3: Now change the current working directory to TwinGAN.
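In Python this is a one-liner:

```python
import os

# Work from inside the cloned repository so its relative paths resolve.
os.chdir('TwinGAN')
```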
Two pre-trained models are available:
· Human to Anime
· Human to Cats
Step 4: Choose the model you want to use.
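For example, point a variable at whichever model you extracted in Step 2 (the folder names below are assumptions; use the directories from your own download):

```python
# Choose one of the two pre-trained models; adjust the paths to match the
# folders actually extracted in Step 2 (these names are assumptions).
MODEL_PATH = '../pretrained/human_to_anime_256/'
# MODEL_PATH = '../pretrained/human_to_cats_256/'
```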
Step 5: Now set the paths for the input image we want to translate and for the output we will display later.
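In practice this just fixes where the input image will live and where the translated result should be written; the directory names below are assumptions, and any two folders will do:

```python
# Input/output locations used by the inference script in Step 7 (assumed names).
INPUT_DIR = './demo/inference_input/'
OUTPUT_DIR = './demo/inference_output/'
os.makedirs(INPUT_DIR, exist_ok=True)
os.makedirs(OUTPUT_DIR, exist_ok=True)
```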
Step 6: Choose a picture from the local file system.
Then upload the file.
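If you are running in Google Colab (assumed here), the upload widget looks like this; on a local machine you can simply copy the image into `INPUT_DIR` instead:

```python
# Google Colab file-upload widget (assumed environment).
from google.colab import files

uploaded = files.upload()                              # opens a browser file picker
for name in uploaded:
    os.replace(name, os.path.join(INPUT_DIR, name))    # move the file into the input folder
```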
Step 7: Run the following command to translate the demo inputs.
The input_image_path can be either a single image or a directory containing images.
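The command below follows the inference example in the public TwinGAN README; the script name and flag values are taken from that README and should be treated as assumptions if your checkout differs:

```python
# Inference command as in the public TwinGAN README (flag names are assumptions
# if your version of the repository differs). Runs the frozen model over INPUT_DIR.
!python inference/image_translation_infer.py \
    --model_path="{MODEL_PATH}" \
    --image_hw=256 \
    --input_tensor_name="sources_ph" \
    --output_tensor_name="custom_generated_t_style_source:0" \
    --input_image_path="{INPUT_DIR}" \
    --output_image_path="{OUTPUT_DIR}"
```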
Step 8: Now, in Python, import the image library.
Then use Image.open() to display the resulting image.
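A minimal sketch, assuming the inference script wrote its results into `OUTPUT_DIR`:

```python
from PIL import Image
from IPython.display import display

# Open and show the first translated image produced by the inference script.
result_name = sorted(os.listdir(OUTPUT_DIR))[0]
result = Image.open(os.path.join(OUTPUT_DIR, result_name))
display(result)
```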
Conclusion
To sum up, this is a great start, but some more work needs to be done to improve the performance of this budding approach.