Syntax Literate: Jurnal Ilmiah Indonesia p-ISSN: 2541-0849 e-ISSN: 2548-1398
Vol. 7, No. 10, October 2022
COMPARATIVE ANALYSIS OF DCGAN AND WGAN
Syaiful Haq Al Furuqi, Handri Santoso
Teknik Informatika, Universitas Pradita, Indonesia
Email: [email protected], [email protected]
Abstract
In recent years, image recognition has been used more and more widely; however, it becomes much less accurate if little data is available for training. Generative Adversarial Networks (GAN) can help by creating new data that is nearly identical to the original data, supporting the training process when the original data is scarce so that training becomes more accurate. GAN continues to develop and there are a growing number of variants, including the Deep Convolutional GAN (DCGAN) and Wasserstein GAN (WGAN) algorithms. This study compares DCGAN and WGAN, aiming to inform the decision about which algorithm is better to use. Based on the research results, DCGAN is simpler but still has drawbacks, namely mode collapse and vanishing gradients, while WGAN can remedy those shortcomings but its training process is slower.
Keywords: analysis; GAN; DCGAN; WGAN
Introduction
Image recognition is now increasingly being used in various fields, but it can become less accurate if there is not much data available for training. To anticipate this, there is an algorithm to create new data or images that are almost similar to the original, namely Generative Adversarial Networks (GAN).
GAN was first introduced in 2014 by Goodfellow in the field of machine learning and is an unsupervised learning algorithm (Kaveri, Meenakshi, Deepan, Dharnish, & Haarish, 2021). GAN uses two artificial neural networks, a discriminator and a generator (Praramadhan & Saputra, 2021).
Figure 1
GAN Architecture
The discriminator estimates the probability that a given sample comes from the real dataset and is optimized to distinguish fake data from genuine data. The generator samples a noise variable and is trained to capture the distribution of the original data so that it can produce data as close as possible to the original and fool the discriminator (Weng, 2019).
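As an illustration of this adversarial setup, the following is a minimal sketch of the two opposing loss functions; it is not code from this paper, and TensorFlow/Keras with the standard binary cross-entropy formulation is assumed.

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def discriminator_loss(real_output, fake_output):
    # The discriminator is trained to label genuine samples as 1 and generated samples as 0.
    return bce(tf.ones_like(real_output), real_output) + bce(tf.zeros_like(fake_output), fake_output)

def generator_loss(fake_output):
    # The generator is trained to make the discriminator label its samples as 1 (i.e., to fool it).
    return bce(tf.ones_like(fake_output), fake_output)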
There are now many types of GAN, including Deep Convolutional GAN (DCGAN) and Wasserstein GAN (WGAN).
DCGAN is a GAN algorithm that was introduced in 2016 as an improvement over the earlier Convolutional GAN (Santosa, Rachmawati, Agung, & Wirayuda, 2021). The difference between DCGAN and the original GAN is the addition of Convolutional Neural Network (CNN) layers, with the discriminator acting as a classifier of the data generated by the generator. In the DCGAN architecture, the pooling layers in the discriminator are replaced with strided convolutions and those in the generator with fractional-strided convolutions; batch normalization is applied to both the discriminator and the generator; fully connected hidden layers are removed in deeper architectures; ReLU activation is used for all generator layers except the output layer, which uses Tanh; and LeakyReLU activation is used for all discriminator layers (Widjojo, Palit, & Tjondrowiguno, 2020).
Figure 2
DCGAN Architecture (Dutta, 2020)
The downsides of DCGAN are mode collapse and vanishing gradients. Mode collapse is a term used in GAN to describe the network's inability to produce varied images; for example, when trained on dog data, the GAN can only produce one kind of dog. Vanishing gradients arise because the discriminator's confidence values collapse to a single value, either 0 or 1 (Suran, 2020).
WGAN was first introduced in 2017. WGAN offers increased stability when training the model and uses a loss function called the Wasserstein loss, which correlates with the quality of the generated images. WGAN does not use a discriminator to classify or predict whether a generated image is genuine or fake; instead, it replaces the discriminator with a critic that scores how genuine or fake the generated image is.
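A minimal sketch of the Wasserstein loss described above follows; this is based on the standard WGAN formulation and is an assumption, not code reported by the authors. The critic is trained to assign higher scores to real images than to generated ones, and the generator is trained to raise the critic's score on its images.

import tensorflow as tf

def critic_loss(real_scores, fake_scores):
    # Approximates the (negative) Wasserstein distance between real and generated data.
    return tf.reduce_mean(fake_scores) - tf.reduce_mean(real_scores)

def wgan_generator_loss(fake_scores):
    # The generator tries to maximize the critic's score on generated images.
    return -tf.reduce_mean(fake_scores)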
Figure 3
WGAN Architecture and Computation (Hui, 2018)
The advantages of WGAN are that it is more stable when training models, it is less sensitive to the model architecture and hyperparameter configuration, and its critic loss correlates with the quality of the images created by the generator (Brownlee, 2019). In addition, WGAN can use a gradient penalty so that the critic can be trained on more complex data, reducing the problem of vanishing gradients (Lee, Kim, Kim, & Kim, 2020).
WGAN thus appears to refine DCGAN, but behind its advantages there is a drawback: the training process takes longer than DCGAN's, although this extra cost is offset by the quality of the resulting images.
Research Methodology
Figure 4
Research Methodology
The first step in this research is a literature study related to the topic. After the literature study, a DCGAN model is created and its results are analyzed; a WGAN model is then created and the output it generates is analyzed.
Results and Discussion
Figure 5
DCGAN Generator Model
The DCGAN generator model uses a Dense layer with 7x7x64 units and ReLU activation, a Reshape layer with a 7x7x64 target shape, two Conv2DTranspose layers with 64 and 32 filters, and ends with a Conv2D layer with 1 channel (because the image is grayscale) and tanh activation, as sketched below.
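The following is a minimal Keras sketch of this generator; kernel sizes, strides, and the latent dimension are not reported in the paper, so the values below are assumptions chosen to produce a 28x28 grayscale output.

import tensorflow as tf
from tensorflow.keras import layers

dcgan_generator = tf.keras.Sequential([
    layers.Input(shape=(100,)),                  # latent (noise) dimension: assumed to be 100
    layers.Dense(7 * 7 * 64, activation='relu'),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(64, kernel_size=4, strides=2, padding='same', activation='relu'),  # 7x7 -> 14x14
    layers.Conv2DTranspose(32, kernel_size=4, strides=2, padding='same', activation='relu'),  # 14x14 -> 28x28
    layers.Conv2D(1, kernel_size=3, padding='same', activation='tanh'),                       # 28x28x1 grayscale output
])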
Figure 6
DCGAN Discriminator Model
The DCGAN discriminator model uses an input layer with an input shape of (28, 28, 1), two Conv2D layers with 32 and 64 filters, LeakyReLU activation with alpha 0.2, a Flatten layer, a Dropout layer, and a Dense layer with 1 unit and sigmoid activation, as sketched below.
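The following is a minimal Keras sketch of this discriminator; the kernel sizes, strides, dropout rate, and the placement of LeakyReLU after each convolution are assumptions, since they are not reported in the paper.

import tensorflow as tf
from tensorflow.keras import layers

dcgan_discriminator = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, strides=2, padding='same'),
    layers.LeakyReLU(0.2),                       # alpha 0.2 as reported
    layers.Conv2D(64, kernel_size=3, strides=2, padding='same'),
    layers.LeakyReLU(0.2),
    layers.Flatten(),
    layers.Dropout(0.3),                         # dropout rate not reported; 0.3 assumed
    layers.Dense(1, activation='sigmoid'),       # probability that the input image is genuine
])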
The training process uses the Adam optimizer for both the generator and the discriminator with a learning rate of 0.0001 and beta_1 of 0.5, for 150 epochs at 60 seconds per epoch, producing a discriminator loss of -0.17738741636276245 and a generator loss of 0.6740452647209167 (a training-step sketch is given below); the resulting images are shown in Figure 7.
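The following is a minimal sketch of one DCGAN training step with the optimizer settings reported above; it is an assumption of how the training loop could be written, since the batch handling, image scaling, and latent dimension are not reported in the paper.

import tensorflow as tf

generator_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.5)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.5)
bce = tf.keras.losses.BinaryCrossentropy()

@tf.function
def dcgan_train_step(real_images, generator, discriminator, noise_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], noise_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        fake_images = generator(noise, training=True)
        real_output = discriminator(real_images, training=True)
        fake_output = discriminator(fake_images, training=True)
        disc_loss = bce(tf.ones_like(real_output), real_output) + bce(tf.zeros_like(fake_output), fake_output)
        gen_loss = bce(tf.ones_like(fake_output), fake_output)
    # Update each network with the gradients of its own loss.
    discriminator_optimizer.apply_gradients(
        zip(disc_tape.gradient(disc_loss, discriminator.trainable_variables),
            discriminator.trainable_variables))
    generator_optimizer.apply_gradients(
        zip(gen_tape.gradient(gen_loss, generator.trainable_variables),
            generator.trainable_variables))
    return gen_loss, disc_loss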
Figure 7
DCGAN Image Result
Figure 8
WGAN Generator Model
The WGAN generator model uses a Dense layer with 7x7x64 units and ReLU activation, a Reshape layer with a (7, 7, 64) target shape, two Conv2DTranspose layers with 64 and 32 filters and ReLU activation, and a final Conv2DTranspose layer with 1 filter and sigmoid activation, as sketched below.
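The following is a minimal Keras sketch of this generator; as with the DCGAN sketch, the kernel sizes, strides, and latent dimension are assumptions, since they are not reported in the paper.

import tensorflow as tf
from tensorflow.keras import layers

wgan_generator = tf.keras.Sequential([
    layers.Input(shape=(100,)),                  # latent (noise) dimension: assumed to be 100
    layers.Dense(7 * 7 * 64, activation='relu'),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(64, kernel_size=4, strides=2, padding='same', activation='relu'),    # 7x7 -> 14x14
    layers.Conv2DTranspose(32, kernel_size=4, strides=2, padding='same', activation='relu'),    # 14x14 -> 28x28
    layers.Conv2DTranspose(1, kernel_size=3, strides=1, padding='same', activation='sigmoid'),  # 28x28x1 grayscale output
])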
Figure 9
WGAN Critic Model
The WGAN critic model uses an input layer with an input shape of (28, 28, 1), two Conv2D layers with 32 and 64 filters respectively, a Flatten layer, and a Dense layer with 1 unit and sigmoid activation, as sketched below.
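The following is a minimal Keras sketch of this critic; kernel sizes and strides are assumptions, since they are not reported in the paper.

import tensorflow as tf
from tensorflow.keras import layers

wgan_critic = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, strides=2, padding='same'),
    layers.Conv2D(64, kernel_size=3, strides=2, padding='same'),
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid'),       # critic score for the input image
])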
The training process uses the Adam optimizer on the generator model with a learning rate of 0.0001 and beta_1 of 0.5, and RMSprop on the critic model with a learning rate of 0.0005, applying a gradient penalty, for 150 epochs at 90 seconds per epoch, resulting in a critic (discriminator) loss of -0.05456067994236946 and a generator loss of 0.5148118138313293 (a sketch of the optimizer setup and gradient penalty is given below); the resulting images are shown in Figure 10.
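The following is a minimal sketch of the optimizer configuration and the gradient penalty term described above; the penalty weight and the remaining details of the training loop are assumptions, since they are not reported in the paper.

import tensorflow as tf

wgan_generator_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.5)
wgan_critic_optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.0005)

def gradient_penalty(critic, real_images, fake_images):
    # Evaluate the critic on random interpolations between real and generated
    # images and penalize deviations of its gradient norm from 1 (WGAN-GP).
    batch_size = tf.shape(real_images)[0]
    alpha = tf.random.uniform([batch_size, 1, 1, 1], 0.0, 1.0)
    interpolated = alpha * real_images + (1.0 - alpha) * fake_images
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = critic(interpolated, training=True)
    grads = tape.gradient(scores, interpolated)
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return tf.reduce_mean((grad_norm - 1.0) ** 2)

# The penalty is typically added to the critic loss with a weighting factor
# (commonly 10, assumed here): total_critic_loss = critic_loss + 10.0 * gradient_penalty(...)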
Figure 10
WGAN Image Result
Conclusion
After testing the DCGAN and WGAN models using nearly the same architectures, it can be concluded that the image quality produced by WGAN is better, but with a longer model training process of 90 seconds per epoch because WGAN uses a gradient penalty, while the DCGAN training process is faster, at 60 seconds per epoch, but produces poorer image quality.
References
Brownlee, Jason. (2019). How to Develop a Wasserstein Generative Adversarial Network (WGAN) From Scratch. Retrieved April 20, 2022, from Machine Learning Mastery website: https://machinelearningmastery.com/how-to-code-a-wasserstein-generative-adversarial-network-wgan-from-scratch/
Dutta, Himanshu. (2020). DCGAN Under 100 Lines of Code. Retrieved April 20, 2020, from Medium website: https://medium.com/swlh/dcgan-under-100-lines-of-code-fc7fe22c391
Hui, Jonathan. (2018). GAN — Wasserstein GAN & WGAN-GP. Retrieved April 20, 2020, from Medium website: https://jonathan-hui.medium.com/gan-wasserstein-gan-wgan-gp-6a1a2aa1b490
Kaveri, V. Vijeya, Meenakshi, V., Deepan, T., Dharnish, C. M., & Haarish, S. L. (2021). Image Generation for Real Time Application Using DCGAN (Deep Convolutional Generative Adversarial Neural Network). Turkish Journal of Computer and Mathematics Education, 12(11), 617–621.
Lee, Hansoo, Kim, Jonggeun, Kim, Eun Kyeong, & Kim, Sungshin. (2020). Wasserstein generative adversarial networks based data augmentation for radar data analysis. Applied Sciences (Switzerland), 10(4). https://doi.org/10.3390/app10041449
Praramadhan, Anugrah Akbar, & Saputra, Guntur Eka. (2021). Cycle Generative Adversarial Networks Algorithm With Style Transfer For Image Generation. 1–12. Retrieved from http://arxiv.org/abs/2101.03921
Santosa, Pratama Yoga, Rachmawati, Ema, Agung, Tjokorda, & Wirayuda, Budi. (2021). Translasi Citra Malam Menjadi Siang Menggunakan Deep Convolutional Generative Adverserial Network. EProceedings of Engineering, 8(1).
Suran, Abhisek. (2020). Deep Convolutional Vs Wasserstein Generative Adversarial Network. Retrieved April 20, 2022, from Towards Data Science website: https://towardsdatascience.com/deep-convolutional-vs-wasserstein-generative-adversarial-network-183fbcfdce1f
Weng, Lilian. (2019). From GAN to WGAN. Retrieved from http://arxiv.org/abs/1904.08994
Widjojo, Daniel, Palit, Henry Novianus, & Tjondrowiguno, Alvin Nathaniel. (2020). Menghasilkan Background Game Music dengan Menggunakan Deep Convolutional Generative Adversarial Network. XIII.
Copyright holder: Syaiful Haq Al Furuqi, Handri Santoso (2022)
First publication right: Syntax Literate: Jurnal Ilmiah Indonesia