Mathematics | Computer Science
Henry Rotte, 2001 | Oberwil, BL
Ever since they were introduced in 2014, GANs have been the subject of a considerable amount of research. A common problem in GANs is mode collapse, which causes the network to produce images with very little diversity. It occurs when the distribution produced by the GAN has little overlap with the modes of the data distribution it is meant to approximate. While most attempts to overcome mode collapse have focused on changing the loss function, this paper proposes an alternative method based on a dynamic dataset. The dynamic dataset changes over the course of training to ensure continual overlap between the GAN distribution and the data distribution. The performance of a GAN was tested on a dynamic dataset as well as on a static one. While the static dataset led to failure, the dynamic dataset made it possible for the GAN to avoid mode collapse. Even though the efficacy of this type of dataset modulation is yet to be tested on real-world datasets, this paper could serve as a starting point for further research, and it provides an alternative to the standard methods of overcoming mode collapse.
A Generative Adversarial Network (GAN) is a system of two neural networks, the generator and the discriminator, which is used for data generation. Although they are a fairly new technology, GANs have already found a wide variety of applications. Despite their success, GANs still suffer from many problems, most importantly mode collapse, which results in a generator distribution with very little variability. There have been many attempts to overcome mode collapse, with varying success. Most attempts are based on loss modification, as is the case with the Wasserstein GAN (WGAN). In my experiment, I chose a different approach to mode collapse and instead focused on modifying the dataset. Mode collapse is widely believed to be caused by the loss function's inability to learn modes that do not overlap in the metric space. I therefore tested the effect of overlap on mode collapse and tried to find a way of overcoming the collapse.
In order to learn neural network programming, I implemented a GAN and a WGAN and trained them on the MNIST dataset. The GAN experienced mode collapse while the WGAN did not. I then started working on the dataset used in the experiments. It had to be simple and easily adjustable, so I chose to have the samples represent 2D images of multivariate Gaussian distributions, whose midpoints were clustered around certain modes. Without overlap, the GAN collapsed on datasets with three or more modes. To overcome mode collapse, I changed the dataset from a static one to one that evolved over the course of training. Specifically, the dataset started with only one mode, from which the other modes slowly evolved and separated. Training proceeded in five steps, after each of which every mode shifted by 2 pixels.
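The dataset schedule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the image size, the four diagonal drift directions, the isotropic blob shape, the blob width and the midpoint jitter are all hypothetical choices, and the names `mode_centers`, `gaussian_image` and `sample_batch` are invented for this sketch. Only the five steps and the 2-pixel shift per step come from the text.

```python
import math
import random

IMG_SIZE = 28        # assumed image side length in pixels
N_STEPS = 5          # number of shift steps (from the text)
SHIFT_PER_STEP = 2   # pixels each mode moves per step (from the text)
# Assumed drift directions: four modes moving toward the four corners.
DIRECTIONS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def mode_centers(step):
    """Centre (row, col) of each mode after `step` shifts; at step 0 all modes coincide."""
    c = (IMG_SIZE - 1) / 2
    d = SHIFT_PER_STEP * step
    return [(c + dy * d, c + dx * d) for dy, dx in DIRECTIONS]


def gaussian_image(center, sigma=2.0):
    """Render an isotropic 2D Gaussian blob (an assumption) as a list-of-lists image."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
         for x in range(IMG_SIZE)]
        for y in range(IMG_SIZE)
    ]


def sample_batch(step, batch_size, rng, jitter=0.5):
    """Draw images whose blob midpoints are clustered around the current mode centres."""
    centers = mode_centers(step)
    batch = []
    for _ in range(batch_size):
        cy, cx = rng.choice(centers)
        midpoint = (cy + rng.gauss(0, jitter), cx + rng.gauss(0, jitter))
        batch.append(gaussian_image(midpoint))
    return batch
```

A training loop would call `sample_batch(step, ...)` with `step` increasing from 0 to `N_STEPS`, so that consecutive versions of the dataset always overlap; a static control would use `step = N_STEPS` throughout.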
Trained on the dynamic dataset, the GAN was able to produce all four modes and approximate the target distribution, while the control experiment resulted in mode collapse as expected. During the course of training, the GAN was able to approximate the first mode and then follow the other modes as they moved through the metric space.
The experiments demonstrated that the GAN has problems when approximating multiple modes simultaneously. Mode collapse occurs in the standard GAN if the data modes do not overlap with the generator distribution. The dynamic dataset solved this problem of missing overlap. Because one mode was trained first and the form of the training distribution was then changed slowly, the GAN always had sufficient overlap between the distribution it generated and the distribution it was trying to approximate. In the discussion of my thesis, there is also a visualization of how the missing overlap leads to saturated gradients and causes the GAN to collapse to one mode. The main problem of the dynamic dataset is that it would be very difficult to apply the same procedure to real-world datasets, which would make it less useful than loss modification, for example.
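The link between missing overlap and saturated gradients can be illustrated with a small calculation. The sketch below is not the visualization from the thesis; it is the textbook saturation argument under simplifying assumptions: one data mode and one generator mode, both 1D Gaussians of equal width `sigma`, the original generator loss log(1 - D(x)), and an optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)). The function name `generator_grad` is hypothetical.

```python
import math


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def generator_grad(distance, sigma=1.0):
    """Magnitude of d/dx log(1 - D*(x)) at a generated sample x = g.

    Assumes data ~ N(m, sigma^2), generator ~ N(g, sigma^2) and
    distance = |m - g|. With equal widths the discriminator logit
    log p_data(x) - log p_g(x) is linear in x.
    """
    logit = -distance ** 2 / (2 * sigma ** 2)  # logit evaluated at x = g
    dlogit_dx = distance / sigma ** 2          # absolute slope of the logit
    # d/dx log(1 - sigmoid(a(x))) = -sigmoid(a) * a'(x); return its magnitude.
    return sigmoid(logit) * dlogit_dx


for d in (0.5, 1, 2, 4, 8):
    print(f"distance {d:>3}: |gradient| = {generator_grad(d):.3e}")
```

Under these assumptions the gradient magnitude falls off roughly like exp(-distance² / (2 sigma²)) once the two modes are well separated, so a generator sitting on one mode receives almost no signal pulling it toward a distant one.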
Although dataset modification may not be an easy method to apply to real-world AI, it still provides an interesting alternative to the standard ways of overcoming mode collapse. If I were to do further research on dataset modification, I would compare the performance of dynamic-dataset GANs and WGANs and try to find ways of constructing dynamic datasets from real-world data.
Expert's Appraisal
Dr. Sepp Kollmorgen
Generative adversarial networks (GANs) are algorithms from artificial intelligence (AI) research that learn probability distributions over natural data. The work compares a classical GAN and a more modern Wasserstein GAN, both applied to the modeling of handwritten digits. Results from the literature are reproduced, and differences between the two approaches are worked out using a new, artificially generated dataset. The work shows elegantly that the classical approach can work better when the problem to be learned is suitably decomposed into a sequence of subproblems.
Special Prize Paul Scherrer Institut – Research at the Jungfraujoch
Teacher: Hans Adrian Schmassmann