Mathematics  |  Computer Science


Mitja Proença, 2005 | Plasselb, FR


With artificial intelligence becoming increasingly influential in society and frequently entering public debate, the concept itself is often misunderstood and mythologised. This paper introduces machine learning, a subfield of artificial intelligence. Furthermore, it explores the connection between machine learning and information theory. Through this connection, it derives the cross entropy cost function and its use in neural networks. The cross entropy cost function is then compared with another cost function used for learning in neural networks. The comparison encompasses empirical testing on an optical character recognition dataset with different hyperparameters as well as a theoretical analysis to explain the results.


How does the cross entropy (CE) cost function compare to the mean squared error (MSE) cost function, and why?


The empirical testing was carried out with a modular implementation of a neural network in Python using no machine learning libraries. The libraries numpy, pandas, time and csv were used for linear algebra operations, data handling, time measurement and file handling respectively. Neural networks with different architectures and hyperparameters were then trained, with the CE cost function and the MSE cost function respectively, on the MNIST dataset and tested on separate testing data to take overfitting into account. The accuracy and the computation time at each iteration of the training process were recorded and averaged over 100 training sessions.
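The two cost functions under comparison can be sketched in plain NumPy as follows. This is an illustrative sketch, not the paper's implementation; the function names and the one-hot example are assumptions.

```python
import numpy as np

def mse(a, y):
    """Mean squared error between network output a and target y."""
    return 0.5 * np.sum((a - y) ** 2)

def cross_entropy(a, y):
    """Cross entropy cost for sigmoid outputs a in (0, 1)."""
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a))

# Example: target digit "3" as a one-hot vector, with a fairly
# confident (but imperfect) prediction from the network.
y = np.zeros(10)
y[3] = 1.0
a = np.full(10, 0.1)
a[3] = 0.9
print(mse(a, y), cross_entropy(a, y))
```

Both functions take the activation vector of the output layer and the one-hot encoded label; during training, their gradients with respect to the network weights drive the parameter updates.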


The CE cost function outperformed the MSE cost function in all tested architectures. For example, to reach an accuracy of 80% on the MNIST dataset, one architecture with MSE needed 1606 iterations, corresponding to 290.7 ms, while the equivalent architecture with CE needed only 169 iterations, corresponding to 31.05 ms, which is more than nine times faster.


The method of comparison took overfitting and a possibly high variance in results into account by testing the neural networks on separate testing data and averaging over 100 samples. A theoretical comparison was made to explain the observed results. However, the comparison could be expanded quantitatively by testing the respective cost functions on more problems, specifically non-classification problems, where the MSE cost function might converge faster than the CE cost function. Similarly, the testing could be expanded to more iterations and a greater diversity of hyperparameters.


The cross entropy cost function is more efficient than the MSE cost function for the classification problems on the MNIST dataset explored in this work. A theoretical analysis reveals that this is because the cross entropy cost function avoids the vanishing gradient issue by virtue of its derivative.
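The vanishing gradient argument can be illustrated numerically. For a sigmoid output neuron with pre-activation z, the MSE gradient with respect to z carries an extra factor σ'(z), which becomes tiny when the neuron saturates; for the CE cost this factor cancels, leaving simply a − y. The following is a sketch of that calculation, not code from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_mse(z, y):
    # dC/dz for C = 0.5 * (a - y)^2 with a = sigmoid(z):
    # the chain rule leaves a factor sigma'(z) = a * (1 - a).
    a = sigmoid(z)
    return (a - y) * a * (1 - a)

def grad_ce(z, y):
    # dC/dz for the cross entropy cost: the sigma'(z) factor
    # cancels against the derivative of the cost, leaving a - y.
    a = sigmoid(z)
    return a - y

# A badly saturated neuron: output close to 1 while the target is 0.
z, y = 8.0, 0.0
print(grad_mse(z, y))  # tiny gradient: MSE learning stalls
print(grad_ce(z, y))   # gradient close to 1: CE learning proceeds
```

Even though the neuron is maximally wrong, the MSE gradient is almost zero, so weight updates barely move it; the CE gradient stays proportional to the error, which matches the faster convergence observed empirically.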



Appraisal by the Expert

Dr. Richard Stotz

This paper investigates the connection between concepts from information theory and machine learning with neural networks. A particular focus of the paper lies in the careful mathematical derivation of neural networks. A cost function based on the concept of entropy is cleanly defined, clearly motivated in terms of information theory, and examined both theoretically and experimentally. A thorough implementation of the presented concepts complements the paper and provides the basis for the experimental evaluations.






Kollegium St. Michael, Fribourg
Teacher: Tobias Fuhrer