Mathematics  |  Computer Science


Levin Ceglie, 2003 | St.Gallen, SG


We explore the fundamentals of Reinforcement Learning (RL) with the final goal of building learning agents that can reach superhuman performance at the game of Snake. We start by applying the tabular Q-Learning algorithm to a simple Gridworld environment to explore the effects and importance of parameter optimization. Then, with the insights gained from this first experiment, we use tabular Q-Learning to solve the Cartpole environment provided by OpenAI Gym. Finally, we set out to create an agent that outperforms humans at the game of Snake. To cope with the high dimensionality, we use neural networks to approximate the game's Q-function, following the Deep Q-Learning algorithm. Moreover, towards our goal: 1) we design and study the effectiveness of eight possible state representations, demonstrating their importance, and 2) we use a form of curriculum learning in which agents first learn to play smaller and easier instances of the game. The resulting agents trained according to our approach significantly outperform a human test group.
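For reference, the two learning rules named above can be written in their standard textbook form. The symbols are assumptions rather than notation taken from the project: $\alpha$ is the learning rate, $\gamma$ the discount factor, $\theta$ the network weights, and $\theta^{-}$ the weights of a separate target network, which is common in Deep Q-Learning but not confirmed by this summary.

```latex
% Tabular Q-Learning update after observing (s_t, a_t, r_{t+1}, s_{t+1})
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

% Deep Q-Learning: squared temporal-difference error minimized over \theta
L(\theta) = \mathbb{E}_{(s, a, r, s')} \left[
  \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2}
\right]
```

In the tabular case every state-action pair has its own entry, whereas the network shares parameters across states, which is what makes large state spaces such as Snake tractable.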


I sought to test the following two hypotheses. (I) Tabular Q-Learning, one of the most basic solution methods of Reinforcement Learning, is adequate to solve the Cartpole environment provided by OpenAI Gym. (II) Using Deep Reinforcement Learning, an agent can learn to play the game of Snake at a human level of performance.
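Hypothesis (I) depends on turning Cartpole's four continuous observation values into a finite set of table indices. The following is a minimal sketch of one way to do this, not the project's actual code: the helper names `discretize` and `q_update`, the bounds, and the bin counts are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def discretize(obs, bounds, bins):
    """Map a continuous observation to a tuple of bin indices."""
    state = []
    for value, (low, high), n in zip(obs, bounds, bins):
        # Clip to the assumed range so that outliers land in the edge bins.
        clipped = min(max(value, low), high)
        index = int((clipped - low) / (high - low) * n)
        state.append(min(index, n - 1))  # value == high would otherwise give n
    return tuple(state)


def q_update(q, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99, n_actions=2):
    """Apply one tabular Q-Learning update in place and return the new value."""
    # A terminal transition has no future return to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical bounds for (cart position, cart velocity, pole angle,
# pole angular velocity); the values used in the project are not stated.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]
BINS = (6, 6, 12, 12)

q_table = defaultdict(float)  # unseen state-action pairs start at 0.0
state = discretize((0.0, 0.1, 0.02, -0.3), BOUNDS, BINS)
```

The number of bins trades resolution against table size: finer bins distinguish more situations but need more experience to fill in.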


I used the well-known programming language Python in combination with PyTorch, a commonly used machine learning framework, to implement the discussed algorithms. I also used Python to create the Gridworld and Snake environments.
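To give an idea of what a hand-written environment of this kind can look like, here is a minimal Gridworld sketch in plain Python. It is an illustration under assumptions, not the project's implementation: the class name `GridWorld`, the `reset`/`step` interface, the start and goal cells, and the reward values are all hypothetical.

```python
class GridWorld:
    """Hypothetical square grid: start in the top-left, goal in the bottom-right."""

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.pos = (0, 0)

    def reset(self):
        """Put the agent back on the start cell and return that state."""
        self.pos = (0, 0)
        return self.pos

    def step(self, action):
        """Move one cell; moving into a wall leaves the agent in place."""
        d_row, d_col = self.ACTIONS[action]
        row = min(max(self.pos[0] + d_row, 0), self.size - 1)
        col = min(max(self.pos[1] + d_col, 0), self.size - 1)
        self.pos = (row, col)
        done = self.pos == self.goal
        # Assumed reward scheme: small step cost, +1 on reaching the goal.
        reward = 1.0 if done else -0.01
        return self.pos, reward, done


def rollout(env, policy, max_steps=50):
    """Run one episode with `policy` (state -> action index); return (return, final state)."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return total, state
```

Keeping the environment behind a small `reset`/`step` interface means the same learning code can be pointed at a larger grid, or at a different game, without changes.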


I was able to confirm both of my hypotheses. First, I solved the Cartpole environment provided by OpenAI Gym using a tabular version of the Q-Learning algorithm. Second, I created an agent that outperformed a human test group by a significant margin. In doing so, I also investigated the importance of choosing an «appropriate» state representation. Furthermore, I applied curriculum learning and showed how effective it can be.
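The eight state representations studied in the project are not described in this summary. Purely to illustrate what a state representation for Snake can be, the sketch below encodes a game position as seven binary features: danger in the three directions relative to the heading, and the position of the food relative to the head. The function names, the feature choice, and the coordinate convention are assumptions, not one of the project's representations.

```python
# Headings as (row, col) offsets in clockwise order: up, right, down, left.
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def is_blocked(cell, body, size):
    """True if `cell` is outside the grid or occupied by the snake."""
    row, col = cell
    outside = not (0 <= row < size and 0 <= col < size)
    # Simplification: the tail cell counts as blocked even though it may
    # move away on the next step.
    return outside or cell in body


def encode_state(body, heading, food, size):
    """Encode a position as a tuple of 7 binary features.

    body: list of (row, col) cells, head first
    heading: index into HEADINGS
    food: (row, col) of the food item
    """
    head = body[0]
    features = []
    # Danger straight ahead, after a right turn, and after a left turn.
    for turn in (0, 1, -1):
        d_row, d_col = HEADINGS[(heading + turn) % 4]
        neighbour = (head[0] + d_row, head[1] + d_col)
        features.append(int(is_blocked(neighbour, body, size)))
    # Food is above, below, left of, or right of the head.
    features.append(int(food[0] < head[0]))
    features.append(int(food[0] > head[0]))
    features.append(int(food[1] < head[1]))
    features.append(int(food[1] > head[1]))
    return tuple(features)
```

A compact encoding like this does not grow with the grid size, which fits well with training on small grids first and moving to larger ones later. The price is that it discards most of the board, so the agent cannot see, for example, that it is about to enclose itself.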


Even though we managed to create a well-performing Snake agent, we did not manage to create one that was able to solve Snake, i.e. to fill the whole grid. It would be interesting to see how other Deep Reinforcement Learning algorithms perform in comparison to Deep Q-Learning. Perhaps one of them would be able to solve Snake.


I explored the fundamentals of Reinforcement Learning and implemented Q-Learning-based algorithms that can solve well-known environments and reach superhuman performance at the game of Snake. Thus, I was able to confirm both of the initially posed hypotheses. I find it quite remarkable how much can be achieved by only scratching the surface of what Reinforcement Learning has to offer, and I am looking forward to exploring it further. Finally, I am grateful for the journey this work took me on.



Expert's Appraisal

Dr. Pier Giuseppe Sessa

This project explores the fundamentals of Machine Learning (ML) based on Reinforcement Learning (RL) algorithms, with the goal of building RL agents that can reach superhuman performance at the well-known «Snake» video game. It initially demonstrates the abilities and limitations of existing RL algorithms on known environments, such as a Gridworld and Cartpole. Subsequently, it proposes a custom-tailored RL approach to solve the «Snake» game. In particular, suitable state representations and a curriculum learning strategy are shown to be crucial for the task. The resulting agents demonstrate good game performance, outperforming a human control group by far.


very good




Kantonsschule am Burggraben, St. Gallen
Teacher: Urs Sieber