Mathematics | Computer Science

## Brian Funk, 2001 | Fislisbach, AG · Silvan Metzker, 2000 | Jonen, AG

The term artificial intelligence (AI) is widely used. Many people know it from science fiction movies or from the context of data analysis; Google, for example, uses AI to provide its customers with optimal advertisements. It works, but hardly anyone can say where AI techniques succeed and where they fail. By programming and testing such an intelligence ourselves, we try to understand how exactly it can work. How well can an AI perform the tasks it is given? Where will we face limitations?

#### Introduction

In our project, we used neural networks to create an AI agent that can play games by itself. This was done with two goals: (I) to understand where machine learning works and where it fails, and (II) to assess how much an AI could achieve within the scope of our project.

#### Methods

We used a technique common in machine learning called reinforcement learning. In this approach, the agent receives a negative or positive reward depending on how beneficial the action taken was. When this strategy is combined with a neural network, the agent can generalize from the feedback given and develop a strategy. The program was written in Python, and our neural network was created using Google’s library TensorFlow for high performance. Lastly, the games were visualized using pygame, a library that allows the creation of graphical user interfaces and games in Python. This allows a user to watch the actions taken by the agent. Additionally, a colour gradient is applied so that certain tendencies in the agent’s decisions can be visualized. To compare the AI to humans, we gathered statistical data by letting humans play the same games as the AI did.
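The reward-driven learning loop described above can be sketched with a much smaller stand-in: a tabular Q-learning agent on a toy corridor "game". The environment, rewards, and hyperparameters below are illustrative assumptions of ours, not the project's actual setup, which used a TensorFlow neural network instead of a table.

```python
import random

N_STATES = 5          # positions 0..4; reaching position 4 wins
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    if nxt == N_STATES - 1:
        return nxt, 1.0, True      # positive reward for the beneficial move
    return nxt, -0.01, False       # small negative reward per wasted move

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action index]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit current estimates, sometimes explore
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[s][i])
            s2, r, done = step(s, ACTIONS[a])
            # Q-learning update: nudge the estimate toward
            # reward + discounted best future value
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy steers right toward the reward in every non-terminal state; replacing the table with a network that maps game states to Q-values gives the deep variant the project used.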

#### Results

Game-TARS can play three different games: Snake, tic-tac-toe, and Space Invaders. In tic-tac-toe it achieves an average win rate of 92% against a random opponent, up from an initial 59%, a relative improvement of approximately 56%. The results look even more extreme for Space Invaders: the agent started at an average score of 8.4×10^3 and ended with 7×10^4, an increase of roughly 730%. For Snake, the results are harder to categorize. The version that plays on the smaller 10×10 grid achieves an average score of 5 apples. This learning behaviour is not impressive, especially considering that the AI needed 1.25×10^6 games to train. Far more efficient is the Snake agent that uses condensed information as input: within 5×10^3 games, it reaches an average score of 23.4 on a larger grid. As previously stated, a comparison between the AI and humans was used to gain additional insights. The human test group achieved an average of 12 apples per game. The agent that dealt with condensed information yielded an average of 20 apples per game, while the agent that dealt with the uncondensed state reached an average of only 5 apples. In tic-tac-toe, the agent did not lose in 80% of all games played against humans, and 55% of them were ties. These results suggest that our AI agent may play better than an average human.
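The relative improvements quoted above follow directly from the raw scores. As a quick sanity check (the scores are the project's figures; the helper function is our own):

```python
def relative_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100

# tic-tac-toe win rate: 59% -> 92%
print(round(relative_change(59, 92)))      # ~56% relative improvement

# Space Invaders average score: 8.4e3 -> 7e4
print(round(relative_change(8.4e3, 7e4)))  # ~733% increase
```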

#### Discussion

Although the learning seems impressive, the goal of a self-learning AI was not fully achieved. The goal was to create an agent that can learn to play different games by itself (I). We were able to create an AI that works without altering the inputs; however, its performance could have been much better. Although we tried to let the AI do the condensing itself with different networks, it never achieved any remarkable results. Certainly, our biggest mistake was concentrating on multiple games; it would have been more efficient to focus on one game and make that AI game-specific and efficient. The second goal, becoming familiar with machine-learning mechanics, was achieved through the many different approaches we tried in order to improve the network (II). The better average of the AI that deals with condensed information does not necessarily mean that it is superior to humans: the agent has the advantage of reacting to the game faster than a human opponent.

#### Conclusions

Game-TARS can learn, and it does so efficiently when information is provided in an adequate form. However, it learns poorly when the mechanics of the different games are not enforced while the information is condensed. One approach to improving this would be to let the AI learn for a longer period, but even that is no guarantee that the network learns as we would expect. The key takeaway from this project is that the learning progress of machines is not linear: unless one can rule out further improvement, one cannot tell whether the AI will ever be able to adjust adequately to the task provided.

#### Appraisal by the Expert

Lucas Pompe

This project is a nice exploration of what artificial intelligence (AI) techniques can achieve in terms of complex games. The project touches on important limits of modern AI techniques, and correctly recognizes the limits of deep networks when it comes to feature engineering. On top of this, the authors went through the trouble of collecting human data to compare against their AI agent. Finally, this project gave rise to a user interface where the user can interact with the AI agent (in one game) or glimpse what the agent is «thinking» about immediate future choices.

#### Rating:

good

Kantonsschule Wohlen
Teacher: Patric Rousselot