NeuroEvolution on 2d Racing Lines

The development of autonomous vehicles is an exciting technology that has really taken off in recent years. As a F1 fan and software developer I was especially inspired by this technology and wanted to see if I could introduce some principles of autosports into a basic self-driving model. The goal of this project was to create an autonomous driving model capable of finding the optimal racing line using neural networks optimized through genetic algorithms.

The basic idea is to create a simple 2D car racing game in an environment where that can measure things like laptimes, car physics, car control, collisions, and other information that can be used to build a model to drive the car. Some of this data is then fed into a neural net in realtime and the output of the net is used to control the car. When the car collides with the wall the lap is considered over and that network is given a score based on a fitness function.

The fitness function attempts to best capture what defines a successful lap. Once the network is scored a new one is generated and the car is reset. This process is repeated where a new neural net (genome) is generated each time until the declared population size is reached. Once the last network of the population has run then we're left with a bunch of scored networks which is considered a generation. The best networks from the generation are kept and used to seed the new networks for the next generation.

Building the Environment

I wanted to keep the environment that the car exists in really simple so I could focus on a control model. I decided to go with phaser a JavaScript-based game engine and build everything in the browser.

The Track

Phaser was really simple to get started with and I was able to quickly build a simple track object using physicsEditor and an image I found online. The track boundaries for car collisions are defined as a single complex polygon created with physicsEditor. I also needed to define non-collidable lines that would act as distance markers so that I would have a measure of how far the car had progressed.

The Car

Before getting into the details of the car implementation I want clarify the objective. We want to see the car learn to navigate the track and over time get quicker by following the racing line. The racing line being defined as the path that a driver follows through a corner in order maximize speed and minimize lap times. This track is simple so the racing line will often be the largest radius through the corner.
It becomes a little more complicated than this because the path through multiple corners needs to be considered but for the most part this is a good way to conceptualize the idea.

The car is setup with simple dynamics at the moment to decrease the complexity of training the model. The key aspect about the car dynamics is that the car will travel in a straight line at a constant velocity but can only maintain a fraction of that velocity through corners. A better physics model would look at the centripetal forces through the corner and determine slip-angles but the main idea of being able to go faster in a straight line is still captured in the current naive model.

Alright, so now we have an idea of how our car should travel through the track we need provide the car with information about its surroundings. For this we use the idea of proximity sensors implemented by casting a ray and looking for the nearest line intersection with the track boundaries. By positioning three proximity sensors at the front of the car to measure distances from the sensor to a wall the car is provided with information regarding upcoming corners. This data is what is passed into the neural net to compute a steering decision.

With this in place all that's needed is to log lap times and the distance travelled and the car can be tested out using the keyboard.

Neural Networks and Genetic Algorithms

So now's where we get into the cool stuff. So far I explained the basics of how the neural nets are used with genetic algorithms but I wanted to get a little more detailed regarding the specifics of the implementation.

Neural Net

The neural nets are built using the Synaptic JavaScript library which is an awesome library for building simple ANN-based applications in the browser. My implementation uses a fixed topology feedforward perceptron network that has four layers: an input layer, two hidden layers, and the output layer. The input layer consisting of three nodes, one node for each sensor input, both hidden layers contain four nodes each, and the output layer has a single node that outputs a real number in the range [0, 1]. The output value is split into thirds where a value between 0.667-1 corresponds to a right turn, a value between between 0-0.332 a left turn, and a value between 0.332-0.667 will tell the car to continue driving forwards.

The genetic algorithm implemented is rather simple and will adjust the weights between neurons and the bias values but not the actual topology of the network. More complex examples of NeuroEvolution such as the NEAT algorithm will actually change the overall network topology during evolution. This project is built to allow different "drivers" to control the car so at some point I might take a crack at implementing a driver using the NEAT algorithm.

Genetic Algorithms

This genetic algorithms used were inspired by this really cool project posted on Hackernews that uses NeuroEvolution to play the Google Chrome dinosaur game.

I've already somewhat covered the basics of how the genetic algorithm is used in combination with neural nets but I glossed over an important aspect of scoring the networks, the fitness function.

Fitness Function

The fitness function needs to represent what we consider to be a "good" lap. This means that we want to take networks that minimize the lap time and avoid networks that go out of bounds. This is done by assigning a score to the network based on the milliseconds it took to either crash or complete a lap and then add a 2 second penalty for every distance marker short of the finish line:

var fitness = laptime + (nMarkers - distanceTravelled)*1000;

This way we can sort all networks on their assigned score and be sure that the networks with the smallest scores are the best.

Note that this only works to capture the driving line because the car dynamics were setup to move slower while turning. If the time had been taken to develop proper car physics the driving line would also emerge. However if the environment had been setup where the car travels the same speed regardless of whether it's turning the emergent behaviour would likely be to hug the inside of the track and cover the least distance.

Mutation and Crossover

This is the meat and potatoes of the genetic algorithm. Once all networks have been assigned a score through the fitness function it's time to build the next generation of networks. The idea is to move the best networks forward and use these networks to seed the next generation. Ideally we want to draw from networks that have done well while also introducing random mutations to avoid converging on a local minimum of the error space.
Without introducing mutations the optimization could get too focused on something that produces good results but does not actually capture the problem we're trying to solve. Additionally all initial networks are generated with random values so without random mutations there would be would be no way to improve the networks.

This is how the idea of evolution is used to optimize neural networks. Over time a network will see a random mutation that causes it to thrive over its peers, this leads to the network advancing and the good mutation propagating forward throughout generations.

Mutation is implemented by introducing a random change in a network's bias values and weights in order to slightly modify the network to create a new one. This is combined with crossover, a method to take two networks and generate a third by combining values from each network. The specific crossover implemented is called single-point crossover, where a single slice point from both parent networks is chosen and all data past this point is swapped creating two new children networks (we're only carrying forward a single child network).

By combining these two techniques it is possible to completely build a new generation of networks that's based on the best networks from the previous generation, all while introducing mutations that will allow for further improvements to be found.

Putting this all together yields a model that is able to start with no training and with no prior data learn how to complete a task. The model is trained simply by letting it run and attempt to navigate the track for awhile. It's a lot of fun to watch because at the start the car basically just drives into a wall but eventually you can see it start to make good decisions and over time learn to navigate the track. Awhile after that and it begins to learn how to get quicker and in the end the model is without a doubt better than the human player. Here are examples shown at different generations of progression. Note that the driving model is being loaded with data from the specified generation so the generation displayed on the screen is wrong

Generation 5 - unable to complete a lap (or really do anything)

Generation 20 - learned the first corner but still unable to complete a lap

Generation 35 - first generation to complete a full lap and with a time of 8.151s

Finally Generation 100 - able to complete a full lap with respect to the racing line and with a best time of 6.317s

To access the full code for this project head on over to my profile on Github