How to Understand and Code a Winning Student AWS DeepRacer Reward Function
Are you ready to dive straight into creating a winning AWS DeepRacer reward function? This instructable is all about providing you with the knowledge and strategies to code a reward function that stands out in the AWS DeepRacer League.
Through this guide, I will walk you through the steps and essential tips for developing a reward function. I've distilled the information down to what really matters, based on firsthand experience and extensive research.
Get ready to learn, understand, and develop a reward function that will take you to the top of the leaderboard. Let's get started!
Supplies
There aren't many supplies required, but here is what you should have and what will be useful to know:
Materials & Resources
- Laptop and WiFi
- An AWS DeepRacer Account (We will go over the Sign-Up Page)
Skills & Knowledge
- Python Skills (The very basics will suffice)
- Knowledge of Machine Learning / Artificial Intelligence (Not specifics but about what they are)
Make an Account
Head to this link and press sign up at the bottom - enter your email and create a password. Once you have signed in, you will have to add some information about yourself.
Fill in the following information:
- First and Last name
- The school you go to
- Your current or prospective major
- Your planned year of graduation
- Your country of residency
I would also recommend heading to your profile and changing your racing name (and avatar if you want).
Once you have signed up, head to this link.
Understanding the Console
Once you have made an account and filled in the required information, you should be seeing a screen similar to the one shown above.
- The first picture is the home page, where you can see key information such as the number of training hours you have remaining. Training hours are the number of hours you have to train your models. When you train a model, the time you use is deducted from the 10 training hours you receive, which reset at the beginning of every month. This will make more sense later when we get to training the models.
- The next picture shows the learning section, where you can learn about Machine Learning and Reinforcement Learning, the two key concepts behind AWS DeepRacer. Both sections have a good amount of resources, and I recommend doing these lessons if you aren't sure what these concepts are or want to reinforce your understanding.
- The third picture shows all of the models you have. This is where you will spend most of your time and what we will be covering in the next section.
Understanding Model Creation
When creating a model, there are two options. The first is to create a model from scratch. In DeepRacer, a car/model learns a track by driving on it: during training, the model runs lap after lap and updates the path it takes based on the rewards it receives. Those rewards are determined by the code you write. Training a model takes time, and each iteration is a step towards finding the optimal behaviour. Once training has finished, you can use the second option, cloning an existing model, to build on what has been done. Cloning differs from creation in that you change the model's code while retaining the data from previous runs.
There are five steps to creating and training a model.
- Naming your model.
- This step is up to you; however, I have defaulted to the naming convention Model-Attempt-1-0.
- I chose this convention because it is simple and easy to understand: the first number is the model number and the second is the clone number. If I create a fresh model, I increase the first number; if I simply clone Model-Attempt-1-0, I rename the clone Model-Attempt-1-1.
- Choosing the track.
- This is quite straightforward as well: the track you want your model to be most accustomed to should be the track you train it on.
- I suggest you start with an easy track, as it will let you learn the mechanics of how the reward function works, and then work your way up to harder tracks, which may require a more complex reward function.
- One minor detail at the end of track selection is the track direction (clockwise or counterclockwise). Once more, which direction to train in is up to you.
- Choosing the Algorithm Type.
- There are two algorithm types: PPO (Proximal Policy Optimisation) and SAC (Soft Actor Critic). They differ in how they explore the environment while maximising total reward: PPO explores less than SAC. PPO requires more data but produces more consistent results, whereas SAC uses less data but produces less consistent results. I recommend PPO, as more consistent results will help your model get faster in the long run.
We will talk about the next two steps in the next sections.
Coding the Reward Function
The reward function is at the core of reinforcement learning. We use it to incentivise your model to take specific actions as it explores the track. Just as you would encourage and discourage certain behaviours in a pet, you can use this tool to encourage your car to finish a lap as fast as possible and discourage it from driving off the track or zig-zagging.
When training your first model, you may want to use a default sample reward function. When you're ready to experiment and optimise your model, you can customise the reward function by editing the code in the code editor. As you can see from the photo above, three sample reward functions are provided: one follows the centreline of the track, one keeps the car within the borders, and one prevents zig-zagging. I suggest you carefully examine the code and read the comments to get a better understanding of how the reward function works. There is also a button that says Walk me through this code, which you can press to further your knowledge.
This page linked here provides a list of all the different parameters you can use and explains how and why they work. Once you feel you have a grasp of what is going on, edit the sample reward function code by selecting a sample reward function and choosing Edit sample code. Edit the code and select Validate to check that it works. If your code can't be validated, try to fix the error; if you would like to return the code to its original state, press reset.
As a starting point, I recommend trying to combine the three sample functions into one reward function that achieves all three goals.
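To make this concrete, here is a minimal sketch of what such a combined function might look like. The parameter names (all_wheels_on_track, track_width, distance_from_center, steering_angle) come from the official parameter list linked above; the distance bands, the 15-degree steering threshold, and the penalty multiplier are assumptions you should tune for your own track:

```python
def reward_function(params):
    '''
    A sketch combining the three sample goals: follow the centreline,
    stay within the track borders, and avoid zig-zagging.
    '''
    # Input parameters (all from the official parameter list)
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    all_wheels_on_track = params['all_wheels_on_track']
    steering_angle = abs(params['steering_angle'])  # degrees

    # Strong penalty if any wheel leaves the track
    if not all_wheels_on_track:
        return 1e-3

    # Reward staying close to the centreline using three distance bands
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely about to go off track

    # Discourage zig-zagging: scale the reward down for sharp steering
    STEERING_THRESHOLD = 15.0  # degrees; an assumption to tune
    if steering_angle > STEERING_THRESHOLD:
        reward *= 0.8

    return float(reward)
```

If you paste something like this into the code editor, remember to press Validate before training.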
Choosing a Duration and Training
The duration of your model's training impacts its performance. When experimenting in the early phase, try different durations to see what works best. Remember that an under-trained model won't know the track well, while an overtrained one mainly wastes the limited time you have for your other models (considering you only get 10 hours a month). At this step, your trained model is also submitted to a leaderboard. You can opt out by deselecting the checkbox that says I accept my model will be submitted to the leaderboard.
On the same page, there is a text box that says Model Description. I highly recommend writing a description of the code and the duration you are training the model for. It may seem tedious, but it becomes very useful when creating new models and working out what code and duration work best for you.
When you're ready, press Train your model and in the Initialising model training pop-up, choose Okay. On the Training configuration page, you can review your model's training status and configuration. You can also view a video of your model training on the selected track when the training Status is In progress. Watching the video can help you develop valuable insights that you can use to improve your model.
Racing Your Model
Once your model has completed its training, you need to enter it into a race/competition, which simulates your car on the track and returns your final lap time(s). The procedure is as follows:
1. Open the Compete page from the menu, choosing either Student League or Community Races depending on where you want to enter your model.
2. Click on the race you wish to participate in, then click View Leaderboard or Enter Race.
3. Click Enter Race in the top right corner.
4. Select the model you wish to enter and press Enter Race on the bottom right. (This process might take some time depending on the length of the track and the speed of your model.)
Soon, you'll be able to see where your model finished relative to others, and the video of it racing on the track.
Next Steps
Here are the next steps that you should continue to repeat:
Analyse
After you have trained your model and submitted it to a leaderboard, you can view its performance not just by checking the time, but by watching the video provided to you. In this video, you are able to see your agent driving around the track and spot where it's slow, where it's making unnecessary movements, and where it's going off the track. By analysing these points of low performance, you can determine what you need to fix in your code to negate these issues and make your model as fast as possible.
The time alone gives you no data on where you can improve; only by watching the video can you determine where the agent isn't taking the racing line, isn't going at the maximum speed, is taking erratic turns, and so on. For example, by looking at the video, I found that the maximum speed the car could reach was around 1 m/s, and I noticed that even on the straights it wasn't maxing out its speed. So, I incentivised the model to go faster by increasing the reward when it reaches the speed threshold I want it to be in.
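For illustration, here is a minimal sketch of that kind of speed incentive. The speed parameter is from the official parameter list; the 0.8 m/s threshold is a made-up value based on the roughly 1 m/s cap I observed, and you should pick yours from your own video analysis:

```python
def reward_function(params):
    '''
    A sketch of a speed incentive. SPEED_THRESHOLD is a made-up
    value based on the observation that the car tops out near 1 m/s.
    '''
    speed = params['speed']  # current speed in m/s
    all_wheels_on_track = params['all_wheels_on_track']

    # Never reward speed while off the track
    if not all_wheels_on_track:
        return 1e-3

    SPEED_THRESHOLD = 0.8  # m/s; tune this from your own video analysis
    if speed >= SPEED_THRESHOLD:
        reward = 1.0  # big reward for keeping the pace up
    else:
        reward = 0.5  # smaller reward, so slow corners aren't punished too hard

    return float(reward)
```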
Improve and Execute
After you have trained and analysed your initial model, you can clone it to improve it. Cloning saves you steps and makes training more efficient: the previously trained model becomes the starting point for the new one, retaining the data from prior runs. Now set goals for what you want your model to achieve so you know what to change in the code; for example, add new parameters or change the reward values (a sketch of one such tweak follows the list below). Be careful when adding parameters, and prioritise the ones targeting where the model is struggling. Increase or decrease rewards in small increments so you can tell which change produced an improvement.
This improve stage is composed of a few main steps:
- Clone your previous/best model
- Change, add, and remove parameters
- Change the duration based on how many changes were made
- Train your model
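As promised, here is a hypothetical sketch of what one such incremental tweak might look like when cloning. Suppose the video shows Model-Attempt-1-0 zig-zagging on the straights; in the clone, Model-Attempt-1-1, I would change only the steering penalty and leave everything else alone. All the specific numbers here are assumptions to tune:

```python
def reward_function(params):
    '''
    Hypothetical clone tweak: same structure as before, with only
    the steering penalty tightened. All numbers are assumptions.
    '''
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    steering_angle = abs(params['steering_angle'])

    # Unchanged centreline reward carried over from the previous model
    reward = 1.0 if distance_from_center <= 0.25 * track_width else 0.1

    # The one deliberate change for this clone: a stricter steering
    # penalty (previously a 15-degree threshold and a 0.8 multiplier).
    STEERING_THRESHOLD = 12.0
    if steering_angle > STEERING_THRESHOLD:
        reward *= 0.6

    return float(reward)
```

Changing one thing at a time like this also makes the Model Description you wrote earlier far more useful, since each clone's lap time can be traced back to a single edit.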
Now, it's time to enter your new model back into the race you are competing in, and see what happens to the time. Once you get the new time, whether it's faster or slower, repeat these two steps over and over, and eventually, you'll get the results you're working towards.
Final Words
AWS DeepRacer is a long and hard process, and you may easily get frustrated. What's most important to remember is to not get discouraged: sometimes it just takes a small tweak or a change in the training hours, but achieving anything takes hard work. The model may not always become faster; often it might even be slower, but you just have to keep trying. By working through the steps provided above, you give yourself the best chance of creating a solid model capable of winning races. You may not start winning right away, but experience is one of the most important things to have when it comes to machine learning. Eventually, you will understand the components and be able to change and fix problems in the blink of an eye.
I hope you found this instructable useful and I wish you the best of luck on your AWS DeepRacer journey. Thanks for reading!