How to Create a Successful AWS DeepRacer Student League Reward Function

by chand75632


AWS Deepracer.jpeg

Welcome to the world of Student AWS DeepRacer! In this guide I'll explain the art of designing a winning reward function that can propel your DeepRacer ahead of the competition. Your reward function is the key to mastering the tracks, making decisions, and ultimately winning. Whether you're new to DeepRacer or an enthusiast looking for that competitive edge, this Instructable will equip you with the knowledge and techniques to create a reward function that transforms your DeepRacer into a champion.

Supplies

To create a successful AWS DeepRacer Student League reward function, you'll need:


1. Access to, and familiarity with, the AWS DeepRacer console (Included in this guide)

2. An AWS DeepRacer account and race

3. Understanding of key concepts such as machine learning (Included in this guide)

4. Python programming skills

5. Clear objectives for your model (Included in this guide)


Mastering DeepRacer requires a deep understanding of your reward function, and deploying it effectively requires a grasp of reinforcement learning concepts and Python programming.

Familiarise Yourself With the DeepRacer Console

Console.png
Console 2.png

Here are the general steps to access the AWS Student DeepRacer console:

1. Access the AWS Student Console:

  • https://student.deepracer.com/signIn

2. Enter Your Credentials and Sign Up or Sign In to Your Account:

Once you're signed in, you'll see something similar to what's shown in the image.


A great navigation tool on the DeepRacer console is the menu bar. This gives quick access to all of the useful pages. Below is a list of some of the useful pages and a brief description of them.

1. Learn - Overview: This page offers 20 hours of informative videos on the fields of machine learning and reinforcement learning. This can be very useful to get you started.

2. Practice - Your Models: This page allows you to view details and information on all of your created models. It also lets you create a new model.

3. Practice - Model Improvement Resources: This page connects you to the DeepRacer community for support.

4. Compete - Student League: This page allows you to race against other student racers around the world. It is a great place to benchmark your models and see how they perform comparatively, which is key to understanding how your model may perform in a race.

5. Compete - Community: This page gives access to your private races. This could be the race you are participating in or a race you set up to practice.

Learn the Workings Behind DeepRacer

Understanding the inner workings of AWS DeepRacer is crucial for creating a successful reward function because it enables you as the developer to tune the reward function effectively to guide the reinforcement learning process. Without this knowledge, it's going to be challenging to design a reward function that achieves your goals for the model.

DeepRacer's inner workings, including the sensor data, reward function, training algorithm, and training environment, must be well understood to craft a reward function that promotes the desired behaviors and punishes unwanted ones - ultimately leading to a well-trained and effective DeepRacer.


Apart from the tutorials on the AWS Console, here is some information to get you started:

What you need to understand:

  • Reward Functions
  • Reinforcement Learning
  • Sensory Data - Parameters
  • Training Algorithms
  • Training Environments


Reward Functions:

What is a reward function?

The reward function is Python code that describes the immediate feedback, in the form of a reward or penalty, for moving from a given position on the track to a new position.


What is the purpose of a reward function?

The reward function encourages the vehicle to move along the track quickly to reach its destination.
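To make this concrete, here is a minimal sketch of what a reward function looks like. The console calls it with a `params` dictionary on every step and expects a float back; `all_wheels_on_track` is one of the built-in parameters covered later in this step.

```python
def reward_function(params):
    '''
    params: a dictionary of readings the simulator supplies on every step.
    Must return a float; higher values encourage the current behaviour.
    '''
    reward = 1e-3  # tiny default reward, effectively a penalty
    if params['all_wheels_on_track']:
        reward = 1.0  # reward the car for staying on the track
    return float(reward)
```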


Reinforcement Learning:

Reinforcement learning is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward.

Introduction to Reinforcement Learning:

Reinforcement Learning

How is your DeepRacer taught?

In AWS DeepRacer, your agent (car) learns from an environment (track) by interacting with it. During training, the model iterates and updates different parameters based on rewards received.
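As an illustration only, the cycle behind training looks roughly like the toy sketch below. The `Environment` class and its methods are simplified stand-ins I made up, not the actual AWS simulator API; the point is the loop of action, reward, and cumulative reward.

```python
import random

class Environment:
    """Toy stand-in for the track simulator (not the real AWS API)."""
    def reset(self):
        return 0.5  # starting state, e.g. normalised position on the track

    def step(self, action):
        next_state = random.random()          # where the action moved the car
        reward = 1.0 - abs(next_state - 0.5)  # the reward function scores the move
        done = random.random() < 0.05         # episode ends (lap done or off track)
        return next_state, reward, done

env = Environment()
for episode in range(3):
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = random.choice(['left', 'straight', 'right'])  # the policy chooses
        state, reward, done = env.step(action)
        total += reward  # the cumulative reward the agent learns to maximise
    print(f'Episode {episode}: cumulative reward = {total:.2f}')
```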


Sensory Data - DeepRacer Parameters:

How do you write your reward function?

Using a set of predefined DeepRacer parameters (built into the console), you can detect and learn about your agent's performance on the track. These parameters are documented on the site linked below, with plenty of examples of how they have been used to create reward functions.

Site
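To give a sense of what these parameters look like in practice, the sketch below reads a few of the commonly used ones. The parameter names are real DeepRacer inputs; the specific rewards and the 0.25 threshold are just values I chose for illustration.

```python
def reward_function(params):
    # A few of the built-in parameters:
    speed = params['speed']                                # current speed in m/s
    distance_from_center = params['distance_from_center']  # metres from the centre line
    track_width = params['track_width']                    # width of the track in metres
    is_offtrack = params['is_offtrack']                    # True once the car leaves the track

    if is_offtrack:
        return 1e-3  # near-zero reward for leaving the track

    reward = 1.0
    if distance_from_center > 0.25 * track_width:
        reward *= 0.5  # straying far from the centre halves the reward
    return float(reward + 0.1 * speed)  # small bonus for speed
```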


Training Algorithms:

AWS DeepRacer gives you the choice of two training algorithms: Proximal Policy Optimisation (PPO) and Soft Actor Critic (SAC). The basic difference is that PPO uses more data and training time to produce more consistent results, while SAC uses less data and produces less consistent results, but its agent experiments with the reward function more.

Some basic information:

Policy Gradient Methods

The best way to understand these algorithms is to experiment with them.


Training Environment:

The training environment depends on the track, the track direction, and the training time. Adjusting your training environment can be crucial to creating your perfect DeepRacer car: the choice of track and track direction can make or break your DeepRacer by significantly influencing the agent's capabilities on the track. An ideal training environment strikes a balance between complexity and simplicity, using a variety of tracks that expose the model to different challenges, such as sharp turns and long straights. The direction in which the track is raced also significantly affects your agent's ability to adapt to different scenarios. An environment that offers variation in track scenarios enhances your DeepRacer's training and promotes skill acquisition, while an unbalanced or monotonous environment may hinder the model's ability to navigate effectively in real-world applications. The selection of the training environment is therefore a critical factor in the success of your DeepRacer model.

Train Your First Model

AWS Tutorial.png
AWS Step 1.png
AWS Step 2.png
Step 3.png
AWS Step 3-2.png

It's important to start somewhere, so your first task is to create your first reward function and train your first model. To start, go to the 'models' page via the menu on the top left of the screen. If your account is new, you won't have any models displayed. Click 'create' on the top right of the screen. This will guide you through a six-step process for creating and training your first model. It's important that you read and understand everything mentioned in the instructions. The step is fairly self-explanatory, and sample reward functions are available; pick one of the sample models to train, just to get a feel for the procedure. The titles of these samples directly state the goal they pursue; for example, "Follow the center line" has the goal of following the centre line. Make sure to read and understand how the code behind the sample works (this is explained in its comments). Every time you create a new model, you will go through this process. Keep in mind that you have 10 hours of training time each month, so don't spend too much time training each model; for this first model, 20 minutes of training will be enough.
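For reference, the "Follow the center line" sample is along these lines: it rewards the car in bands based on how close it stays to the centre of the track (the exact sample in your console may differ slightly).

```python
def reward_function(params):
    # Read the inputs this sample cares about
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']

    # Three markers at increasing distances from the centre line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # Higher reward the closer the car is to the centre line
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely close to going off track

    return float(reward)
```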

Race

Race1.png
Race 2.png

Once your model is trained and submitted to the leaderboard (not raced yet), you will need to follow a racing procedure that races your model, evaluates the race, and gives you your final lap time. The racing procedure goes as follows:

1. Go onto the compete page (student or community depending on where you wish to race)

2. Click on the race you wish to participate in (here you can also view the leaderboard of this race)

3. Click 'enter' in the top right corner

4. Pick your model

5. Wait about 12 minutes for your model to race (the AWS servers take this long to start up and evaluate your race)

This 5-step procedure allows you to race and evaluate results.

Video Analysis

Evaluation.png

Once you've raced, you can look through the race leaderboard for your agent's racing replay. Analysing this footage is highly important because it is what allows you to improve your models. By analysing the video of your races, you can gain valuable insight into how well your model is achieving your set goal: observe how it makes decisions, judge whether it chooses the best racing line, and spot any areas that require development. To fine-tune your model's behavior and incrementally raise its racing efficiency, you must analyse the race video. Video analysis also serves as a debugging tool: if your model is not performing as you expected, reviewing the race footage can help you identify the main causes of its defects and pinpoint the precise moments where it faltered, such as going off course or making a bad decision.

The image above is an example of how I used video analysis to fix issues with my model. After I had submitted my model and the AWS servers had evaluated my race, the only indicator of success I was given was my lap time. Knowing the lap time does not explain errors or areas for improvement; it was only when I went back and watched the agent race that I was able to pinpoint the error. In this specific example, my agent was travelling at 1 m/s (the maximum speed for the Student League) throughout the race except for the turns, where it slowed down to around 0.6 m/s. I was able to conclude that my model needed to take turns faster to shave off time.
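One way to act on that finding, sketched below, is to add a bonus for carrying speed through turns, using the real `speed` and `steering_angle` parameters. The 15-degree cutoff and 0.8 m/s threshold are hypothetical values chosen for illustration; tune them for your track.

```python
def reward_function(params):
    speed = params['speed']                       # m/s, capped at 1 in the Student League
    turning = abs(params['steering_angle']) > 15  # treat >15 degrees as a turn (hypothetical cutoff)

    reward = speed  # base reward: faster is better
    if turning and speed > 0.8:
        reward += 0.5  # bonus for not slowing down too much in a turn
    return float(reward)
```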

Write Your Own Reward Function

Now that you understand some key concepts and procedures, it's time to create a meaningful reward function. Start by clearly planning goals for your model. If your race is judged on lap time, your goals could be staying on the track, travelling fast, and completing the track. Then look through the parameters guide to see how these could be implemented; in my example above, I used the parameters "is_offtrack", "speed", and "progress". The next step is the toughest: implementation. Implementation requires you to write smaller chunks of reward function using each of the parameters, then combine them into a full reward function that operates well together. A good approach for your first reward function is to take one of the example reward functions and modify it to achieve your goals; this acts as a 'template' and makes your first custom reward functions much easier to write. Make sure to balance the weighting of each of the parameters - prioritise and deprioritise. This can be done by increasing or decreasing the reward for wanted or unwanted behaviors in different increments. For example, if you really value speed, you could let high speeds increase the reward by much more than the other parameters do.
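Putting those three parameters together, a combined reward function might look like the sketch below. The weights (2.0 for speed, 0.01 for progress, and the 10-point completion bonus) are illustrative starting values I chose, not tested numbers; adjust them to match your own priorities.

```python
def reward_function(params):
    # Goal 1: stay on the track
    if params['is_offtrack']:
        return 1e-3  # effectively no reward once the car leaves the track

    speed = params['speed']        # Goal 2: travel fast (0-1 m/s in the Student League)
    progress = params['progress']  # Goal 3: complete the track (0-100 percent)

    # Weighting: speed is prioritised with a larger multiplier than progress
    reward = 2.0 * speed + 0.01 * progress

    if progress == 100:
        reward += 10.0  # bonus for finishing the lap
    return float(reward)
```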

Recap of the steps:

  1. Define model goals
  2. Select applicable parameters
  3. Implement
  4. Train (30-60 mins)
  5. Race

Model Improvement

Now that you have a base reward function and have seen its performance, rather than creating a new model every time, you can 'clone' your previous model and modify the reward function it uses. Cloning your model means that everything it learnt from previous training is retained. Using procedures such as the one in the Video Analysis step, you should now analyse your model's performance, pinpoint issues, and attempt to resolve them.

Keep Iterating

AWS DeepRacer is an iterative process, so it's important to keep creating new models, keep improving them, and keep racing. You now have all the skills and information to become a DeepRacer champion!

I hope you found this Instructable useful. If you have any questions or feedback, please feel free to leave a comment below. Thank you for taking the time to read through my Instructable, and good luck with your DeepRacer!