{"id":763,"date":"2019-06-14T12:11:04","date_gmt":"2019-06-14T10:11:04","guid":{"rendered":"https:\/\/blog.besharp.it\/deepracer-our-journey-to-the-top-ten\/"},"modified":"2021-03-29T17:43:52","modified_gmt":"2021-03-29T15:43:52","slug":"deepracer-our-journey-to-the-top-ten","status":"publish","type":"post","link":"https:\/\/blog.besharp.it\/deepracer-our-journey-to-the-top-ten\/","title":{"rendered":"DeepRacer: our journey to the top ten!"},"content":{"rendered":"


In the last few years, Las Vegas<\/strong> has become the reference point for AWS Cloud events. We have seen first-hand re:Invent grow from 6,000 participants in 2012 to over 40,000 last year: an immense event, in which simply choosing which sessions to attend has become a challenge! It must also be for this reason that this year AWS decided to complement its main event with conferences with a more specific focus. The first of these, AWS re:MARS,<\/strong> was created around the hottest topics of the moment: M<\/strong>achine Learning, A<\/strong>utomation, R<\/strong>obotics and S<\/strong>pace.<\/p>\n

beSharp – obviously – could not miss it.<\/strong><\/p>\n

Many big names were present as keynote speakers: Jeff Bezos, Werner Vogels,<\/strong> Coursera co-founder Andrew Ng,<\/strong> iRobot CEO and founder Colin M. Angle<\/strong> and … Robert Downey Jr.!<\/strong> Who better than “Iron Man” to talk about the technological wonders that will radically change our lives in the coming years? Robert himself is, among other things, the co-financier of Footprint Coalition, a private organization created with the aim of cleaning up our planet through robotics and cutting-edge technologies.<\/p>\n

Many sessions<\/strong> were held by disruptive companies presenting innovations made possible by artificial intelligence: oil & gas companies, private space companies launching artificial satellites and, above all, the incredible Amazon GO, the chain of Amazon stores where you can shop and leave without going through a cash register. As the motto says, “no lines, no checkout. No, seriously!”<\/strong>: thanks to machine learning techniques and simulations in 3D environments, anyone who enters a store is tracked from the entrance, so that their actions and the items taken from the shelves are recorded; upon exiting the store, the Amazon GO<\/strong> system processes the “cart” and sends the invoice directly to the user’s personal Amazon profile. An incredible experience!<\/p>\n

While the official sessions only started on June 5th, right from the first day it was possible to participate in\u00a0workshops<\/strong>\u00a0on some specific topics; we immediately identified one that particularly excited our nerd fantasies:\u00a0a deep-dive on AWS DeepRacer!<\/strong><\/p>\n

The workshop really impressed us: introduced during Andy Jassy’s re:Invent 2018 keynote, this 4WD model<\/strong> with monster-truck axles is able to learn how to move autonomously on predetermined tracks through Reinforcement Learning.<\/strong> Described by AWS as the easiest way to learn Machine Learning, AWS DeepRacer keeps all its promises: the series of steps needed to get on track and watch your car run is truly minimal. It is possible to have a model trained for driving in just under an hour,<\/strong> although, obviously, more experiments and much more time are needed to get good results.<\/p>\n

We immediately experimented with as many options as possible, improving our lap time iteration after iteration. Among other things, re:MARS is one of the stops of the DeepRacer League,<\/strong> a competition that takes place in conjunction with the main AWS events.<\/p>\n

What better opportunity to learn directly in the field?<\/p>\n

How AWS DeepRacer and Reinforcement Learning work<\/strong><\/h2>\n

Before starting to talk about racing and record times, it is worth taking a look at the interface of the AWS DeepRacer service,<\/strong> which is the model training tool. It seems silly to point out, but it is essential to have an AWS account!<\/p>\n

As soon as you enter your console, click on the services bar and search for “DeepRacer”.<\/p>\n


From the home screen, you can see your models, check the status of their training, and create new ones.<\/p>\n


To begin, let\u2019s create a new model by clicking on\u00a0“Create model”.<\/strong><\/p>\n

This screen presents the features of the model and also checks whether the account has all the permissions needed to save it correctly.<\/p>\n


In case there is anything to fix, AWS will notify you and help you correct it.<\/p>\n

We enter a name and a description:<\/strong> choose a name that is easy to remember and, above all, unique because, if you want to compete in an official race, you will be asked to transfer your model to the scale race car via a USB key and then to identify it, among those loaded, through an app on the track marshal’s iPad.<\/p>\n

We choose a track on which to train the model:<\/strong> we selected the first one, “re:Invent 2018”, which is the official circuit of the DeepRacer League. You can try any available track.<\/p>\n


Once the training track has been selected, it is time to create the reward function<\/strong> with which we will train the model. This step is essential to obtain a well-performing car and good scores in the races.<\/p>\n

Before telling you about our experience, it is useful to briefly reiterate how\u00a0Reinforcement Learning works.<\/strong><\/p>\n

Reinforcement Learning is a training technique for unsupervised neural networks,<\/strong> that is, neural networks that do not need an initial ground truth against which to adapt their weights. Instead, the agent repeatedly measures the surrounding environment and acts so as to maximize its reward function. During this process, which is repeated until a cutoff threshold is reached, the weights of the network are updated at each iteration, thus optimizing the network itself.<\/p>\n
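The loop described above can be sketched in a few lines of Python. This is a toy illustration only: the ten integer “states”, the made-up reward and the epsilon-greedy update below are our own placeholders, not the algorithm AWS DeepRacer actually runs under the hood.<\/p>\n

```python
import random

random.seed(0)  # make the toy run deterministic

def toy_reward(state, action):
    # Made-up reward: action 1 is best in odd states, action 0 in even ones
    return 1.0 if action == state % 2 else 0.0

def train(episodes=2000, epsilon=0.3, lr=0.5):
    q = {}  # action-value estimates: the "weights" being optimized
    for _ in range(episodes):
        state = random.randint(0, 9)               # measure the environment
        if random.random() < epsilon:              # sometimes explore...
            action = random.randint(0, 1)
        else:                                      # ...otherwise exploit
            action = max((0, 1), key=lambda a: q.get((state, a), 0.0))
        r = toy_reward(state, action)              # collect the reward signal
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + lr * (r - old)  # nudge the estimates
    return q

q = train()
# The learned greedy policy per state, after training
best = {s: max((0, 1), key=lambda a: q.get((s, a), 0.0)) for s in range(10)}
```

Even this tiny loop shows the key ingredients: measuring the environment, acting, collecting a reward and updating the weights toward whatever maximizes it.<\/p>\n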

In the case of the DeepRacer car, we started with a very simple reward function whose goal is to teach the car to stay in the middle of the track: at the time of each measurement, the smaller the distance from the center of the roadway relative to half the width of the road, the higher the returned reward value; in all other cases, the reward is reduced.<\/p>\n

Below is an example of how to construct the function:<\/p>\n

import math\r\n\r\ndef reward_function(params):\r\n    '''\r\n    Use square root for center line\r\n    '''\r\n    track_width = params['track_width']\r\n    distance_from_center = params['distance_from_center']\r\n\r\n    # Reward decays with the square root of the normalized distance from center\r\n    reward = 1 - math.sqrt(distance_from_center \/ (track_width \/ 2))\r\n    if reward < 0:\r\n        reward = 0\r\n\r\n    return float(reward)<\/pre>\n

We choose the degrees of freedom of our 4WD: maximum speed, steering angle and possible speed levels.<\/strong> The combinations of these values define the action space: how many distinct steering and speed actions the car is able to choose from.<\/p>\n
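To make the idea concrete, here is a small sketch of how such settings could expand into a discrete action space; the numbers (a 30-degree maximum steering angle, 3 m/s top speed, granularities of 5 and 3) are illustrative values of our own, not settings taken from the article.<\/p>\n

```python
from itertools import product

def build_action_space(max_steering=30, steering_levels=5,
                       max_speed=3.0, speed_levels=3):
    # Steering angles spread symmetrically from -max to +max
    step = 2 * max_steering / (steering_levels - 1)
    angles = [-max_steering + i * step for i in range(steering_levels)]
    # Speeds spread evenly from max/levels up to max
    speeds = [max_speed * (i + 1) / speed_levels for i in range(speed_levels)]
    # Every (angle, speed) pair becomes one discrete action
    return [{"steering_angle": a, "speed": s} for a, s in product(angles, speeds)]

actions = build_action_space()
print(len(actions))  # 5 steering levels x 3 speed levels = 15 actions
```

A bigger action space gives the car finer control but also makes training slower, since there are more actions to explore.<\/p>\n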


The action space is strongly dependent on the reward function and vice versa: the same reward function, with different degrees of freedom, often produces very different results.<\/p>\n


Once this information is entered, you can decide how many hours to train the model, up to a maximum of 8 hours per single training job.<\/p>\n

It is useful to know that it is possible to further re-train the same model<\/strong> to increase its degree of confidence. What we have verified is that, with a total training time of around 8-10 hours, it is possible to give the car a certain confidence on the track, provided you keep the model simple.<\/p>\n

We perform some confidence tests on the function described above: from the main screen of the model, we click on “Start new evaluation”<\/strong> and choose the number of “trials” on the track; with three trials, the results are the following:<\/p>\n


Not bad as a first result, but we certainly could not stop at 23 seconds! Here, then, are the different variables that DeepRacer provides for shaping the reward function:<\/p>\n

{\r\n    "all_wheels_on_track": Boolean,     # flag to indicate if the vehicle is on the track\r\n    "x": float,                         # vehicle's x-coordinate in meters\r\n    "y": float,                         # vehicle's y-coordinate in meters\r\n    "distance_from_center": float,      # distance in meters from the track center\r\n    "is_left_of_center": Boolean,       # flag to indicate if the vehicle is left of the track center\r\n    "heading": float,                   # vehicle's yaw in degrees\r\n    "progress": float,                  # percentage of track completed\r\n    "steps": int,                       # number of steps completed\r\n    "speed": float,                     # vehicle's speed in meters per second (m\/s)\r\n    "steering_angle": float,            # vehicle's steering angle in degrees\r\n    "track_width": float,               # width of the track\r\n    "waypoints": [[float, float], ...], # list of [x, y] milestones along the track center\r\n    "closest_waypoints": [int, int]     # indices of the two nearest waypoints\r\n}<\/pre>\n

Let\u2019s try to add some of this information to our reward function:<\/p>\n

def reward_function(params):\r\n    '''\r\n    Penalize distance from the center line, sharp steering and going off track\r\n    '''\r\n    track_width = params['track_width']\r\n    distance_from_center = params['distance_from_center']\r\n    steering = abs(params['steering_angle'])\r\n    speed = params['speed']\r\n    all_wheels_on_track = params['all_wheels_on_track']\r\n    ABS_STEERING_THRESHOLD = 15\r\n\r\n    # Quartic falloff: the reward drops quickly as the car leaves the center\r\n    reward = 1 - (distance_from_center \/ (track_width \/ 2)) ** 4\r\n    if reward < 0:\r\n        reward = 0\r\n\r\n    # Penalize excessive steering to discourage zig-zagging\r\n    if steering > ABS_STEERING_THRESHOLD:\r\n        reward *= 0.8\r\n\r\n    # No reward at all if the car leaves the track\r\n    if not all_wheels_on_track:\r\n        reward = 0\r\n\r\n    return float(reward)<\/pre>\n

In particular, we added the “steering_angle”, the “speed” and the Boolean variable “all_wheels_on_track”, which tells us whether, at a given moment, the car still has all of its wheels on the track.<\/p>\n
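To see how these penalties interact, we can call the function with a few hand-built params dictionaries; the measurement values are made up for illustration, and the reward logic is repeated in condensed form so the snippet runs on its own.<\/p>\n

```python
# Same reward logic as the function above, condensed to be self-contained
def reward_function(params):
    reward = 1 - (params['distance_from_center'] / (params['track_width'] / 2)) ** 4
    reward = max(reward, 0.0)
    if abs(params['steering_angle']) > 15:
        reward *= 0.8          # penalize sharp steering
    if not params['all_wheels_on_track']:
        reward = 0.0           # going off track zeroes the reward
    return float(reward)

# Hand-built sample measurements (illustrative values only)
centered = {'track_width': 0.6, 'distance_from_center': 0.0,
            'steering_angle': 0.0, 'all_wheels_on_track': True}
swerving = dict(centered, steering_angle=25.0)
off_track = dict(centered, all_wheels_on_track=False)

print(reward_function(centered))   # 1.0  (perfectly centered, gentle steering)
print(reward_function(swerving))   # 0.8  (centered but steering too sharply)
print(reward_function(off_track))  # 0.0  (off the track: no reward at all)
```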

If we look at the code, we see that the reward function, after being calculated with respect to the position relative to the center of the track, is modified as follows:<\/p>\n