Reinforcement learning is traditionally used to let an agent learn how to solve a task by interacting with an environment, starting with zero knowledge, in a “tabula rasa” setting. Hence, the agent needs to interact with the environment for many episodes of the simulation before it becomes proficient at the task, which in the Car Racing environment means driving fast without leaving the track. While the reinforcement learning approach is very interesting, it is also worth considering a more efficient way of learning, closer to human cognition, in which the agent learns to perform a task by observing an expert. The latter approach is used in the current experiment, applying the Synthetic Cognition algorithm.
The experiment can be framed as a supervised regression problem in which the predictors are the pixels of each simulation frame.
Minimal preprocessing was applied to the images, namely dimensionality reduction (from 96×96 to 48×48 pixels) and flattening to 1D, background subtraction (removing the green surroundings of the track), conversion from color to grayscale, and binarization of the intensity to either 0 or 1 based on a threshold. The target variables to predict are the controls of the car (steering direction, acceleration intensity and brake intensity). The problem was solved in both offline and online settings, described in the following paragraphs.
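The preprocessing steps listed above can be illustrated with a minimal NumPy sketch. It is not the code used in the experiment: the function name `preprocess`, the block-averaging downsampler, the rule used to detect green pixels, the threshold value and the order of the operations are all assumptions made for illustration.

```python
import numpy as np

def preprocess(frame, threshold=0.4):
    """Turn a 96x96x3 uint8 RGB frame into a flat binary vector of length 2304.

    All numeric choices here (threshold, green margin) are illustrative assumptions.
    """
    rgb = frame.astype(np.float32) / 255.0

    # Dimensionality reduction: 96x96 -> 48x48 by averaging each 2x2 block.
    h, w, c = rgb.shape
    small = rgb.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    r, g, b = small[..., 0], small[..., 1], small[..., 2]

    # Background subtraction: pixels where green clearly dominates are
    # treated as the surroundings of the track (assumed rule).
    is_background = (g > r + 0.1) & (g > b + 0.1)

    # Color to grayscale using the standard luma weights.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    gray[is_background] = 0.0

    # Binarize to 0 or 1, then flatten to a 1D vector.
    binary = (gray >= threshold).astype(np.uint8)
    return binary.ravel()
```

Flattening is done last here only because the image operations are easier to express in 2D; the resulting vector is the same regardless of when the flattening happens.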
In the offline version of the experiment, the Synthetic Cognition model was trained with 2000 images of consecutive frames of human play, along with the floating point values describing the control actions applied to the car in every frame. The images cover approximately one lap of a circuit. The model was then tested on different circuits of the same environment, receiving only the images as input, so it had to predict the car controls, which were then applied to the world to update the simulation. The agent was able to drive successfully without leaving the track in the majority of the runs.
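The offline protocol (fit on recorded frames and controls, then drive using only images) can be sketched as follows. The Synthetic Cognition algorithm itself is not reproduced here: `NearestNeighbourPolicy` is a deliberately simple stand-in regressor, and `rollout` and its parameters are hypothetical names. The environment is assumed to follow the Gymnasium `reset`/`step` convention, with actions ordered as steering, acceleration and brake.

```python
import numpy as np

class NearestNeighbourPolicy:
    """Stand-in regressor (NOT Synthetic Cognition): returns the recorded
    controls of the stored frame closest to the query frame."""

    def fit(self, X, Y):
        self.X = np.asarray(X, dtype=np.uint8)    # binary frame vectors
        self.Y = np.asarray(Y, dtype=np.float32)  # [steer, gas, brake] per frame
        return self

    def predict(self, x):
        # Hamming distance between the query and every stored binary vector.
        x = np.asarray(x, dtype=np.uint8)
        dists = np.count_nonzero(self.X != x, axis=1)
        return self.Y[np.argmin(dists)]

def rollout(env, policy, preprocess, max_steps=1000):
    """Drive using only images as input; returns the accumulated reward."""
    obs, _ = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy.predict(preprocess(obs))
        obs, reward, terminated, truncated, _ = env.step(action)
        total += float(reward)
        if terminated or truncated:
            break
    return total
```

Training would amount to calling `fit` on the roughly 2000 preprocessed frames and their recorded controls, and testing to calling `rollout` on an environment instance that generates a different circuit.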
In the online setting, a human player controls the car with the keyboard, letting the model learn from both the image and the controls applied. The player can hand control to the algorithm at any time by pressing a special key, after which the model perceives the image and has to infer the actions. Once Synthetic Cognition has seen enough samples of proper driving experience (e.g. close to 2000, as in the offline setting), it is able to drive successfully. If the model is given control earlier, it can make mistakes, and the human can take control again to show how to get back on the track; since the algorithm learns online (continuously), it learns from these corrections of past mistakes.
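The shared-control loop of the online setting can be sketched in the same spirit. Again, this is an illustration under assumptions rather than the actual implementation: `OnlineMemory` is a stand-in learner (not Synthetic Cognition), and `online_step` and the `model_in_control` flag are hypothetical names standing for the keyboard toggle described above.

```python
import numpy as np

class OnlineMemory:
    """Stand-in online learner (NOT Synthetic Cognition): stores every
    demonstrated frame and replays the controls of the closest one."""

    def __init__(self):
        self.X, self.Y = [], []

    def learn(self, x, action):
        self.X.append(np.asarray(x, dtype=np.uint8))
        self.Y.append(np.asarray(action, dtype=np.float32))

    def predict(self, x):
        if not self.X:
            # No experience yet: apply no steering, gas or brake.
            return np.zeros(3, dtype=np.float32)
        x = np.asarray(x, dtype=np.uint8)
        dists = [np.count_nonzero(stored != x) for stored in self.X]
        return self.Y[int(np.argmin(dists))]

def online_step(model, features, human_action, model_in_control):
    """Process one frame and return the controls to apply to the car."""
    if model_in_control:
        # The model drives: it only sees the image features.
        return model.predict(features)
    # The human drives: the model learns the (image, controls) pair.
    model.learn(features, human_action)
    return np.asarray(human_action, dtype=np.float32)
```

Because the memory keeps growing whenever the human is driving, a correction made after a mistake becomes part of what the model can recall the next time it sees a similar frame.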