Introduction to Reinforcement Learning through a canonical example.
The three fields of Machine Learning
The third field, and the least well known, is Reinforcement Learning (RL). RL is about finding the best sequence of actions to perform in an uncertain environment in order to maximize a sum of rewards. We will see what this means.
Learning to act
Here is the canonical frozen lake example. Imagine a robot on a frozen lake divided into cells. Each time he enters a cell, he earns the reward indicated inside it in the figure (-1 or 100). Rewards can be positive or negative (a punishment). He starts in the bottom-left corner, and the episode ends when he enters a green or a red cell.
When he moves toward a wall or a black cell, he bounces back to the cell he came from and collects that cell's reward again. If the robot enters a cell several times, he gets its reward each time. The goal for our robot is to maximize the sum of rewards over one episode.
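The rules above can be sketched in a few lines of Python. This is only an illustration: the figure is not reproduced here, so the grid size, the positions of the black, green and red cells, and the reward attached to the red cell are all assumptions, and the names `step` and `episode_return` are hypothetical. The slipperiness of the ice is deliberately left out of this sketch.

```python
# Hypothetical layout: the article's figure is not reproduced here, so the
# grid size, cell positions and reward placement below are assumptions.
ROWS, COLS = 3, 4
START = (2, 0)            # bottom-left corner (row 2, column 0)
BLACK = {(1, 1)}          # blocked cell (assumed position)
GREEN = (0, 3)            # terminal cell worth 100 (assumed position)
RED = (1, 3)              # terminal cell (assumed position and reward)
REWARDS = {GREEN: 100, RED: -1}
DEFAULT_REWARD = -1       # reward of every other cell

# Each direction is a (row offset, column offset) pair.
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def step(cell, direction):
    """Move one cell without slipping; return (new_cell, reward, done)."""
    d_row, d_col = MOVES[direction]
    target = (cell[0] + d_row, cell[1] + d_col)
    inside = 0 <= target[0] < ROWS and 0 <= target[1] < COLS
    if not inside or target in BLACK:
        # Wall or black cell: bounce back and collect the same cell's reward.
        target = cell
    reward = REWARDS.get(target, DEFAULT_REWARD)
    done = target in (GREEN, RED)
    return target, reward, done


def episode_return(directions):
    """Sum the rewards collected while following a fixed list of moves."""
    cell, total = START, 0
    for direction in directions:
        cell, reward, done = step(cell, direction)
        total += reward
        if done:
            break  # the episode ends on a green or red cell
    return total
```

With this assumed layout, `episode_return(["N", "N", "E", "E", "E"])` gives 96 (four rewards of -1, then 100), while starting with a useless bump into the west wall, `episode_return(["W", "N", "N", "E", "E", "E"])`, gives 95.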
A tough problem under uncertainty
Here, as we defined the problem, maximizing the sum of rewards along one episode is equivalent to reaching the green cell as quickly as possible without entering the red cell: every additional move costs a reward of -1, so a shorter path to the green cell always yields a larger sum. Nevertheless, our robot has to face the fact that the lake is frozen.
Because of the ice, when the robot chooses to go in one direction, say North, he has an 80% chance of actually moving in that direction. But 10% of the time he goes to the left of that direction, and 10% of the time to the right. The second difficulty is that the robot knows nothing about the lake and its cells in advance. He discovers the environment at the same time as he learns to maximize rewards.
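The slipping rule described above can be simulated with a short Python sketch. The 80% / 10% / 10% probabilities come from the text; reading "left" and "right" as a 90-degree turn from the chosen direction is an assumption, and the function name `actual_direction` is hypothetical.

```python
import random

# Directions listed clockwise, so the "left" and "right" of a heading are
# simply the previous and the next entries of the list.
CLOCKWISE = ["N", "E", "S", "W"]


def actual_direction(chosen, rng=random):
    """Sample the direction really taken on the ice.

    80%: the chosen direction; 10%: its left; 10%: its right.
    "Left" and "right" are assumed to mean a 90-degree turn.
    """
    i = CLOCKWISE.index(chosen)
    left = CLOCKWISE[(i - 1) % 4]
    right = CLOCKWISE[(i + 1) % 4]
    return rng.choices([chosen, left, right], weights=[0.8, 0.1, 0.1])[0]
```

Calling `actual_direction("N", random.Random(0))` many times should return "N" about 80% of the time and "W" or "E" about 10% of the time each. Note that the robot only observes where he ends up; he is not told these probabilities.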
We can define two (related) criteria for the robot's learning. First, we would like that, after a great number of examples, he tends to act as well as possible. Second, we want the robot not to learn too slowly.
In the next article, we will see the specific vocabulary of Reinforcement Learning.