Introduction to Reinforcement Learning (RL)

Reinforcement Learning has become interesting to many because you don’t need a huge data set to get started; the procedures inherently create this data.

Recently, FRONTLINE PBS published In the Age of AI, a documentary exploring the impact of artificial intelligence (A.I.) on our daily lives. Alongside doom scenarios of people losing their jobs to automation, it also demonstrates the power of artificial intelligence and deep learning. In this blog, we’ll share why these technologies are so powerful, and you’ll find out ways to get started with a very popular topic in A.I.: Reinforcement Learning.


Human vs. artificial brain: 0-1?

Although A.I. is a catalyst in all kinds of automation, it certainly does not always outperform the human brain. Yet our human capability to make predictions through associations is inherently limited by our brain capacity. This is where machine learning steps in to augment (or outperform?) us humans.

The PBS documentary shows how deep learning has become capable of recognising early-stage breast cancer from mammograms more accurately than experienced doctors. The machine has learned to associate input parameters in a way that surpasses human capability.

A beautiful illustration of this phenomenon is Google’s AlphaGo, a computer program that defeated the 18-time world champion Lee Sedol at the board game Go. The computer made a brilliant move that had not been seen in thousands of years of playing Go.

The value of data

Since the Edward Snowden leaks, we know that Western intelligence agencies gather tons of data sent and received by our electronic devices. So-called “metadata” indicating where a call took place, when an e-mail was sent, etc. can be of great value, even when the actual conversation is not known. Data analysis and A.I. allow for behavioural prediction, for example indicating that someone could be a threat to national security.

China, too, looks with great interest at artificial intelligence. Becoming the world leader in A.I. by 2030 is one of the targets of the Chinese government. Facial recognition, for example, is already used for automated grocery payments, and also enables the government to identify and penalise offenders in China’s Social Credit System.

China is on track to become the world’s first major digital surveillance state. When it comes to building a system of mass surveillance, it’s imperative to gather all the data you possibly can. As Sinovation Ventures CEO Kai-Fu Lee once said: “Data is the new oil, and China is the new Saudi Arabia”.

There, we said it: data. The power of the bulk of A.I. techniques depends on good data. The bigger your data set, and the more representative the data is of the problem at hand, the more accurate your predictions will be.

Introduction to Reinforcement Learning (RL)

What progress in Artificial Intelligence has taught us most is that Machine Learning requires data, and loads of it. Data has become more valuable than the developers creating the tools needed to work with the data. This is why Reinforcement Learning has become so interesting to many: you don’t need a huge data set to get started, because the procedures inherently create this data.


Reinforcement learning is often compared to the human learning process. Take for example a child learning how to ride a bicycle. When the child leans to the left or the right while turning the handlebars in the other direction, this might result in a somewhat unpleasant encounter between head and road.
Thereafter, the child notices that stopping without putting a foot to the ground has the same unwanted result. Mistake after mistake teaches the child that certain actions have certain consequences. In this way, the child learns to keep its balance, and thus to ride a bicycle.

In terms of reinforcement learning, we could consider the child as the agent trying to master a task, namely riding a bicycle. The agent can perform certain actions: steering left/right, pedalling, braking and so on. Executing an action brings the agent into a new state, for example arriving at the destination or falling off the bike. Based on the outcome of the action, the agent receives feedback (a reward or a penalty) on the consequences of its doings.
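The agent, action, state and feedback loop described above can be sketched in a few lines of Python. This is only a toy illustration under our own assumptions: the `BalanceEnv` class, its wobble dynamics, the reward values (+1 per step upright, -10 for falling) and both policies are hypothetical and not taken from any library.

```python
import random


class BalanceEnv:
    """Hypothetical toy environment: the state is how far the bike leans."""

    ACTIONS = (-1, 0, 1)  # steer left, keep straight, steer right

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.lean = 0

    def reset(self):
        self.lean = 0
        return self.lean

    def step(self, action):
        # A random wobble plus the agent's steering changes the lean.
        self.lean += self.rng.choice((-1, 1)) + action
        fallen = abs(self.lean) > 3
        reward = -10 if fallen else 1  # feedback on the action's outcome
        return self.lean, reward, fallen


def run_episode(env, policy, max_steps=50):
    """Let the agent act until it falls or max_steps is reached."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)                  # the agent picks an action
        state, reward, done = env.step(action)  # new state and feedback
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # An untrained agent: steers at random, whatever the lean.
    return random.choice(BalanceEnv.ACTIONS)


def corrective_policy(state):
    # What a trained agent might end up doing: steer against the lean.
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


if __name__ == "__main__":
    print("random policy:    ", run_episode(BalanceEnv(seed=1), random_policy))
    print("corrective policy:", run_episode(BalanceEnv(seed=1), corrective_policy))
```

The corrective policy is written by hand here purely to show what the loop looks like; in reinforcement learning the agent would have to discover such behaviour from the rewards alone.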

Unlimited data with reinforcement learning

A real-world example of the application of reinforcement learning is a vehicle that parks itself autonomously.

The car is the agent, which perceives its environment through all kinds of sensors and cameras. It performs an action (moving forward/backward, turning left/right, braking) and receives feedback on its performance, based on, among other things, its closeness to the actual parking spot and how many objects it hits along the way.
If we give the agent the opportunity to try over and over again, it will eventually become good at the task at hand.

This learning from repeated attempts is what distinguishes reinforcement learning from supervised learning and unsupervised learning techniques. Supervised and unsupervised learning require tons of data, collected up front, in order to build a sufficiently robust model. Since reinforcement learning gathers its own data through trial and error, it is not limited by the size of an existing data set.
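To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm (we are not claiming it is what real parking systems use). The setting is a hypothetical one-dimensional "parking corridor" of six cells; the rewards (+10 for reaching the spot, -1 per move, -2 for bumping the wall) and the hyperparameters are assumptions chosen for illustration.

```python
import random

N_CELLS = 6          # corridor cells 0..5 (hypothetical layout)
SPOT = N_CELLS - 1   # the parking spot is the last cell
ACTIONS = (1, -1)    # drive forward, drive backward


def step(state, action):
    """Return (next_state, reward, done) for one move of the car."""
    nxt = state + action
    if nxt < 0:
        return 0, -2, False   # bumped into the wall behind the car
    if nxt == SPOT:
        return nxt, 10, True  # parked successfully
    return nxt, -1, False     # every other move costs a little


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values purely from self-generated experience."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    transitions = 0  # how much data the agent created for itself
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of one attempt
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try something
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate towards the observed
            # reward plus the discounted value of the best next action.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
            transitions += 1
            if done:
                break
    return q, transitions


def greedy_policy(q):
    """Best known action for every cell before the parking spot."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(SPOT)]


if __name__ == "__main__":
    q, transitions = train()
    print("transitions generated by the agent:", transitions)
    print("learned policy (1 = forward):", greedy_policy(q))
```

Note that no data set is passed to `train`: every transition the agent learns from is produced by its own attempts, which is the point made above.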


One of the biggest challenges with reinforcement learning is to engineer a good reward function. The function should reward good behaviour and penalise bad behaviour strongly enough to steer the agent towards the intended goal. The focus of the reward scheme should not be too narrow, in order to avoid unexpected behaviour. A good illustration is the automated boat racer below.


The agent has learned to sail in a donut-shaped route, rather than finishing the racetrack. In this way, it is always right on time to collect bonus points. In this example, the reward scheme did not sufficiently encourage the agent to finish the race. Another funny illustration is a computer that learned to pause the Tetris game when it was close to losing. The time it gained by delaying a defeat yielded the program a higher reward than continuing to play.
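A bit of arithmetic shows why looping can beat finishing. The sketch below compares the total reward of two behaviours under a reward scheme; all numbers (3 bonus points per pick-up, one pick-up per 4-step lap, 2 pick-ups on the way to the finish, a 100-step episode) are made up for illustration and are not taken from the actual boat racing game.

```python
def episode_return(behaviour, finish_reward, bonus=3, horizon=100):
    """Total reward of one episode under a hypothetical reward scheme."""
    if behaviour == "finish":
        # Picks up 2 bonuses on the way, then the episode ends at the finish.
        return 2 * bonus + finish_reward
    if behaviour == "loop":
        # Circles a respawning bonus: one pick-up per 4-step lap, forever.
        laps = horizon // 4
        return laps * bonus
    raise ValueError(f"unknown behaviour: {behaviour}")


def preferred_behaviour(finish_reward):
    """The behaviour a reward-maximising agent would settle on."""
    return max(("finish", "loop"),
               key=lambda b: episode_return(b, finish_reward))


if __name__ == "__main__":
    for finish_reward in (10, 100):
        print(f"finish reward {finish_reward}: "
              f"finish={episode_return('finish', finish_reward)}, "
              f"loop={episode_return('loop', finish_reward)}, "
              f"agent prefers {preferred_behaviour(finish_reward)}")
```

With a finish reward of 10, looping earns more than finishing, so a reward-maximising agent has no reason to complete the race. Raising the finish reward to 100 flips the preference, which is the kind of adjustment a reward designer has to think about up front.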


Another problem with the use of reinforcement learning in a real-world setting is that it could become a costly affair. Training a real car to park itself would require us to replace the car every once in a while. Teaching a drone to avoid obstacles inside a building could also cause significant damage and might even be dangerous for those present in the building.

To tackle this problem, one could build a virtual copy of the real-world environment. This virtual copy, called a digital twin, can be used to train the agent.
By using simulation to train the agent rather than using real objects, we can greatly decrease the cost of the training process. Best case, the knowledge of the digital agent can be directly transferred to its physical counterpart.
The challenge with digital twins, however, is to build simulations that approximate real-world circumstances closely enough for the learned behaviour to remain applicable.

One example is building a digital twin of your real-life office space.


At ToThePoint, we are currently working on putting the use of digital twins into practice. Guided by our A.I. specialists, two students of KU Leuven’s Master in Artificial Intelligence will research, as their Master’s thesis, how to transfer ‘reinforcement learned’ knowledge from a simulation environment to a robotic arm.


Getting started with reinforcement learning might seem hard and scary. Once you have created and trained your first artificial brain, however, the step towards building applications that are useful for your own business becomes a lot easier. ToThePoint offers a workshop at the Applied Machine Learning Days in Lausanne (Switzerland) next January that will introduce you to reinforcement learning in practice!

Tickets available at: https://appliedmldays.org/workshops/a-conceptual-introduction-to-reinforcement-learning

Bram Vandendriessche
