Basics of Reinforcement Learning

In the previous blog we discussed:

  • Introduction to machine learning

  • Types of machine learning

Previously, only the 2 conventional types of machine learning were discussed: supervised and unsupervised machine learning. There is another type of machine learning that is very different from these 2 types, and that is reinforcement learning.

So today you will learn about reinforcement learning. Before diving deep into it, let me give you a short recap of when we use supervised and unsupervised machine learning. To figure out which type to implement, first analyze your data. If your data is labeled, meaning it has both inputs (features) and corresponding outputs (labels), we use supervised machine learning. If there are no labels in the data, we use unsupervised machine learning.

Now the main question arises: what is reinforcement learning?

[Image: general definition of reinforcement learning]

If this general definition seems a little difficult to understand, don't worry; let me explain what you just read. The definition says that "reinforcement learning is feedback-based machine learning". This tells us that in reinforcement learning there is feedback, in the form of a reward or a punishment, which is given to the agent (the entity that takes actions) every time the agent takes an action in the environment.
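The feedback loop described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `LineWorld` environment, its five cells, and the reward values (+1 at the goal, -1 otherwise) are assumptions made for this example and do not come from the definition. The agent here acts randomly and does not learn yet; the point is only to show who gives what to whom.

```python
import random


class LineWorld:
    """Hypothetical toy environment: cells 0..4 in a row, the goal is cell 4."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent cannot leave the row
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        # feedback: reward at the goal, punishment for every other move
        reward = 1 if done else -1
        return self.position, reward, done


def run_episode(seed=0, max_steps=100):
    """Let a random agent act until it reaches the goal or runs out of steps."""
    rng = random.Random(seed)
    env = LineWorld()
    history = []
    for _ in range(max_steps):
        action = rng.choice([-1, 1])             # the agent takes an action
        state, reward, done = env.step(action)   # the environment responds
        history.append((action, state, reward))
        if done:
            break
    return history


if __name__ == "__main__":
    for action, state, reward in run_episode():
        print(f"action={action:+d} -> state={state}, feedback={reward:+d}")
```

Every line printed is one round of the loop: the agent acts, and the environment answers with a new state and a reward or punishment.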

If reinforcement learning is feedback-based learning, then do we need data for training our machine learning model?

What makes reinforcement learning different from the conventional types is the absence of a prepared dataset for training the model. Instead, the model trains itself through the reward it gets for a desired output and the punishment it gets for an undesired output.

Let us consider this example:

[Image: a grid of 12 blocks with a robot as the agent]

In the above illustration, we have a grid with 12 blocks, and S4 is the block we want our agent (a robot in this case) to reach. Since this is reinforcement learning, we do not provide any prior data to train the robot to reach block S4 efficiently. So when the robot takes its first action from its present position to, let's say, block S6, which is not on the right path, we give the robot negative feedback.

In the same way, the next time the robot moves to some block, let's say S10, which is on the right path, we give it positive feedback. In this way, through trial and error, the robot reaches the desired output, guided by the feedback it receives. But there is one problem, and that is efficiency, because to reach block S4 from S9 there are 2 paths:

[Image: the two paths from S9 to S4]
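To make the trial-and-error idea concrete, here is a rough sketch of a robot learning to reach S4 from S9 using only feedback. Several things are assumptions made for this example and are not taken from the illustration: the grid is laid out as 3 rows of 4 blocks numbered row by row (S1 to S4 on top, S9 to S12 at the bottom), there are no blocked cells, the feedback is +10 for reaching S4 and -1 for every other move, and the learning rule is tabular Q-learning, a standard reinforcement learning algorithm that this article does not cover.

```python
import random

ROWS, COLS = 3, 4
START, GOAL = 8, 3   # S9 and S4 as zero-based indices (assumed row-by-row layout)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, action):
    """Move on the grid; a move into a wall leaves the robot where it is."""
    row, col = divmod(state, COLS)
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < ROWS and 0 <= new_col < COLS:
        state = new_row * COLS + new_col
    done = state == GOAL
    # assumed feedback: +10 at the goal, -1 for every other move
    return state, (10.0 if done else -1.0), done


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: learn a value for every (block, move) pair."""
    rng = random.Random(seed)
    q = [[0.0] * len(MOVES) for _ in range(ROWS * COLS)]
    for _ in range(episodes):
        state = START
        for _ in range(200):   # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(MOVES))   # explore: random move
            else:
                action = max(range(len(MOVES)), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the best-valued move from S9 and record the blocks visited."""
    state, path = START, ["S9"]
    for _ in range(max_steps):
        action = max(range(len(MOVES)), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(f"S{state + 1}")
        if done:
            break
    return path


if __name__ == "__main__":
    print(" -> ".join(greedy_path(train())))
```

Because every extra move costs -1 in this sketch, the learned values favor shorter routes, which is one simple way feedback alone can push the robot toward the more efficient path.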

How to increase the efficiency of a reinforcement-based ML model?

The way we improve the efficiency of the robot is by using an algorithm referred to as Generative Action Selection through Probability (GRASP), which improves exploration in reinforcement learning by reshaping the environment, also known as the exploration space, to limit the choices the robot has to make.

In the next blog we will take a look at different algorithms used in reinforcement learning.

Support Yuvraj Singh by becoming a sponsor. Any amount is appreciated!