I hope you are doing great! I recently wrote a few blog posts on supervised machine learning, and since I am now discovering the world of reinforcement learning, I thought I would share a short summary of what I have learned so far.

Introduction to Reinforcement Learning

Most of us are aware of the two conventional types of machine learning, supervised and unsupervised. There is another type that is very different from these two: reinforcement learning.

So today you will learn about reinforcement learning. Before diving in, here is a short recap of when we use supervised and unsupervised machine learning. To decide which type to use, first look at your data. If the data is labeled, meaning it contains both inputs (features) and the corresponding outputs (labels), we use supervised machine learning. If there are no labels, we use unsupervised machine learning.

Now the main question arises: what is reinforcement learning?

Example to solidify the understanding

If the general definition feels a little difficult to understand, don't worry, let me explain it. The definition says that "reinforcement learning is feedback-based machine learning". This tells us that in reinforcement learning there is feedback, in the form of a reward or a punishment, that is given to the agent (the entity that takes actions) every time it takes an action in the environment.
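To make the feedback idea concrete, here is a minimal sketch of an agent acting in an environment and receiving a reward or a punishment after every action. The function names, the four moves, and the rule that only "right" is rewarded are all made up for illustration; the agent here acts randomly and does not learn yet.

```python
import random

def environment_feedback(action):
    """Hypothetical environment: rewards 'right' (+1), punishes anything else (-1)."""
    return 1 if action == "right" else -1

def run_episode(n_steps=5, seed=0):
    rng = random.Random(seed)
    actions = ["up", "down", "left", "right"]
    history = []
    for _ in range(n_steps):
        action = rng.choice(actions)            # the agent takes an action
        reward = environment_feedback(action)   # the environment answers with feedback
        history.append((action, reward))
    return history

history = run_episode()
for action, reward in history:
    print(action, reward)
```

Each printed pair is one round of "take an action, get feedback". A learning agent would use those numbers to prefer actions that were rewarded before.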

If reinforcement learning is feedback-based learning, do we need data to train our machine learning model?

What makes reinforcement learning different from the conventional types is the absence of a prepared dataset for training the model. Instead, the model trains itself through the rewards it gets for desired outputs and the punishments it gets for undesired ones. Let us consider this example.

In the above illustration, we have a grid with 12 blocks, and S4 is the block we want our agent (the robot in this case) to reach. Since this is reinforcement learning, we do not provide any prior data to train the robot to reach S4 efficiently. So when the robot takes its first action and moves from its present position to, let's say, block S6, which is not on the right path, we give it negative feedback.

In the same way, the robot next moves to some other block, let's say S10, which is on the right path, so we give it positive feedback. In this way, through trial and error, the robot reaches the desired block, guided by the feedback it receives. But there is one problem: efficiency, because there are 2 paths from block S9 to block S4.
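The trial-and-error process above can be sketched with tabular Q-learning, one common reinforcement learning algorithm (this post does not say which algorithm the robot uses, so treat it as a stand-in). Everything specific here is an assumption: the 12 blocks are laid out as 3 rows of 4 numbered row by row, the robot starts at S9, there are no walls inside the grid, and the feedback is -1 for every ordinary move and +10 for reaching S4.

```python
import random

ROWS, COLS = 3, 4                      # assumed layout: S1..S12, numbered row by row
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
START, GOAL = 8, 3                     # zero-based indices of S9 and S4

def step(state, action):
    """Move the robot; moving off the grid leaves it where it is."""
    row, col = divmod(state, COLS)
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    if not (0 <= new_row < ROWS and 0 <= new_col < COLS):
        new_row, new_col = row, col
    next_state = new_row * COLS + new_col
    reward = 10 if next_state == GOAL else -1   # assumed feedback values
    return next_state, reward

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(ROWS * COLS) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        while state != GOAL:
            if rng.random() < epsilon:          # sometimes try a random move
                action = rng.choice(list(ACTIONS))
            else:                               # otherwise use what was learned
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward the feedback plus the best future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def greedy_path(q, max_steps=20):
    """Follow the best learned action from S9 until S4 (or give up)."""
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _ = step(state, action)
        path.append(state)
    return [f"S{s + 1}" for s in path]

q_table = train()
print(greedy_path(q_table))
```

No labeled data is given to the robot at any point. The only signal is the reward after each move, and because every extra step costs -1, the learned path also ends up being a short one.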

How to increase the efficiency of a reinforcement learning model?

One way to improve the efficiency of the robot is to use an algorithm, referred to as Generative Action Selection through Probability (GRASP), which improves exploration in reinforcement learning by reshaping the environment (also known as the exploration space) to limit the choices the robot has to make. In the next blog we will take a look at different algorithms used in reinforcement learning.
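As a rough illustration of what "limiting the choices" can mean, the sketch below removes moves that would take the robot off the grid, so it has fewer options to try in each block. This is not an implementation of the algorithm named above; it is only a simple action filter, and the 3 by 4 row-by-row layout and the helper name `valid_actions` are assumptions.

```python
ROWS, COLS = 3, 4   # assumed layout: blocks S1..S12, numbered row by row
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def valid_actions(block_number):
    """Return only the moves that keep the robot inside the grid."""
    row, col = divmod(block_number - 1, COLS)
    return [
        name
        for name, (d_row, d_col) in MOVES.items()
        if 0 <= row + d_row < ROWS and 0 <= col + d_col < COLS
    ]

# Choices to explore without and with the filter.
unrestricted = ROWS * COLS * len(MOVES)
restricted = sum(len(valid_actions(b)) for b in range(1, ROWS * COLS + 1))

print(valid_actions(9))            # moves available in the bottom-left block
print(unrestricted, restricted)    # 48 choices shrink to 34
```

With fewer options per block, the robot wastes fewer trials on moves that can never help.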

Short note

So this was a very quick introduction to reinforcement learning. In the upcoming blogs we will start with more fundamental concepts and terminology related to reinforcement learning. Also, if you have any suggestions, let me know in the comments. It would really help me a lot.


Also, I would love to connect with you, so here is my Twitter and LinkedIn.

What is Reinforcement Learning? A 3-Minute Overview


