Types of Classification in Machine Learning: A Guide for Beginners

Hello machine learning enthusiasts , So today we will dive deep into classification in machine leanring , and here are the things which we will discuss in this blog post

  1. What is classification in Machine Learning ?

  2. What are the types of classification ?

  3. Common terminologies we use while doing the classification in ML

  4. Algorithms used to do classification So after this blog post I can assure you that you will be equipped with a solid knowledge related to classification in Machine learning , so without any due let's start with our very first question which must come in our mind .

What is classification in Machine learning?

First of all before going straight to the general definition of classification in machine learning let me give you a short recap of type of machine learning using which we can do classification .

Overall classification is basically a type of problem which we can solve using supervised machine learning, which is simply a type of machine learning which is used in case the data which we have is labeled data, which means for every feature or group of features there is some particular output called as a label attached to it. Now let's get straight to the general definition that you could write in your exams

image.png

Now since you know what is classification in machine learning, let us now take a look at some types of machine learning?

Types of Classification in Machine Learning

Overall there are 4 types of classification in machine learning and all these types first 3 types of classification are mostly used but you must also be aware about the 4th type of classification

  1. Binary Classification

  2. Multi-Class Classification

  3. Multi-Label Classification

  4. Imbalanced Classification

Now we will discuss each type of classification, and in the discussion, you will learn about how all of these types are different visually when you will plot the graph for the output.

Binary Classification: Binary classification is the most common type of classification in which the data which we will feed into our machine learning model in form of features, that data will categorize into 2 categories as an output.

binary clasi.jpg

For example: In my 2nd semester of university we made a machine learning model that could make a prediction that whether a person is prone to heart diseases or not based on some of the features like cholesterol, blood pressure etc , now in this case there could be only 2 output related to this type of problem and those would be ( Prone to diseases / Not prone to diseases ) . Now since there are only 2 categories in which our machine learning model could categorise the prediction generated by it , thus this is a problem of binary classification .

Shape.png

Are there any supervised machine learning algorithms that could be used in order to do binary classification ?

The answer to this question in YES ! , there are some popular supervised machine learning that you could use to make your next machine learning project which involves binary classification . Overall there are mainly 5 algorithms which are most commonly used

  1. Logistic Regression

  2. k-Nearest Neighbors

  3. Support Vector Machine

  4. Decision Trees

  5. Naive Bayes

Out of all the above-mentioned algorithms , there are few algorithms that are specifically designed for binary classification only, and these algorithms are Logistic Regression and Support Vector Machines.

Multi-Class Classification: Multi-Class classification in machine learning is basically a type of classification in which the output ( prediction ) that will be generated by the machine learning model can be categorized into more than 2 classes. Now carefully understand the example below .

multi_class classifiction image.jpg

Let us assume that we want to make a machine learning model that will tell us what is name of fruit which we will provide as an image or sample in our machine learning model. Now since we all know that there are not only 2-3 types of fruits instead there are multiple types of fruits , thus multiple fruits having same features can belong to multiple classes , the supporting illustration is given below

20220715_172000_0000.png

Here in the above table as you can see that for same features like ( Round and small ) there can be 2 types of fruits ( Apple or lemon ) , in the same way for a feature like the roundness there are 4 types of fruit thus this is a problem of multi-class classification.

Just like the binary classification, there are also some algorithms that you can implement in order to make a machine learning model in which the problem statement> t is related to multi-class classification.

  1. Random Forest.

  2. k-Nearest Neighbors

  3. Support Vector Machine

  4. Decision Trees

  5. Gradient Boosting

Wait a minute, previously we said that some of the algorithms which we use for binary classification are designed specifically to do binary classification only, so why we are using algorithms like: Support Vector Machines ?!

The answer to this query is that there are some algorithms used in binary classification that can not be directly used for doing the multi-class classification but can be modified to do so. And there are only 2 such types: ( Logistic Regression and Support vector machine )

But how can we do it, like how this is possible?

The answer to this query is the binary classification algorithms can be used for multi-class classfication problem by using 2 different type of strategies named as :

  1. One vs Rest

  2. One vs One

Both of these 2 strategies will be discussed in the next blog so right now , let's jump straight into the remaining 2 types .

Multi-Label Classification: Multi-label classification is basically a more advanced version of the multi-class classification, because in this case sample data that we will provide to our machine learning model, that sample data will contain the data belonging to more than 2 labels. Let us take a look at the example below to get things more understandable

For example, We are making a machine learning model that will predict the name of fruit based on some features, but the thing is that the sample data also known as unseen data which we will be providing to our machine learning model as an output will contain fruits but in the refrigerator, now we all know that in the refrigerator there can be vegetables, milk, water bottles, etc and all of these things belong to different labels, so in this case, the classification for our fruits will belong to multi-label classification.

Just like other classification , which algorithms do we use for doing the multi-label classification ?

Shape (1).png

Basically the classification algorithms which we used for doing the binary classification or multi-class classification , we can use them but with we will use specialised version of those algorithms , which are mentioned below

  1. Multi-label Decision Trees

  2. Multi-label Gradient Boosting

  3. Multi-label Random Forests

Imbalanced Classification : Imbalanced classification is just that type of classification in which we have imbalanced data samples for each class.

imbalanced.jpg

For example : Let us assume that we want to make a machine learning to detect spam emails , but the data set which we have is having imbalanced number of sample for classes ( SPAM / NO SPAM ) , here by imbalanced samples I mean that the total number of SPAM samples are way more than the NO SPAM samples .

Thus these type of problems can be solved using binary classification but with some additional techniques that would change the composition of samples in the training dataset by undersampling the majority class or oversampling the minority class . The names of these techniques are :

  1. SMOTE Oversampling.

  2. Random Undersampling.

Also inorder to pay more attention to the minority class when fitting the model on the training dataset, we can use some cost-sensitive machine learning algorithms some of which are

  1. Cost-sensitive Support Vector Machines.

  2. Cost-sensitive Decision Trees.

  3. Cost-sensitive Logistic Regression.

Short note

I hope you good understanding of what are decision trees, what is classification in machine learning and what are the types of classification. So if you liked this blog or have any suggestions kindly like this blog or leave a comment below it would mean a to me.

Also I would love to connect with you, so here is my Twitterand linkedin

Did you find this article valuable?

Support Yuvraj Singh by becoming a sponsor. Any amount is appreciated!