Exploring Hypothesis Testing in AI: With a Real-World Example

If you are looking for a resource that gives you the basic intuition behind hypothesis testing in statistics, and how it is used in data science and machine/deep learning, then you are in the right place. By the end of this blog, you should have a clear grasp of the crux of hypothesis testing.

To start, we will not jump directly into hypothesis testing. Instead, we will first try to understand what a hypothesis is and why we make one. After tackling these foundational questions, we will move on to hypothesis testing and some related ideas.

What is a hypothesis?

In the context of statistics, a hypothesis can be defined as a claim or assumption about an unknown population parameter.

Why do we make a hypothesis?

To understand why we make a hypothesis instead of actually finding the value of the unknown population parameter, you first need to be familiar with the concepts of population and sample.

A population is like a superset that contains all the data points we are interested in studying. A sample is a subset of the population that is used to represent it as well as possible.

Figure: Population and Sample Statistic (source: GeeksforGeeks)

Now let's go back to the question of why we make a hypothesis rather than find the exact value of the unknown population parameter. We will look at an example, and by the end of it you will be able to answer the question yourself.

Example to better understand

Let us assume that we are conducting a survey about the average height of college-going boys in India. In this scenario, our population (the superset) would be all college-going boys in India. The unknown population parameter would be the average height of all those boys.

To find the value of this parameter, we could note down the heights of all the boys, add them up, and divide the total by the number of boys. The problem is that this method is not feasible. There can be hundreds of colleges and thousands of college-going boys in them, so measuring every boy would be extremely time-consuming. Instead of computing the average height over the entire population, we simply make an assumption (a hypothesis) about the unknown population parameter.

But our assumption could be wrong. To validate it, we take a sample from the population and test the assumption against that sample data.

💡
In our case, if the population is all college-going boys in India, the sample could be the boys in a single college.
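To see why a sample can stand in for the population, here is a minimal Python sketch on a synthetic, made-up population of heights. The population size, the mean of 170 cm, the spread of 7 cm and the sample size of 200 are all hypothetical values chosen for illustration, not survey data. The sketch computes the population mean, which we could not do in the real survey, and compares it with the mean of a random sample:

```python
import random
import statistics

# Hypothetical, synthetic population: heights (cm) of 100,000 college-going boys.
# The mean (170 cm) and standard deviation (7 cm) are made-up illustration values.
random.seed(42)
population = [random.gauss(170, 7) for _ in range(100_000)]

# The population parameter: in a real survey we could not compute this directly.
population_mean = statistics.fmean(population)

# A random sample of 200 boys, drawn without replacement from the population.
sample = random.sample(population, 200)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f} cm")
print(f"sample mean:     {sample_mean:.2f} cm")
```

The sample mean should land close to the population mean. That is why we can test a claim about the parameter using only sample data, as long as the sample is representative (for example, drawn at random).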

So we make a hypothesis about an unknown population parameter when finding its exact value is not feasible.

What is hypothesis testing?

In simpler terms, hypothesis testing is a process of validating our claim or assumption about the unknown population parameter using sample data.

Now that you know what hypothesis testing is, the next thing to understand is the types of hypotheses.

Types of hypothesis

There are two types of hypotheses: the null hypothesis and the alternative hypothesis. The differences between them are 👇

| | Null hypothesis | Alternative hypothesis |
|---|---|---|
| Foundational difference | A claim that there is no difference or no effect. For example, it may claim that the population parameter equals a specified value, so that any gap between the sample statistic and that value is due to chance. It is assumed to be true until the data show otherwise. | Any hypothesis or claim that contradicts the null hypothesis. |
| Notation | $H_0$ | $H_a$ or $H_1$ |

As a well-known saying goes, "Examples are not just illustrations; they are the lifeblood of understanding." We will follow this advice and use an example to understand the difference between the null and alternative hypotheses, and to see how hypothesis testing is done.

💡
Suppose a company is evaluating the impact of a new training program on the productivity of its employees. The company has data on the average productivity of its employees before the training program: 50 units per day, with a standard deviation of 5 units. After implementing the training program, the company measures the productivity of a random sample of 30 employees. The sample has an average productivity of 53 units per day. The company wants to know whether the new training program has significantly increased productivity.

Let's do a hypothesis test (example)

First, we will note down all the information provided in the question:

  • Population mean ($\mu$) = 50

  • Population standard deviation ($\sigma$) = 5

  • Sample size ($n$) = 30

  • Sample mean ($\bar{x}$) = 53

💡
In this scenario, our null hypothesis is that there is no difference in the average productivity of employees before and after the training program ($H_0: \mu = 50$). Keep in mind that the null hypothesis typically states that there is no difference, or no effect, before and after doing something. The alternative hypothesis is that productivity has increased ($H_a: \mu > 50$).

Before doing the test, you should know that there are two ways of performing hypothesis testing: the p-value approach and the rejection region approach. In practice, the p-value approach is generally the more common one. In this blog post, however, we will use the rejection region approach because it is a little easier to understand at the initial stage.
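To make the two approaches concrete, here is a minimal Python sketch (standard library only) that applies both to the training-program example. It assumes a one-sample, right-tailed z-test. That is our choice here because the population standard deviation is known; the post itself does not name the test. Variable names such as `mu_0` and `z_critical` are illustrative:

```python
from math import sqrt
from statistics import NormalDist

# Numbers from the training-program example.
mu_0 = 50      # population mean under H0 (units per day)
sigma = 5      # population standard deviation (assumed known)
n = 30         # sample size
x_bar = 53     # sample mean after the training program
alpha = 0.05   # significance level

# One-sample z statistic: how many standard errors x_bar lies above mu_0.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Rejection region approach (right-tailed, because Ha says productivity increased):
# reject H0 if z is larger than the critical value z_{1 - alpha}.
z_critical = NormalDist().inv_cdf(1 - alpha)
reject_by_region = z > z_critical

# p-value approach: probability of a z at least this large if H0 were true.
p_value = 1 - NormalDist().cdf(z)
reject_by_p_value = p_value < alpha

print(f"z = {z:.3f}, critical value = {z_critical:.3f}, p-value = {p_value:.4f}")
print("reject H0 (rejection region):", reject_by_region)
print("reject H0 (p-value):", reject_by_p_value)
```

For these numbers, both flags come out `True`. The two approaches always agree on the decision: z exceeds the critical value exactly when the right-tail p-value falls below alpha.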

To perform hypothesis testing with the rejection region approach, we need to go through a few steps. Let's look at each of them one by one 👇

  1. Formulate the null and alternative hypotheses

  2. Select a significance level (usually $\alpha = 0.05$, i.e. 5%)

  3. Check the assumptions/distribution of the data and decide on the test

  4. State the test statistic and calculate its value

  5. Conduct the test and either reject or fail to reject the null hypothesis
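Applied to the training-program example, the five steps could be worked through as below. This is a sketch under one assumption the post does not state. Because the population standard deviation is known and $n = 30$, we use a one-sample, right-tailed z-test, whose critical value at $\alpha = 0.05$ is about 1.645:

```latex
\begin{aligned}
&\text{Step 1: } H_0:\ \mu = 50 \qquad H_a:\ \mu > 50 \\
&\text{Step 2: } \alpha = 0.05 \\
&\text{Step 3: } \sigma \text{ known},\ n = 30 \ \Rightarrow\ \text{one-sample, right-tailed } z\text{-test} \\
&\text{Step 4: } z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{53 - 50}{5/\sqrt{30}} \approx \frac{3}{0.913} \approx 3.29 \\
&\text{Step 5: } z_{\text{crit}} = z_{0.05} \approx 1.645;\quad 3.29 > 1.645 \ \Rightarrow\ \text{reject } H_0
\end{aligned}
```

The test statistic falls inside the rejection region. We therefore reject $H_0$ at the 5% significance level: the sample provides evidence that the training program increased average productivity.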

Special Note

I hope you now have a basic understanding of hypothesis testing and of how to conduct it using the rejection region approach. In the next couple of blogs, we will look at the things we skipped here, such as the significance level, the difference between one-tailed and two-tailed tests, and the p-value approach.

💡
I would also love to connect with you on Twitter and LinkedIn.


Did you find this article valuable?

Support Yuvraj Singh by becoming a sponsor. Any amount is appreciated!
