By Abhishek Kathuria

Mathematics behind Linear Regression for Beginners

Hi everyone!


This is my very first blog. I have wanted to write blogs for a long time, but I was always caught up in a dilemma: "Is this the ideal time?" or "Am I qualified enough?". I have always wanted to give something back to the community.

After working for over two years on various Machine Learning and Natural Language Processing projects and writing six research papers, I have realized that most of us skip the most important part: the mathematical intuition behind a machine learning concept. I finally feel that I am in a position to contribute something through a series of blog posts focused on the research and the mathematical intuition behind each concept. So, this will be an intuitive mathematical insight into the most common machine learning concept: Linear Regression!




Linear regression


It is a supervised machine learning algorithm that predicts the value of a dependent variable from one or more independent variables (features) using the best fit line.


What are Dependent and Independent Variables?


Dependent variables are those variables whose values have to be predicted. A dependent variable may also be known as the output; its predicted value is based on the values of one or more independent variables.

On the other hand, independent variables are those variables whose values are fixed and not dependent on any other variable. They may be known as input or features. There can be one or more independent variables that can help in the determination of the dependent variable.


For example, let us take a scenario of the ‘House Price Prediction’ problem. In this problem, the price of the house has to be predicted based on the given size of the house.

Let x be the size of the house and y be the price of the house. Here, x is the independent variable (a feature), whereas y is the dependent variable since its value has to be predicted.

The following is the sample data provided:


Table 1
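The table itself appears as an image on the original page, so its exact values are not reproduced here. Purely as a placeholder, data of that shape (sizes and prices are my own illustrative numbers, not the original table) might look like:

```python
# Illustrative stand-in for Table 1 (placeholder numbers, not the original data):
# house sizes (x) and their prices (y).
sizes = [30, 45, 60, 75]
prices = [610, 890, 1210, 1480]

# Each row of the table pairs one size with one price
for x, y in zip(sizes, prices):
    print(f"size={x}, price={y}")
```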


First, let us get an idea about the best fit line!


What is the Best Fit Line?


It is the straight line through the plotted points for which the total distance of the points from the line is minimum.

The equation of any best fit line is given as:


Y` = mx + c


Here, m is the slope of the line, and c is the y-intercept, i.e. the value at which the line cuts the y-axis.

Let us now use Table 1 above to plot the points.


Fig. 1

In Fig. 1 above, the plotted points are shown in blue. Let the straight line shown in red represent the best fit line.

We also assume that the equation of the best fit line is given as follows:


Y` = 20x

This means that the value of m is taken as 20 and c is taken as 0.

Since linear regression is used for prediction, let us predict the price of the house when the given value of the size of the house is 60.

Using the above equation of the line, we can easily see that when x is 60, then Y` will be 1200.
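In code, this prediction is a one-liner. A minimal sketch using the slope and intercept assumed above (m = 20, c = 0):

```python
# Best fit line assumed in the text: Y_hat = m*x + c with m = 20, c = 0
m, c = 20, 0

def predict(x):
    """Predicted house price for a house of size x."""
    return m * x + c

print(predict(60))  # 1200
```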


Cost Function

It is the function that measures the total squared distance of the plotted points from the fitted line; the best fit line is the one that minimizes this function.

The cost function for a particular slope m is given by the following equation:


J(m) = (1/n) × Σᵢ₌₁ⁿ (Y`ᵢ − yᵢ)²


Here, m is the slope of the line, n is the number of points, Y`ᵢ is the predicted value on the best fit line, and yᵢ is the given value of the dependent variable for the given xᵢ.
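A minimal sketch of this cost function in Python, assuming the mean-squared-error form J(m) = (1/n) Σ (Y`ᵢ − yᵢ)² described above:

```python
def cost(m, xs, ys, c=0):
    """Mean squared error of the line Y_hat = m*x + c over the data points."""
    n = len(xs)
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / n
```

For a perfect fit the cost is zero; for example, `cost(1, [1, 2, 3], [1, 2, 3])` returns 0.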


Explanation of the cost function with an example

Let us consider a sample dataset:


Table 2

Now, if we plot the data points given in table 2 and draw a line, we get the following graph:


Fig. 2

Here, the straight line (indicated by red colour) is given by the following equation:


Y` = mx + c


If we consider c=0, then the equation becomes Y`=mx


Case 1) When m=1:


Y`(1)=1×1 =1, Y`(2)=1×2 = 2, Y`(3)=1×3 = 3, Y`(4)=1×4 = 4


As we know from the cost function equation, the cost is the mean of the squared differences between each predicted value Y`ᵢ and the corresponding actual value yᵢ.


Hence, the cost function when m is 1 is given by substituting these four predictions and the corresponding values of y from Table 2 into that equation.
Case 2) When m=2:


Y`(1)= 2×1= 2, Y`(2)= 2×2= 4, Y`(3)= 2×3= 6, Y`(4)= 2×4= 8


Hence, the cost function when m is 2 is obtained in the same way, by substituting the predictions 2, 4, 6, and 8 and the corresponding values of y from Table 2 into the cost function.
Similarly, we calculate the cost function for different values of m. The best line is the one whose cost function is minimum; in this case, that is m = 1. Hence, the slope m is chosen as 1 for this sample data.
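The two cases above can be sketched as a small sweep over candidate slopes. The data below assumes Table 2 holds the points (1, 1), (2, 2), (3, 3), (4, 4), which is consistent with the predictions listed for m = 1, but is an assumption, since the original table is an image:

```python
# Assumed Table 2 data (not shown in the original post): y = x exactly
xs = [1, 2, 3, 4]
ys = [1, 2, 3, 4]

def cost(m):
    """Mean squared error of Y_hat = m*x (c = 0) over the data."""
    return sum((m * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Sweep a few candidate slopes and keep the one with minimum cost
costs = {m: cost(m) for m in [0.5, 1, 1.5, 2]}
best_m = min(costs, key=costs.get)
print(best_m, costs[best_m])  # 1 0.0
```

Trying only a handful of hand-picked slopes is exactly the limitation that gradient descent, covered next, removes.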


Now, the question arises: for how many values of the slope m should we calculate the cost function? This is answered by a concept known as Gradient Descent, which will be covered in my next post. So stay tuned!



If you find this article useful and want to be a part of my machine learning journey, do like, share and subscribe to my website. If you have any suggestions, kindly let me know in the comments. Thank you :)


 

