Simple Linear Regression Intuition

simple linear regressionYou may recognize the equation for simple linear regression as the equation for a sloped line on an x and y axis.

Simple linear regression involves a dependent variable. This is an outcome you want to explain. For example, the dependent variable could represent salary. You could assume that level of experience will impact salary. So, you would label the independent variable as experience.

The coefficient can be thought of as a multiplier that connects the independent and dependent variables. It translates how much y will be affected by a unit change in x. In other words, a change in x does not usually mean an equal change in y.

In this example, the constant represents the lowest possible salary. This would be the starting salary for someone with a zero level of experience.

regression line of best fitSuppose you have data from a company’s employees. You could plot the data with the red marks as shown above. Then would draw a line that “best fits” the data. It’s impossible to connect all the marks with a straight line, so you use a best fitting line. The equation for this line would be the result of your simple linear regression. The regression finds the best fitting line. This line is your regression model.

How does Simple Linear Regression find the best fitting line?

The regression model is found by using the ordinary least squares method. Please refer to the following illustration.

ordinary least squares method First, look at the notation. Notice that the red mark is actual data, and the green mark is the predicted model. For these red and green marks in the boxed-frame, the actual salary is higher than what was predicted in the model. So, this employee makes a higher salary than what the model predicts. Of course, there are other variables besides experience that can affect salary. But in this case, we keep it simple. Hence the term, simple linear regression.

What the regression analysis does is take the sum of all the squared differences between the actual value and the predicted value. The analysis requires this to be done for the many different lines that can “fit” through the data. The line that has the minimum sum of squared differences compared to the other lines is the best fitting line. The equation for this line represent your simple linear regression model.

A detailed example of a regression model in the R programming language may help to understand this concept.

Leave a Reply

Your email address will not be published. Required fields are marked *