Simple Linear Regression

Hoda Saiful
3 min read · Aug 9, 2021


A linear regression model is used to predict the value of one variable based on the value of another variable. The variable you want to predict, Y, is referred to as the target or output variable; the variable used to predict Y, i.e. X, is referred to as the independent or predictor variable.

The equation of a linear model resembles that of a straight line:

Y = β0 + β1X + e

where Y is the target variable, β1 is the slope, β0 is the intercept, and e is the error (residual).
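As a quick illustration, the sketch below generates data that follows this equation; β0 = 2 and β1 = 0.5 are arbitrary example values, not taken from any real dataset:

```python
import numpy as np

# Example coefficients, chosen only for illustration
beta_0 = 2.0   # intercept
beta_1 = 0.5   # slope

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=100)        # predictor (independent variable)
e = rng.normal(0, 1, size=100)          # error term with mean zero
Y = beta_0 + beta_1 * X + e             # target variable
```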

To build a linear model between two variables X and Y, it is important that they have a linear relationship.

When such data are shown on a scatter plot, a positive linear relationship between X and Y means that as the value of X increases, Y increases as well.

The objective of building a linear model is to fit a line through the data points in order to estimate or predict the value of Y for a given value of X.

Residuals

When a line is fitted through the data points, some points do not lie exactly on the regression line. The difference between the actual and the predicted value is termed the residual.

Residuals may be positive or negative.

a. Positive Residual — when the actual value is above the predicted line

b. Negative Residual — When the actual value is below the predicted line.

Line of Best Fit

If the error term for the i-th data point is denoted e(i) = actual value(i) − predicted value(i), then for n data points the sum of the squares of these residuals, RSS = e(1)² + e(2)² + … + e(n)², is referred to as the Residual Sum of Squares. The regression line is chosen such that the RSS is minimized.
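The slope and intercept that minimize the RSS can be computed with the standard least-squares formulas. The sketch below is a minimal illustration on example data of the same form as the earlier snippet, not a production implementation:

```python
import numpy as np

# Example data of the same form as the earlier sketch
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=100)

def fit_least_squares(x, y):
    """Return (intercept, slope) that minimize the residual sum of squares."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

b0, b1 = fit_least_squares(X, Y)
residuals = Y - (b0 + b1 * X)      # actual minus predicted values
rss = np.sum(residuals ** 2)       # residual sum of squares (RSS)
print(f"intercept={b0:.3f}, slope={b1:.3f}, RSS={rss:.3f}")
```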

Assumptions of a Linear Model

a. The target and the independent variable have a linear relationship

b. Error terms are normally distributed

Error terms follow a normal distribution with mean equal to zero.

c. Error terms are independent of each other

Example of a violation: time-series data, where the next value depends on the previous one, so the error terms can be correlated rather than independent.

d. Error terms have a constant variance (homoscedasticity)

The variance of the error terms should not increase, decrease, or follow a pattern as the predicted values change (a quick way to check this is sketched below).
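One quick way to eyeball assumptions b and d is to plot the residuals from the fitted line. The sketch below reuses the residuals and fitted values from the previous least-squares example; it is just one common diagnostic, not the only one:

```python
import matplotlib.pyplot as plt

# b0, b1, X, residuals come from the previous least-squares sketch
fitted = b0 + b1 * X

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: residuals should look roughly normal and centered at zero
axes[0].hist(residuals, bins=20)
axes[0].set_title("Residual distribution")

# Residuals vs fitted values: no pattern and a roughly constant spread
axes[1].scatter(fitted, residuals)
axes[1].axhline(0, linestyle="--")
axes[1].set_title("Residuals vs fitted values")

plt.show()
```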

Factors That Determine the Goodness of Fit

a. R Squared

  1. R squared measures the proportion of the variance in Y that is explained by the model; a model with a high R squared value is considered a good fit.

b. F-statistic

  1. If the p-value associated with the F-statistic is less than 0.05, the overall fit of the model is considered significant (see the sketch below for how to obtain this value).
  2. If the p-value is greater than 0.05, the model needs to be reviewed, as the apparent fit may be due to chance.
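Both quantities are reported directly by standard regression libraries. For example, a minimal sketch with statsmodels, using the same kind of example data as above:

```python
import numpy as np
import statsmodels.api as sm

# Example data of the same form as the earlier sketches
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=100)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=100)

X_const = sm.add_constant(X)      # add the intercept column
model = sm.OLS(Y, X_const).fit()

print(model.rsquared)             # R squared
print(model.f_pvalue)             # p-value of the F-statistic
print(model.summary())            # full fit summary
```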
