Linear Regression
This section is divided into three parts: Part 1 gives a short introduction to the algorithm, Part 2 presents projects based on the algorithm, and Part 3 covers the most popular questions on the algorithm.
PART 1
Linear regression is an algorithm used when we want to predict numeric outcomes. The “linear” part refers to the straight-line relationship it assumes between the thing we want to predict (called the dependent variable) and the things we think might influence it (called the independent variables). For example, if we’re trying to predict someone’s salary based on their experience and education, experience and education are the independent variables, and salary is the dependent variable.
Salary tends to increase as experience in years increases; this is an example of a linear relationship.
How does the algorithm work?
In our example, the data points might not form a perfectly straight line, but linear regression finds the best-fitting line that comes as close as possible to all the points. It does this by minimizing the overall distance between the line and the points. There are two main techniques for finding this line: Gradient Descent and Ordinary Least Squares (OLS). OLS is easier to understand, so let’s focus on that.
OLS (Ordinary Least Squares): In this technique, we find the best-fitting line by minimizing the sum of the squared vertical distances between the line and each data point.
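To make this concrete, here is a minimal sketch of the OLS closed-form solution for one independent variable. The experience and salary numbers are made up purely for illustration:

```python
import numpy as np

# Hypothetical data: years of experience vs. salary (illustrative values only).
experience = np.array([1, 2, 3, 4, 5, 6], dtype=float)
salary = np.array([35000, 42000, 50000, 55000, 63000, 70000], dtype=float)

# OLS closed-form solution for a single feature:
#   slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   intercept = y_mean - slope * x_mean
x_mean, y_mean = experience.mean(), salary.mean()
slope = ((experience - x_mean) * (salary - y_mean)).sum() / ((experience - x_mean) ** 2).sum()
intercept = y_mean - slope * x_mean

# The fitted line minimizes the sum of squared vertical distances to the points.
print(f"salary = {intercept:.2f} + {slope:.2f} * experience")
```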
[Figure: scatter plot of the data points; the green line is the best-fitting line.]
Types of Linear Regression:
If we have only one independent variable, like experience in our example, it’s called a simple linear regression problem. If we have more than one, like experience and education, it’s called a multiple linear regression problem.
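The sketch below shows how both cases look with scikit-learn’s LinearRegression, again using hypothetical experience/education/salary data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: columns are experience (years) and education (years).
X = np.array([[1, 12], [3, 14], [5, 16], [7, 16], [9, 18]], dtype=float)
y = np.array([35000, 50000, 65000, 72000, 90000], dtype=float)

# Multiple linear regression: two independent variables, one coefficient each.
multiple = LinearRegression().fit(X, y)
print(multiple.intercept_, multiple.coef_)

# Simple linear regression: a single independent variable (experience only).
simple = LinearRegression().fit(X[:, [0]], y)
print(simple.intercept_, simple.coef_)
```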
Assumptions in Linear Regression:
For our predictions to be the most accurate, certain assumptions need to hold true:
- Linear Relationships: All the independent variables should have a linear relationship with the dependent variable.
- Multicollinearity: Independent variables shouldn’t be too closely related to each other. If they are, we might need to remove one. We can check this using the variance inflation factor (VIF); a minimal sketch follows this list, and the project contains the full code.
- Homoscedasticity: This means that the spread of our residuals (prediction errors) should be about the same across all levels of the independent variables.
Absolute residual = | y_actual − y_predicted |, i.e., the absolute difference between the actual value and our model’s predicted value.
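Here is a minimal VIF sketch using statsmodels; the feature values are hypothetical and chosen only to illustrate the check:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical independent variables; replace with your own feature matrix.
X = pd.DataFrame({
    "experience": [1, 3, 5, 7, 9, 11],
    "education_years": [12, 14, 16, 16, 18, 20],
})

# VIF is conventionally computed with an intercept column included.
X_const = add_constant(X)

vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
    index=X_const.columns,
)
# A common rule of thumb: VIF above 5 (or 10) suggests multicollinearity.
print(vif)
```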
There are other assumptions, but we’ll discuss those in more detail later.
How to Check Model Accuracy:
To see how accurate our model is, we typically use two main parameters:
MAE (Mean Absolute Error): This is the average of the absolute differences between our predicted values and the actual values: MAE = (1/n) Σ | y_actual − y_predicted |.
R2 Score: R2 tells us how well our model fits the data. It gives us the proportion of the variation in the dependent variable that’s explained by the independent variables. While it doesn’t directly measure accuracy, it helps us understand how our model is performing.
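Both metrics are easy to compute with scikit-learn. The following sketch fits a model on hypothetical salary data and evaluates it on a held-out test split:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: experience (years) vs. salary with some noise added.
X = np.arange(1, 21, dtype=float).reshape(-1, 1)
y = 30000 + 4000 * X.ravel() + np.random.default_rng(0).normal(0, 2000, 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("MAE:", mean_absolute_error(y_test, y_pred))  # average absolute error
print("R2:", r2_score(y_test, y_pred))              # proportion of variance explained
```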
By understanding these concepts and checking these parameters, we can assess how well our linear regression model is working.
PART 2
This section covers projects on linear regression, with datasets available on GitHub.
Simple Linear Regression:
GitHub: Click Here
PDF: Click Here
Multiple Linear Regression:
GitHub: Click Here
PDF: Click Here
PART 3
This section covers important questions on linear regression.