Linear Regression
Linear regression models the relationship between a dependent variable (y) and one or more independent variables (x) using a best-fit straight line. It assumes that the relationship between the dependent and independent variables is linear.
Different ways to solve Linear Regression
- Normal Equation: The normal equation gives a closed-form solution for linear regression, theta = (X^T X)^-1 X^T y. In practice it is computed by solving the linear system (X^T X) theta = X^T y, for example with Gaussian elimination. It finds the regression coefficients that minimize the sum of the squared residuals.
- Gradient Descent: Gradient descent is an iterative optimization algorithm that can be used to solve linear regression. It works by repeatedly stepping towards the minimum of a cost function, which is typically the sum (or mean) of the squared residuals.
- Singular Value Decomposition: When X^T X is not invertible (for example, when predictors are perfectly collinear or there are more features than samples), the normal equation cannot be solved directly. In that case we compute the pseudo-inverse of the data matrix using the SVD.
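The three approaches above can be sketched side by side with NumPy. This is a minimal illustration, not a production implementation: the synthetic data, the learning rate `lr`, the iteration count `n_iters`, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Solve (X^T X) theta = X^T y directly (requires X^T X to be invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def fit_gradient_descent(X, y, lr=0.1, n_iters=5000):
    """Minimise the mean squared error with plain batch gradient descent."""
    theta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iters):
        grad = (2.0 / n) * X.T @ (X @ theta - y)  # gradient of the MSE
        theta -= lr * grad
    return theta

def fit_svd(X, y):
    """Use the SVD-based pseudo-inverse; also works when X^T X is singular."""
    return np.linalg.pinv(X) @ y

# Synthetic data (assumed for illustration): y = 1 + 2*x + small noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])  # first column models the intercept
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=200)

theta_ne = fit_normal_equation(X, y)
theta_gd = fit_gradient_descent(X, y)
theta_svd = fit_svd(X, y)
print(theta_ne, theta_gd, theta_svd)
```

On well-conditioned data like this, all three should agree closely. The choice between them is mostly about scale and conditioning: the closed form is convenient for small problems, gradient descent scales to large ones, and the pseudo-inverse is the safer option when the predictors are collinear.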
Checks to apply before applying Linear Regression
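- Linearity: There should be a linear relationship between the dependent and independent variables. This can be checked with the Pearson correlation coefficient, which measures linear association, rather than the Spearman correlation coefficient, which measures monotonic association.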
- Normality: The residuals are normally distributed. This means that the residuals follow a bell-shaped curve, with most of the values clustered around the mean. The normality assumption matters because it justifies the standard techniques of statistical inference, such as hypothesis testing and confidence intervals. Violations of normality do not by themselves bias the coefficient estimates, but they can make p-values and confidence intervals unreliable, especially in small samples, leading to incorrect conclusions about the statistical significance of the independent variables.
- Homoscedasticity: The variance of the residuals should be the same for all values of the independent variables. If the size of the errors depends on the value of an independent variable, the model is more certain for some inputs and less certain for others, so its predictions are not uniformly reliable.
- No multicollinearity: There is no high correlation between the independent variables. This means that the independent variables are not too closely related to each other. If there is high correlation between the independent variables, it can be difficult to separate out their individual effects on the dependent variable, leading to unstable estimates of the regression coefficients.
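The checks above can be approximated with a few lines of NumPy. This is a rough sketch on synthetic data. The helper names (`pearson_r`, `ols_fit`, `vif`), the data, and the diagnostics are assumptions for illustration: skewness and excess kurtosis stand in for a formal normality test, and a variance ratio between low and high fitted values stands in for a formal heteroscedasticity test.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

def ols_fit(X, y):
    """Least-squares coefficients for a design matrix with an intercept column."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def vif(features):
    """Variance inflation factor of each column (features has no intercept column)."""
    n, p = features.shape
    out = []
    for j in range(p):
        target = features[:, j]
        others = np.column_stack([np.ones(n), np.delete(features, j, axis=1)])
        resid = target - others @ ols_fit(others, target)
        r2 = 1.0 - resid.var() / target.var()  # R^2 of column j on the others
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic data (assumed): x2 is nearly a copy of x1, x3 is independent
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
y = 3.0 + 2.0 * x1 + x3 + rng.normal(scale=0.5, size=n)

features = np.column_stack([x1, x2, x3])
X = np.column_stack([np.ones(n), features])
fitted = X @ ols_fit(X, y)
residuals = y - fitted

# Linearity: Pearson correlation between a predictor and the target
r_x1_y = pearson_r(x1, y)

# Normality (rough): skewness and excess kurtosis of residuals should be near 0
z = (residuals - residuals.mean()) / residuals.std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

# Homoscedasticity (rough): residual variance for low vs high fitted values
order = np.argsort(fitted)
low, high = residuals[order[: n // 2]], residuals[order[n // 2:]]
variance_ratio = high.var() / low.var()  # should be near 1

# Multicollinearity: a VIF above roughly 5 to 10 is a common warning sign
vifs = vif(features)
print(r_x1_y, skewness, excess_kurtosis, variance_ratio, vifs)
```

In this setup the first two VIF values should come out large because `x2` is almost a copy of `x1`, while the third should stay close to 1.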
How to deal with multicollinearity?
- Remove one of the correlated variables: The simplest way to deal with multicollinearity is to remove one of the highly correlated variables from the regression model. The downside of this approach is that you discard whatever information that variable carried, and if it has a genuine effect on the outcome, the remaining coefficients can become biased.
- Combine the correlated variables: Another way to deal with multicollinearity is to combine the correlated variables together to form a single predictor. For example, if you had two highly correlated variables, you could combine them together to form a single predictor by taking their average.
- Use principal components: Principal components analysis (PCA) is a dimension reduction technique that can be used to reduce a large set of variables to a small set that still contains most of the information in the large set. This technique is useful when you have a large number of correlated predictors, and you want to summarize them with a smaller set of representative variables.
- Use regularization methods: Regularization methods, such as ridge regression (L2 regularization) and lasso regression (L1 regularization), are powerful techniques that deal with multicollinearity by constraining the size of the regression coefficients. These methods work well when you have a large number of correlated predictors.
- Do nothing: If your goal is to make predictions, and not to understand the role of each individual variable, then multicollinearity might not be a problem. It mainly affects the interpretation of your model, which matters only if you care about the specific role of each variable. However, multicollinearity does inflate the variance of the estimated regression coefficients, so predictions can become less reliable when new data does not follow the same correlation pattern among the predictors.
- Use Partial Least Squares Regression: Partial least squares regression (PLS regression) is a regression method that is an alternative to ordinary least squares (OLS) regression. PLS regression is useful when you have a large number of correlated predictors, and you want to use them to predict an outcome, but you also want to reduce the number of predictors in your model. PLS regression is similar to principal components regression, but the key difference between the two methods is that PLS regression uses the response variable in the dimension reduction step, while principal components regression does not.
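As one concrete illustration of the options above, here is a minimal sketch of ridge regression in closed form, compared against ordinary least squares on two nearly identical predictors. The data, the penalty strength `alpha`, and the function name `ridge_fit` are assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution on centred data: (X^T X + alpha*I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

# Synthetic data (assumed): x2 is almost an exact copy of x1
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)            # centre the predictors so no intercept is needed
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)
y = y - y.mean()

beta_ols = ridge_fit(X, y, alpha=0.0)    # alpha = 0 reduces to ordinary least squares
beta_ridge = ridge_fit(X, y, alpha=1.0)  # a small penalty stabilises the split

print("OLS:  ", beta_ols)
print("Ridge:", beta_ridge)
```

With predictors this collinear, the OLS coefficients can swing to large values of opposite sign, because the data cannot tell the two variables apart. The ridge coefficients should instead split the shared effect roughly evenly between them.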
What is the difference between L1 and L2 regularization?
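L2 regularization is also known as ridge regression and penalizes the sum of the squared coefficients. L1 regularization is also known as lasso and penalizes the sum of the absolute values of the coefficients. Both shrink the coefficients, but L1 tends to drive some of them exactly to zero because of the geometry of its penalty. Suppose we know the optimal loss without regularization. Slightly worse loss values are reached by many different parameter choices, which lie along the contours of the loss surface. Adding a penalty restricts the parameters to a region around the origin, and the solution is the point where a loss contour first touches that region. The contours of the L1 penalty are squares (diamonds, rotated 45 degrees) whose corners lie on the axes, so the loss contours are likely to touch them at a corner, where one or more coefficients are exactly zero. The contours of the L2 penalty are circles with no corners, so coefficients are shrunk towards zero but rarely become exactly zero.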
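The difference is easiest to see in a simplified setting. The sketch below assumes the columns of X are orthonormal (X^T X = I), in which case both penalized problems have a known per-coefficient solution in terms of the unregularized least-squares coefficients. The objectives are assumed to be (1/2)||y - Xb||^2 + lam * ||b||_1 for lasso and (1/2)||y - Xb||^2 + (lam / 2) * ||b||^2 for ridge. The coefficient values and `lam` are made up for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso solution per coefficient when the columns of X are orthonormal."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_shrink(z, lam):
    """Ridge solution per coefficient under the same orthonormal assumption."""
    return z / (1.0 + lam)

# Hypothetical unregularized least-squares coefficients
beta_ols = np.array([3.0, -0.4, 0.2, 1.5])
lam = 0.5

beta_l1 = soft_threshold(beta_ols, lam)  # small coefficients become exactly zero
beta_l2 = ridge_shrink(beta_ols, lam)    # every coefficient shrinks, none hits zero

print("L1:", beta_l1)
print("L2:", beta_l2)
```

Here the two coefficients whose magnitude is below `lam` are set exactly to zero by L1, while L2 only scales every coefficient by the same factor. For general, non-orthonormal X there is no such closed form for lasso, and an iterative solver such as coordinate descent is typically used.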