# What is Multicollinearity? How can you solve it?


What is collinearity and what can you do about it? How do you remove multicollinearity?
Collinearity/Multicollinearity:
– In multiple regression: when two or more variables are highly correlated
– They provide redundant information
– In case of perfect multicollinearity: $\hat{\beta} = (X^T X)^{-1} X^T y$ doesn’t exist, because $X^T X$ isn’t invertible (the design matrix $X$ doesn’t have full column rank)
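– It doesn’t affect the fit or the predictions of the model as a whole, and it doesn’t bias the coefficient estimates; only the individual estimates for the correlated predictors become unreliable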
– The standard errors of the regression coefficients of the affected variables tend to be large
– The test of the hypothesis that a coefficient is equal to zero may therefore fail to reject a false null hypothesis of no effect of the explanatory variable (a Type II error)
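The points above can be checked numerically. The following is a minimal sketch using NumPy on synthetic data; the variable names, the sample size, and the noise level `0.01` are illustrative assumptions, not part of the original answer. It shows that an exact linear dependence makes the design matrix rank deficient, and that a near dependence inflates the standard errors of the affected coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
noise = rng.normal(size=n)

x2_perfect = 2.0 * x1                      # exact linear function of x1
x2_near = x1 + 0.01 * rng.normal(size=n)   # almost, but not exactly, collinear
x2_indep = rng.normal(size=n)              # unrelated to x1

def design(x2):
    """Design matrix with an intercept column, x1, and the given x2."""
    return np.column_stack([np.ones(n), x1, x2])

def coef_std_errors(X, y):
    """Classical OLS standard errors: sqrt(diag(sigma^2 (X^T X)^{-1}))."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Perfect collinearity: 3 columns but rank 2, so X^T X is singular.
rank_perfect = np.linalg.matrix_rank(design(x2_perfect))

# Same true coefficients in both cases; only the predictor correlation differs.
se_near = coef_std_errors(design(x2_near), 1.0 + x1 + x2_near + noise)
se_indep = coef_std_errors(design(x2_indep), 1.0 + x1 + x2_indep + noise)

print("rank with perfect collinearity:", rank_perfect)
print("std. error of x1 coefficient (near-collinear x2):", se_near[1])
print("std. error of x1 coefficient (independent x2):  ", se_indep[1])
```

In this setup the standard error of the `x1` coefficient should come out far larger in the near-collinear case, even though the true coefficients and the noise are identical.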
Remove multicollinearity:
– Drop some of the affected variables
– Principal component regression: gives uncorrelated predictors
– Combine the affected variables
– Ridge regression
– Partial least squares regression
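As an illustration of one of the remedies listed above, here is a minimal sketch of ridge regression in its closed form, $\hat{\beta}_{\text{ridge}} = (X^T X + \lambda I)^{-1} X^T y$, written with NumPy. The helper name `ridge_fit`, the synthetic data, and the penalty `lam=1.0` are assumptions chosen for the example; in practice the penalty is usually tuned (for instance by cross-validation) and predictors are typically standardized first.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate on centered data (intercept not penalized)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = Xc.shape[1]
    # Adding lam * I makes the matrix invertible for any lam > 0.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 1.0 + x1 + x2 + rng.normal(size=n)

_, beta_ols = ridge_fit(X, y, lam=0.0)     # lam = 0 reduces to ordinary least squares
_, beta_ridge = ridge_fit(X, y, lam=1.0)   # small penalty stabilizes the estimates

# With an exact dependence, OLS has no unique solution but ridge still does.
X_perfect = np.column_stack([x1, 2.0 * x1])
_, beta_perfect = ridge_fit(X_perfect, y, lam=1.0)

print("OLS coefficients:  ", beta_ols)
print("ridge coefficients:", beta_ridge)
print("ridge under perfect collinearity:", beta_perfect)
```

The ridge coefficients are shrunk relative to OLS, which trades a small bias for a large reduction in variance along the poorly determined direction.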
Detection of multicollinearity:
– Large changes in the individual coefficients when a predictor variable is added or deleted
– Insignificant regression coefficients for the affected predictors but a rejection of the joint hypothesis that those coefficients are all zero (F-test)
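– VIF (variance inflation factor): the variance of a coefficient when fitting the full model divided by the variance of that coefficient when its predictor is fitted on its own; equivalently $\mathrm{VIF}_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on all the other predictors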
– rule of thumb: $\mathrm{VIF} > 5$ indicates multicollinearity
– Correlation matrix, but correlation is a bivariate relationship whereas multicollinearity is multivariate
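The VIF check and its contrast with the correlation matrix can be sketched as follows. This is an illustrative NumPy implementation that uses the auxiliary-regression form $\mathrm{VIF}_j = 1 / (1 - R_j^2)$; the function name `vif` and the synthetic predictors `a`, `b`, `c`, `d` are assumptions made for the example.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + 0.1 * rng.normal(size=n)   # nearly a linear combination of a and b
d = rng.normal(size=n)                 # unrelated predictor
X = np.column_stack([a, b, c, d])

vifs = vif(X)
corr = np.corrcoef(X, rowvar=False)
# Largest pairwise correlation, ignoring the diagonal of ones.
max_pairwise = np.max(np.abs(corr - np.eye(X.shape[1])))

print("VIFs:", vifs)
print("largest absolute pairwise correlation:", max_pairwise)
```

Here `c` depends on `a` and `b` jointly, so no single pairwise correlation looks extreme while the VIFs of `a`, `b`, and `c` are well above the rule-of-thumb threshold. This is the sense in which correlation is bivariate and multicollinearity is multivariate.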