What is collinearity and what to do with it? How to remove multicollinearity?

**Collinearity/Multicollinearity:**

– In multiple regression: when two or more variables are highly correlated

– They provide redundant information

– In case of perfect multicollinearity, the OLS estimate β̂ = (XᵀX)⁻¹Xᵀy doesn’t exist, because XᵀX isn’t invertible (the design matrix X doesn’t have full column rank)

– It doesn’t affect the predictive power of the model as a whole and doesn’t bias the coefficient estimates

– The standard errors of the regression coefficients of the affected variables tend to be large

– A hypothesis test that an affected coefficient is zero may fail to reject a false null hypothesis of no effect of that explanatory variable (Type II error)

– Makes the individual coefficient estimates unstable and can lead to overfitting
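The non-invertibility point above can be demonstrated in a few lines of numpy. This is a minimal sketch (the variable names and the 50-observation toy data are my own): a predictor that is an exact multiple of another makes XᵀX rank-deficient, so the normal-equation inverse does not exist.

```python
import numpy as np

# Design matrix with an intercept, one predictor x, and a second
# predictor that is an exact multiple of x (perfect multicollinearity).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([np.ones(50), x, 2.0 * x])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 2, not 3: X'X is rank-deficient
print(np.linalg.cond(XtX))         # enormous condition number

# np.linalg.lstsq would still return a minimum-norm solution via the
# pseudo-inverse, but (X'X)^{-1} itself does not exist.
```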

**Remove multicollinearity:**

– Drop some of the affected variables

– Principal component regression: gives uncorrelated predictors

– Combine the affected variables

– Ridge regression

– Partial least square regression
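Of the remedies above, ridge regression is easy to sketch directly from the normal equations: adding λI to XᵀX makes the matrix invertible even under (near-)perfect collinearity. A minimal numpy illustration, with made-up data where one predictor nearly duplicates another:

```python
import numpy as np

# Near-perfect collinearity: x2 is x1 plus tiny noise.
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-6, size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.1, size=n)

# Ridge estimate: beta = (X'X + lambda * I)^{-1} X'y.
# The lambda * I term regularizes the otherwise near-singular X'X.
lam = 1.0
p = X.shape[1]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta_ridge)  # stable; the effect of x1 is split between x1 and x2
```

Note how ridge resolves the ambiguity by spreading the coefficient across the correlated pair rather than producing huge offsetting values as OLS would.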

**Detection of multicollinearity:**

– Large changes in the individual coefficients when a predictor variable is added or deleted

– Insignificant regression coefficients for the affected predictors, but a rejection of the joint hypothesis that those coefficients are all zero (F-test)

– VIF (variance inflation factor): the ratio of the variance of a coefficient when fitting the full model to its variance when its predictor is fitted on its own; equivalently VIFⱼ = 1 / (1 − Rⱼ²), where Rⱼ² comes from regressing predictor j on the other predictors

– rule of thumb: VIF > 5 indicates multicollinearity

– Correlation matrix: useful as a first check, but correlation is a bivariate relationship whereas multicollinearity is multivariate
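The VIF can be computed directly from its definition, VIFⱼ = 1 / (1 − Rⱼ²). Below is a minimal self-contained sketch (the `vif` helper and the simulated predictors are my own, not a standard API; in practice `statsmodels` provides `variance_inflation_factor`):

```python
import numpy as np

def vif(X):
    """VIF per column of X (no intercept column): 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on the other columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        # Regress predictor j on the remaining predictors plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1.0 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated data: x1 and x2 strongly correlated, x3 independent.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(vif(X))  # VIF well above 5 for x1 and x2, near 1 for x3
```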
