# What are the assumptions required for linear regression? What if some of these assumptions are violated?

1. The data used in fitting the model is representative of the population
2. The true underlying relation between x and y is linear
3. Variance of the residuals is constant (homoscedastic, not heteroscedastic)
4. The residuals are independent
5. The residuals are normally distributed
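
Assumptions 3 to 5 are statements about the residuals, so they can be examined after fitting. The following is a minimal sketch, not a definitive diagnostic suite: it uses only the Python standard library, the simulated data and all function names (`fit_line`, `variance_ratio`, `durbin_watson`, `skewness`) are made up for illustration, and the statistics are rough rules of thumb rather than formal tests.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def residuals(x, y, intercept, slope):
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def variance_ratio(x, res):
    """Residual variance for the upper half of x divided by the lower half.
    Close to 1 suggests constant variance (assumption 3)."""
    ordered = [r for _, r in sorted(zip(x, res))]
    half = len(ordered) // 2
    return statistics.pvariance(ordered[half:]) / statistics.pvariance(ordered[:half])

def durbin_watson(res):
    """Close to 2 when consecutive residuals are uncorrelated (assumption 4).
    Only meaningful when the rows have a natural order, such as time."""
    num = sum((b - a) ** 2 for a, b in zip(res, res[1:]))
    return num / sum(r ** 2 for r in res)

def skewness(res):
    """Close to 0 for symmetric residuals (a partial check of assumption 5)."""
    m, s = statistics.fmean(res), statistics.pstdev(res)
    return sum(((r - m) / s) ** 3 for r in res) / len(res)

# Simulated data (hypothetical): one well-behaved set, one heteroscedastic set
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(2000)]
y_ok = [1.0 + 2.0 * xi + rng.gauss(0, 1.0) for xi in x]
y_het = [1.0 + 2.0 * xi + rng.gauss(0, 0.2 + 0.5 * xi) for xi in x]  # noise grows with x

intercept_ok, slope_ok = fit_line(x, y_ok)
res_ok = residuals(x, y_ok, intercept_ok, slope_ok)
res_het = residuals(x, y_het, *fit_line(x, y_het))

ratio_ok = variance_ratio(x, res_ok)
ratio_het = variance_ratio(x, res_het)
dw_ok = durbin_watson(res_ok)
skew_ok = skewness(res_ok)

print(f"variance ratio (constant noise): {ratio_ok:.2f}")
print(f"variance ratio (growing noise):  {ratio_het:.2f}")
print(f"Durbin-Watson: {dw_ok:.2f}, skewness: {skew_ok:.2f}")
```

In practice a residual plot is usually the first check; the numbers above only summarize what such a plot would show.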

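Which assumptions matter depends on what the model is used for, so a violated assumption only rules out the uses that rely on it:
– Predict y from x: 1) + 2)
– Estimate the standard error of predictors: 1) + 2) + 3)
– Get an unbiased estimation of y from x: 1) + 2) + 3) + 4)
– Make probability statements, run hypothesis tests involving slope and correlation, and build confidence intervals: 1) + 2) + 3) + 4) + 5)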
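
To make the last two uses concrete, here is a hedged sketch of the textbook standard error of the slope in simple linear regression and a 95% confidence interval built from it. The data are simulated and all variable names are illustrative. The interval uses a normal quantile as a large-sample stand-in for the t quantile with n - 2 degrees of freedom, which is an assumption of this sketch; the standard library has no t distribution.

```python
import random
import statistics

# Simulated data (hypothetical): linear signal plus constant-variance normal noise
rng = random.Random(1)
n = 500
x = [rng.uniform(0, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1.5) for xi in x]

# Ordinary least squares fit
mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x

# Residual variance with n - 2 degrees of freedom.
# Using a single variance for all points relies on assumption 3.
res = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in res) / (n - 2)
se_slope = (s2 / sxx) ** 0.5

# 95% interval; treating the slope estimate as normal relies on assumption 5
# (or on a large sample)
z = statistics.NormalDist().inv_cdf(0.975)
ci = (slope - z * se_slope, slope + z * se_slope)

print(f"slope = {slope:.3f}, standard error = {se_slope:.4f}")
print(f"approximate 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

If the residual variance is not constant, `s2` is no longer a single meaningful number and the resulting standard error and interval can be misleading, even though the fitted line itself can still be computed.
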
Note:
– Common myth: linear regression requires x and y themselves to follow some particular distribution, such as the normal. It assumes nothing about the distributions of x and y
– It only makes assumptions about the distribution of the residuals
– Even the normality of the residuals is only needed for statistical tests to be valid
– Regression can be applied to many purposes, even if the errors are not normally distributed
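
The last point can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated, the helper `ols_slope` is a made-up name, and the errors are deliberately skewed (an exponential variable shifted to have mean zero) while remaining independent with constant variance.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx

rng = random.Random(42)
true_slope = 2.0
slopes = []
for _ in range(500):
    x = [rng.uniform(0, 10) for _ in range(100)]
    # expovariate(1.0) has mean 1, so subtracting 1 gives skewed, zero-mean errors
    y = [1.0 + true_slope * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = statistics.fmean(slopes)
print(f"true slope: {true_slope}, average estimate over 500 fits: {mean_slope:.4f}")
```

Averaged over many simulated datasets, the estimated slope stays close to the true slope even though the errors are far from normal. What non-normal errors put at risk is the exactness of tests and intervals, particularly in small samples.
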