How do you handle missing data? What imputation techniques do you recommend?

2 Answers
MockInterview Staff answered 5 years ago
  • If data are missing completely at random (MCAR), deleting incomplete rows introduces no bias, but it reduces the power of the analysis by shrinking the effective sample size
  • Recommended: KNN imputation, Gaussian mixture imputation
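
To make the KNN suggestion concrete, here is a minimal pure-Python sketch. It is not from the answer above: the function name `knn_impute`, the toy data, and the choice to rescale distances by the share of columns compared are all assumptions made for illustration. In practice a library implementation (for example scikit-learn's `KNNImputer`) would usually be the better choice.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Distance is Euclidean over the columns both rows have observed,
    rescaled by the share of columns compared (an assumption made here so
    rows with different numbers of shared columns stay comparable).
    """
    n_cols = len(rows[0])

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # nothing to compare on
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * n_cols / len(shared))

    imputed = [list(r) for r in rows]  # copy, so the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that observed column j.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no usable neighbour exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical toy data: the second row is missing its second value.
data = [
    [1.0, 2.0],
    [1.1, None],
    [0.9, 1.8],
    [5.0, 9.0],
]
filled = knn_impute(data, k=2)
print(filled[1])
```

The missing value is filled from the two rows closest to it on the observed column, so the distant fourth row does not pull the estimate upward. Features on very different scales would need standardizing first, since the distance is otherwise dominated by the largest one.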


MockInterview Staff answered 5 years ago

Answer from Analytics Vidhya:
You are given a data set in which some variables have more than 30% missing values. Let's say that, out of 50 variables, 8 have more than 30% of their values missing. How will you deal with them?
Answer: We can deal with them in the following ways:

  1. Assign a unique category to the missing values; the missingness itself might reveal a trend.
  2. Remove those variables outright.
  3. Or check how the missing values are distributed against the target variable: where a pattern shows up, keep those variables and assign the missing values a new category, and remove the rest.
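
The third option can be sketched roughly as follows, in plain Python. Everything here is an assumption for illustration rather than part of the quoted answer: the function names, the 0/1 target, the toy records, and especially the `min_gap` threshold used to decide whether missingness "shows a pattern" (a proper statistical test would be more defensible).

```python
def triage_sparse_columns(records, target, max_missing=0.30, min_gap=0.10):
    """Decide per column: 'keep', 'flag_missing' (new category), or 'drop'.

    records: list of dicts where None marks a missing value.
    target: name of a 0/1 target column, assumed fully observed.
    """
    columns = [c for c in records[0] if c != target]
    decisions = {}
    for col in columns:
        missing = [r[target] for r in records if r[col] is None]
        present = [r[target] for r in records if r[col] is not None]
        if len(missing) / len(records) <= max_missing:
            decisions[col] = "keep"  # not sparse enough to worry about
            continue
        if not present:
            decisions[col] = "drop"  # column is entirely empty
            continue
        # Compare the target rate where the value is missing vs. observed.
        gap = abs(sum(missing) / len(missing) - sum(present) / len(present))
        decisions[col] = "flag_missing" if gap >= min_gap else "drop"
    return decisions


def apply_decisions(records, decisions, label="Missing"):
    """Drop rejected columns and replace None with a category label."""
    cleaned = []
    for r in records:
        row = {}
        for col, value in r.items():
            action = decisions.get(col, "keep")
            if action == "drop":
                continue
            if action == "flag_missing" and value is None:
                value = label
            row[col] = value
        cleaned.append(row)
    return cleaned


# Hypothetical toy data: (income_band, region, age_band, y)
raw = [
    (None, "N", "young", 1), (None, None, "old", 1),
    (None, "S", "old", 1), (None, None, "young", 1),
    ("low", "N", "old", 1), ("high", None, "young", 0),
    ("low", "S", None, 0), ("high", None, "old", 0),
    ("low", "N", "young", 0), ("high", "S", "old", 0),
]
names = ("income_band", "region", "age_band", "y")
records = [dict(zip(names, row)) for row in raw]

decisions = triage_sparse_columns(records, target="y")
cleaned = apply_decisions(records, decisions)
print(decisions)
```

In this toy data both `income_band` and `region` are 40% missing, but only the missingness in `income_band` tracks the target, so it is kept with a "Missing" category while `region` is dropped.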
