2 Answers
Tips:
- You can break this question into two components: model selection and variable selection. This question is typically asked to gauge high-level understanding, and the interviewer may follow up with questions that dive deeper.
- Model selection depends on what matters most: accuracy, interpretability, or computation time. If you need accuracy, try several algorithms and see which works best on your data. If interpretability is important, use something simple such as linear regression or a decision tree, which are easy to interpret (unlike a black-box model such as a neural network). If speed is important, avoid algorithms that are slow to train on large datasets, such as a kernel SVM.
- Feature selection: See other answers on this topic.
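To make the model selection tip concrete, here is a minimal sketch that compares a few candidate models on accuracy and wall-clock time using cross-validation. It assumes scikit-learn is installed. The synthetic dataset, the particular candidates, and their hyperparameters are illustrative assumptions, and logistic regression stands in for linear regression because the toy task is classification.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for your own data (an assumption for illustration only).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# Candidates range from easy to interpret (linear model, shallow tree)
# to harder to interpret (forest, kernel SVM).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm_rbf": SVC(kernel="rbf"),
}

results = {}
for name, model in candidates.items():
    start = time.perf_counter()
    # 5-fold cross-validated accuracy for this candidate.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = {
        "accuracy": scores.mean(),
        "seconds": time.perf_counter() - start,
    }

# Report candidates from most to least accurate, with the time each took.
for name, r in sorted(results.items(), key=lambda kv: kv[1]["accuracy"], reverse=True):
    print(f"{name:20s} accuracy={r['accuracy']:.3f} time={r['seconds']:.2f}s")
```

On a real problem you would weigh the accuracy column against the timing column and against how easily each model can be explained, rather than simply taking the top row.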
Answer from Analytics Vidhya:
While working on a data set, how do you select important variables? Explain your methods.
Answer: You can use the following variable selection methods:
- Remove the correlated variables prior to selecting important variables
- Use linear regression and select variables based on their p-values
- Use Forward Selection, Backward Selection, Stepwise Selection
- Use Random Forest or XGBoost and plot a variable importance chart
- Use Lasso Regression
- Measure information gain for the available set of features and select top n features accordingly.
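As an illustration of the last method in the list, here is a minimal sketch of ranking categorical features by information gain and keeping the top n, using only the Python standard library. The function names and the toy dataset are hypothetical and are not taken from the original answer. Continuous features would need to be binned first.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_n_features(columns, labels, n):
    """Rank features (name -> list of values) by information gain; keep the top n."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)
    return ranked[:n], gains


# Hypothetical toy data: does a customer buy, given three categorical features?
columns = {
    "age_group": ["young", "young", "old", "old", "young", "old"],
    "has_account": ["yes", "yes", "yes", "no", "no", "no"],
    "weekday": ["mon", "mon", "tue", "mon", "mon", "tue"],
}
labels = ["buy", "buy", "buy", "no", "no", "no"]

selected, gains = top_n_features(columns, labels, n=2)
print(selected)  # → ['has_account', 'age_group']
```

In this toy data `has_account` separates the classes perfectly (a gain of 1 bit), `age_group` helps only slightly, and `weekday` carries no information about the label, so it is dropped.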