What is a decision tree?

1 Answer
MockInterview Staff answered 7 years ago
A decision tree is a model that predicts a class by applying a sequence of simple tests to the input variables. It is typically built as follows:
  1. Take the entire data set as input
  2. Search for a split that maximizes the “separation” of the classes. A split is any test that divides the data in two (e.g. variable2 > 10)
  3. Apply the split to the input data (divide step)
  4. Re-apply steps 1 to 3 to each of the two resulting subsets (recursion)
  5. Stop when a stopping criterion is met
  6. (Optional) Prune the tree, i.e. remove splits that went too far and only fit noise
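
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular library does it. It assumes numeric features, tests of the form "feature <= threshold", Gini impurity as the purity measure, and an exhaustive greedy search. The names build_tree, best_split, predict, max_depth and min_size are made up for this example, and pruning (step 6) is left out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Step 2: greedy search over every (feature, threshold) pair.

    Returns (score, feature_index, threshold) with the lowest size-weighted
    Gini impurity of the two children, or None if no test divides the data.
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # this test does not divide the data in two
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3, min_size=2):
    """Steps 1 to 5: split recursively until a stopping criterion is met."""
    majority = Counter(labels).most_common(1)[0][0]
    # Step 5 (assumed criteria): pure node, too few samples, or depth limit.
    if gini(labels) == 0.0 or len(labels) < min_size or depth >= max_depth:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": majority}
    _, f, t = split
    # Step 3: apply the split (divide step).
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    # Step 4: recurse on each subset.
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_size),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_size),
    }


def predict(tree, row):
    """Follow the tests from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Toy data (invented for illustration): the second variable separates the classes.
rows = [[1, 5], [2, 12], [3, 15], [4, 3]]
labels = ["a", "b", "b", "a"]
tree = build_tree(rows, labels)
```

On this toy data a single test on the second variable is enough to separate the two classes, so the recursion stops after one split because both children are pure.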

Finding a split: methods vary, from greedy search over all attributes (e.g. C4.5) to randomly selecting the candidate attributes (random forests) or even the split points themselves (extremely randomized trees)
Purity measure: information gain, Gini impurity, chi-squared statistic
Stopping criteria: methods vary, e.g. a minimum node size, a required confidence in the prediction, or a purity threshold
Pruning: reduced error pruning, out-of-bag error pruning (ensemble methods)
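
To make the purity measures concrete, here is a small sketch of two of them for a candidate split. It uses the usual textbook definitions: Gini impurity is 1 minus the sum of squared class proportions, and information gain is the parent's entropy minus the size-weighted entropy of the children. The function names are made up for this example, and the chi-squared test is not shown.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of samples belonging to each class present in the node."""
    n = len(labels)
    return [c / n for c in Counter(labels).values()]


def gini_impurity(labels):
    """1 - sum(p_k^2): 0.0 for a pure node, 0.5 for a 50/50 two-class node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy in bits: 0 for a pure node, 1 for a 50/50 two-class node."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


# Toy node (invented for illustration) and two candidate splits of it.
parent = ["a", "a", "b", "b"]
perfect_gain = information_gain(parent, ["a", "a"], ["b", "b"])
useless_gain = information_gain(parent, ["a", "b"], ["a", "b"])
```

A split that separates the classes completely recovers all of the parent's entropy, while a split whose children have the same class mix as the parent gains nothing. A greedy search simply keeps the candidate with the highest gain (or the lowest weighted impurity).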
