What are bagging, bootstrapping, boosting, and stacking in machine learning?

MockInterview Staff answered 6 years ago

Answer by Prasad Seemakurthi, Data Scientist at EDF Trading:

With the proliferation of ML applications and the increase in computing power (thanks to Moore’s law), some packages implement bagging and/or boosting out of the box. For example, the CRAN package ipred implements bagging for both classification and regression.
I believe Shubham Mankodiya's answer is the same as this answer on Stack Exchange: Bagging vs boosting.
First, let me explain what bagging and boosting are and then delineate the differences. Both boosting and bagging are ensemble methods and meta-learners.
Boosting steps:

  1. Draw a random subset of training samples d1 without replacement from the training set D and train a weak learner C1 on it.
  2. Draw a second random training subset d2 without replacement from the training set, add 50 percent of the samples that C1 misclassified, and train a weak learner C2 on the result.
  3. Find the training samples d3 in the training set D on which C1 and C2 disagree, and use them to train a third weak learner C3.
  4. Combine all the weak learners via majority voting.
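
The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the weak learner is a hypothetical 1-D decision stump (`train_stump`), the toy data set and the `subset_size` value are made up for the example, and reusing C1 when C1 and C2 never disagree is a choice made here, not part of the steps as described.

```python
import random

def train_stump(samples):
    """Fit a 1-D decision stump by trying every sample value as a threshold."""
    best = None
    for threshold, _ in samples:
        for polarity in (0, 1):
            # polarity is the label predicted for x > threshold
            errors = sum((polarity if x > threshold else 1 - polarity) != y
                         for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x > threshold else 1 - polarity

def boost_three(D, subset_size, rng):
    # Step 1: random subset d1, drawn without replacement, trains C1.
    d1 = rng.sample(D, subset_size)
    c1 = train_stump(d1)
    # Step 2: a fresh subset d2 plus half of the samples C1 misclassified.
    missed = [(x, y) for x, y in D if c1(x) != y]
    d2 = rng.sample(D, subset_size) + rng.sample(missed, len(missed) // 2)
    c2 = train_stump(d2)
    # Step 3: d3 holds the samples on which C1 and C2 disagree.
    d3 = [(x, y) for x, y in D if c1(x) != c2(x)]
    c3 = train_stump(d3) if d3 else c1  # assumption: fall back to C1 if d3 is empty
    # Step 4: majority vote of the three weak learners.
    return lambda x: 1 if c1(x) + c2(x) + c3(x) >= 2 else 0

# Toy data (assumed): label is 1 inside (0.3, 0.7), which one stump cannot fit.
rng = random.Random(0)
D = [(x, int(0.3 < x < 0.7)) for x in (rng.random() for _ in range(200))]
ensemble = boost_three(D, subset_size=60, rng=rng)
predictions = [ensemble(x) for x, _ in D]
accuracy = sum(p == y for p, (_, y) in zip(predictions, D)) / len(D)
print(f"training accuracy: {accuracy:.2f}")
```

Note that this three-learner scheme is the early form of boosting; adaptive boosting (AdaBoost) instead reweights every sample after each round.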

Bagging:
Before understanding bagging, let's understand the concept of the bootstrap, which is simply drawing a random sample with replacement.
As others have pointed out, bagging is short for Bootstrap AGGregatING:

  1. Generate n different bootstrap training samples.
  2. Train the algorithm on each bootstrapped sample separately.
  3. Average the predictions at the end (for classification, a majority vote is typically used instead).
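
A minimal sketch of these three steps for regression, using only the standard library. The base learner (`fit_line`, an ordinary least-squares line), the synthetic data, and `n_models=25` are assumptions chosen for illustration; any learner could take its place.

```python
import random

def fit_line(samples):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def bagging(D, n_models, rng):
    models = []
    for _ in range(n_models):
        # Step 1: bootstrap sample, same size as D, drawn WITH replacement.
        boot = rng.choices(D, k=len(D))
        # Step 2: train one model per bootstrap sample.
        models.append(fit_line(boot))
    # Step 3: average the individual predictions.
    return lambda x: sum(m(x) for m in models) / len(models)

# Synthetic data (assumed): y = 2x + 1 plus Gaussian noise.
rng = random.Random(1)
D = [(x, 2 * x + 1 + rng.gauss(0, 0.3)) for x in (rng.random() for _ in range(100))]
bagged = bagging(D, n_models=25, rng=rng)
print(f"bagged prediction at x=0.5: {bagged(0.5):.2f}")  # true value is 2.0
```

A straight-line fit is already a low-variance learner, so the benefit of averaging is small here; bagging tends to help most with unstable learners such as deep decision trees.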

One of the key differences is how each training set is sampled. Bagging draws its bootstrapped samples with replacement, whereas boosting, as described in the steps above, draws its subsets without replacement.
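
The two sampling schemes can be compared directly with Python's `random` module: `sample` draws without replacement and `choices` draws with replacement. The data set size and seed below are arbitrary assumptions for the example.

```python
import random

rng = random.Random(42)
D = list(range(10_000))

# Without replacement: every drawn item is distinct.
subset = rng.sample(D, k=5_000)
print(len(set(subset)) == len(subset))

# With replacement (bootstrap): some items repeat and others are never drawn.
bootstrap = rng.choices(D, k=len(D))
unique_fraction = len(set(bootstrap)) / len(D)
print(f"unique fraction in bootstrap sample: {unique_fraction:.3f}")
```

For a large data set, a bootstrap sample of the same size contains on average about 63.2% of the distinct original items, since the chance that a given item is never drawn is (1 - 1/n)^n, which approaches 1/e.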
In theory, bagging is good for reducing variance (overfitting), whereas boosting helps to reduce both bias and variance, as per this Boosting vs Bagging answer. In practice, however, boosting (adaptive boosting) is known to have high variance because of overfitting.
Source
[Also, see Bootstrap (used in Random Forest). TL;DR: Bootstrapping is a sampling technique, and bagging is a machine learning ensemble method built on bootstrapped samples. Source]
