Gradient Boosting - Part 2
Now that we have a high-level understanding of how decisions trees operate, let’s learn how they are combined in order to make the model more accurate.
Ensemble Learning uses several machine learning models to build a more efficient learning algorithms to improve the accuracy of the prediction.
With this approach multiple weak learners are combined to improve our models results.
Rather than just relying on one decision tree and hoping we made the right decision at each split, ensemble methods allow us to take a group of decision trees into account, calculate which features to use at each split, and make a final predictor based on the aggregated results of the sampled decision trees.
In supervised machine learning, regardless of the approach we choose, classification or a regression problem, the choice of the model is extremely important to have any chance at obtaining the best results. For a model, we want to have low bias and low variance, but that is very difficult to achieve. Therefore, we identify a point where the difference between bias and variance is very low. Let’s define these terms. When we train a model on training data the error between predicted value and actual value is termed as bias.
Bias refers to the error due to overly simplistic assumptions or faulty assumptions of the algorithm.
Bias results in under-fitting the data. A high bias means our learning algorithm is missing important trends amongst the features.
Variance is opposite of bias. Variance refers to the error due to overly complex solutions. This happens when the model tries to fit the training data as closely as possible. With high variance the model’s predicted values are extremely close to the actual values from the training dataset. The algorithm copies the training data’s trends and this results in loss of generalization. High variance gives rise to overfitting.
In machine learning terminology, underfitting means that a model is too general, leading to high bias, while overfitting means that a model is too specific, leading to high variance. When training a model, it is important to balance these two. Since you can’t realistically avoid bias and variance altogether, this is called the bias-variance tradeoff.
In ensemble learning, we assemble groups of weak learners. These weak learners are models that can be used as building blocks for designing more complex models. The weak learner used in XGBoost is the decision tree. These base models or weak learners don’t perform as well by themselves either because they have a high bias or because they have too much variance. In order to achieve the bias variance tradeoff, the weak learners are combined to create ensembles that achieve better performance.
Most of the time we use a homogenous base learner. That simply means the same model will be used to create the ensemble with and in our case those base learners are decision trees.
These algorithms are often referred to as meta-algorithms. Meta-algorithms are models that combine several machine learning techniques into one predictive model in order to decrease the variance (bagging) or bias (boosting).