In the first part of this series, I explained how to produce and select different models using ensemble learning. Now, it’s time to show how we can improve the ensemble performance by adjusting the way it evaluates previous models. We have 4 main methods to perform this: Weighted voting, Stacking, Bagging, and Boosting.


Weighted voting

This one is pretty simple. If you observe that a model performs better than others, you can increase its importance for the final prediction just counting its vote more times (if your task is a regression, you can implement this technique by calculating a weighted average of the results). Finding the best weight for each model is up to you! Usually, it’s very straightforward to find a good combination. This technique is also known as a “committee” or an “ensemble averaging”, and it is widely used for neural network models. You can aggregate a group of networks with low bias and high variance into a new network with low bias and low variance, as a solution for the bias-variance dilemma.

Source: Bruno Cestari



Stacking multiple machine learning (ML) algorithms, also known as a “stacked generalization”, is a way to combine several models using another ML algorithm. Rather than just counting votes, you can create another model by using the outputs from the previous models as an input. This method has the advantage to increase performance in some cases, but it comes with a price: you have another model to fit, adjust parameters, prevent overfitting and test.

Source: Bruno Cestari



The name “bagging” comes from “Bootstrap Aggregating”, an algorithm designed to improve the stability and reduce the variance of ML algorithms. The procedure of creating a bagging starts with the creation of multiple models using sub-samples from the original dataset chosen randomly with the bootstrap sampling method. The following step is to aggregate all models using a method, like voting or average.

One of this method’s advantages is that the sub-sampling and the training of each model are independent, so it can be done in parallel.

An example of a well-known algorithm that uses bagging is Random Forest.

Source: Bruno Cestari



Boosting is a family of meta-algorithms designed primarily to reduce bias. This technique combines a sequence of weak learners, that performs slightly better than random guessing, to create a strong one. The models are combined through a weighted majority vote or weighted average. The main difference between boosting and other committee models, like bagging and simple majority vote, is that this algorithm trains the models in sequence using the weighted data from the previous model.

An example of a model that uses this technique is AdaBoost, from which some developers won the prestigious Gödel Prize.

Source: Bruno Cestari


There is a complete area of research concerning ensemble models, trying to find new ways to aggregate models to improve their performance. For example, there are some researches on differential evolution algorithms to improve the performance of ensembles (C3E-SL with D2E) [1]: algorithms – an example of these new models being developed.

If you want to go further and choose the best possible ensemble model, you can take a look at the Bayesian Optimal Classifier. It is an ensemble of all the hypothesis in the hypothesis space.

The votes of each hypothesis can be calculated proportionally by the likelihood of the dataset is a sample from a system where this hypothesis is true. You can describe this estimator as[2]:



Where the predicted class is 



Y = {y1, y2 … yn} is the set of all possible classes,

H = {h1, h2 … hn} is the hypothesis space

and D is the dataset.

Even though no ensemble can outperform this one in theory, calculating this estimator can be hopelessly inefficient. Therefore, it is not worth using it in many practical cases.

Summarizing, ensemble methods are algorithms that combine several other ML algorithms to create a final model to decrease the variance, bias,or improve the predictions. The most used techniques today are Stacking, Bagging, and Boosting, but there are a lot of other manners to create an ensemble model.[3]


Featured Image by Markus Spiske on Unsplash


About the author

Bruno Cestari is a Software Engineer at Poatek.






[1] Oza, Nikunj & Tumer, Kagan. (2008). Classifier ensembles: Select real-world applications. Information Fusion. 9. 4-20. 10.1016/j.inffus.2007.07.002.

[2] Tom M. Mitchell, Machine Learning, McGraw-Hill Science/Engineering/Math; (March 1, 1997).

[4] Coletta, Luiz & Hruschka, Eduardo & Acharya, Ayan & Ghosh, Joydeep. (2014). A Differential Evolution Algorithm to Optimise the Combination of Classifier and Cluster Ensembles. International Journal of Bio-Inspired Computation. 10.1504/IJBIC.2015.069288.

[3] Vadim Smolyakov, (2017, August 22) Ensemble Learning to Improve Machine Learning Results. Retrieved from in 2020, March 06.