Finally, let's discuss one last ensemble method, namely stacking. Here our base learners can be anything. Here we see we have logistic regression, support vector machines, and random forests as our base learners, so there's no bias toward needing to use a decision tree. We can even pass in, as we see here, a random forest, which is an ensemble method in itself. The idea is to fit several algorithms to our training set and use the predictions, or scores, of each of these individual base learners as a new training set. These outputs, one from each base learner, become our meta-features. We then pass those through to one final classifier, called our meta-classifier, to come up with a single prediction. You can think of this last step, this aggregation step, as similar to what we've done with our other ensemble methods in regards to bringing together all the different votes.

Now, in a way, this is very similar to bagging, but without the need to bootstrap or to limit ourselves to decision trees. Instead, we train several different algorithms, and we can think of this as testing many different assumptions on our dataset. So we're not held to just the assumptions needed for decision trees or any other single model. The outputs of those algorithms are, again, our new features to be fed into that final aggregation step, that final classifier. So we go from our labeled data, run it through each of the base learners, such as the logistic regression, and come up with our meta-features. We can then use those meta-features as input to either a voting classifier or one final classification method.

So what would this final meta-classifier look like? One option is a majority vote, as we discussed, or a weighted vote, as we did with our other ensemble methods. Now, we want to note that in order to optimize the meta-step parameters, as well as the parameters for each of our base learners, we need to be careful and scientific about our approach. What do I mean by that? I mean that we need a hold-out set and a test set for our base learners as well. We can't just have a hold-out set for our final classification method; in order to properly learn the parameters for each of our base learners, we need to ensure we have a hold-out set for those too.

With that, we want to be aware that such models can get pretty complex pretty quickly, and as usual, higher complexity generally means that we are more likely to overfit. So we want to be aware of this and be sure that whatever we build is going to generalize well outside the dataset, which again means ensuring we have the proper hold-out sets. And finally, as we hinted at, that final meta-classifier does not necessarily need to be just a vote, or a weighted vote; it can even be its own model. We could run a linear or logistic regression at that last step, or support vector machines, or random forests, using the output of each of the base learners as the input to that final learner.

So how do we do this in practice? First, as usual, we want to import the class containing our classification method, here being the VotingClassifier. We're then going to create an instance of our class and pass in our hyperparameters. The key one here is the estimators argument: a list of the base models that we're going to combine.
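As a minimal sketch of that setup, assuming the same three base learners from the lesson (the names and hyperparameter values here are illustrative, not the course's actual notebook):

```python
# Heterogeneous base learners to be combined by a VotingClassifier.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# (name, model) pairs; hyperparameter values are illustrative, not tuned.
estimator_list = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC(probability=True)),  # probability=True enables soft voting later
    ("rf", RandomForestClassifier(n_estimators=100)),
]

# Pass the list in via the estimators hyperparameter.
VC = VotingClassifier(estimators=estimator_list)
```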
Each of those models, that logistic regression, the random forest, and so on, should be instantiated with hyperparameters we've already settled on. Note, though, that we don't need to fit them ourselves first: when we fit the VotingClassifier on a training set, it fits each of these base models for us, and that's how their parameters get learned.

Another hyperparameter that's available is how you want to vote. So far we've discussed voting as hard voting, which just takes whatever the output class of each model is: if we have a random forest and a logistic regression, and those both predict one, then we would vote one. Rather than just taking ones or zeros, we can also output probabilities from any of these models. And if we do that, rather than just taking a vote, we can take the average of each of those probabilities, and that is called soft voting. Then if a certain model is very confident that it should be one class or the other, we allow it to have more weight. The way we pass that in is through the hyperparameter called voting, and we can pass either soft or hard; we'll see that in our notebook, where we're actually going to use soft voting.

We then fit the instance as we've done before: all we have to do is call VC.fit on our training set and then predict on our test set. And as usual, there's a regression option; all we have to do is use VotingRegressor if we want to do regression.

Another important note is that the StackingClassifier, which is somewhat newer to scikit-learn, works similarly but allows us to use a different final classifier, beyond voting, to come up with that final prediction. To do that, we pass in our estimator_list, and as you see here, we also pass in a hyperparameter called final_estimator, which by default is actually LogisticRegression. But you can pass in any type of unfitted model to decide how we're going to end up coming up with that final prediction (see the sketch at the end of this section).

Now let's recap what we learned here in this section. We discussed the boosting approach to combining models, with the main idea of boosting being to build off of the prior mistakes of our weak learners. We then discussed the types of boosting models, namely gradient boosting and AdaBoost, and how AdaBoost was one of the first boosting methods used but can be easily skewed by outliers. This became clear when we looked at the different loss functions used by each of the boosting methods. And finally, we showed how we can combine heterogeneous classifiers using our stacking method, which takes many different classifier outputs and aggregates them by using them as input to one final classification method.

Something to note in regards to further reading: XGBoost is another popular boosting algorithm that's not in scikit-learn but has its own library. For more information, I would say read its documentation. It's essentially gradient boosting, but with a bit of parallelization, which will allow you to speed up how fast you're able to fit your model. As we'll see in the notebook, which is going to be our next video, fitting can take a very long time, so it may be worth looking into XGBoost as well (a minimal sketch also follows below). All right, with that, we'll move into our notebook. I'll see you there.
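As a companion to the walkthrough above, here is a minimal end-to-end sketch of soft voting and stacking. The dataset is synthetic and the hyperparameter values are illustrative assumptions, not the course's actual notebook:

```python
# End-to-end sketch: soft voting, then stacking with a final estimator.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the course data.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimator_list = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC(probability=True)),  # predict_proba is required for soft voting
    ("rf", RandomForestClassifier(n_estimators=100)),
]

# Soft voting: average each model's predicted probabilities.
VC = VotingClassifier(estimators=estimator_list, voting="soft")
VC.fit(X_train, y_train)
vote_preds = VC.predict(X_test)

# Stacking: base-learner outputs become meta-features for a final estimator.
# final_estimator defaults to LogisticRegression; any unfitted model works.
# scikit-learn builds the meta-features with internal cross-validation (the cv
# parameter), which is how it handles the hold-out concern discussed earlier.
SC = StackingClassifier(estimators=estimator_list,
                        final_estimator=LogisticRegression())
SC.fit(X_train, y_train)
stack_preds = SC.predict(X_test)
```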
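And for the further-reading pointer, a minimal sketch of XGBoost through its scikit-learn-compatible interface; it lives in the separate xgboost package, and the settings shown are illustrative:

```python
# XGBoost: gradient boosting with parallelization, from its own library.
# Install separately, e.g.: pip install xgboost
from xgboost import XGBClassifier

# n_jobs=-1 uses all cores, the parallelization that speeds up fitting.
# Hyperparameter values are illustrative, not tuned.
xgb_clf = XGBClassifier(n_estimators=100, learning_rate=0.1, n_jobs=-1)
xgb_clf.fit(X_train, y_train)  # reuses the train/test split from the sketch above
xgb_preds = xgb_clf.predict(X_test)
```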