Welcome to question number 3 in our notebook. Here we're going to be using gradient boosted models. The goal of this question is to loop through different numbers of trees and see how our accuracy tends to increase or decrease as we increase that number of trees. One thing to note is that we set max_features equal to 5. If you recall from the last video, we have 561 different features available to us, and if we let it run on all 561 features, it will take very, very long to fit our model, so this is a way to speed things up. The accuracy tends to be around the same for this given model, though that's not always the case. If you want to test it on your own, note that it could take 25 to 30 minutes to run if you set max_features equal to None and allow it to run through every single feature. Here we set it to 5 features just to ensure that it runs a bit quicker.

The other thing to note is that when we did bagging in our last notebook, we were able to use the warm_start flag so that we didn't have to keep relearning the trees we'd already learned as we increased the number of trees. There's a bit of a bug in the boosted models, so we won't be able to do that here. Also, there's no out-of-bag error, since we're training on the full data set when we do any type of boosting, compared to bagging. Remember, with bagging we have that bootstrapping, whereas with boosting we're just trying to fix past errors. And because each of these trees is fit successively, note that this will take longer than bagging: we can't fit the next tree until we've fit the tree before it, and so on.

So the first thing we want to do is import our GradientBoostingClassifier. We're also going to import our accuracy score.
What we want to measure here is the error rate, which is just 1 minus the accuracy score. We then set our tree list, which runs from 15 up through 400, and we're going to loop through each of these possible numbers of trees. At each step in our for loop, we instantiate our GradientBoostingClassifier. Again, we set max_features equal to 5 to ensure it doesn't run for too long, we set our number of estimators equal to this n_trees variable, which is just wherever we are within our tree list, and we set our random_state equal to 42. We then print out, at each step along the for loop, where we are, so "fitting model for n_trees" for that number of trees. We fit the instantiated model with that number of trees on our training set, so on X_train and y_train, and then we come up with a prediction on our test set using GBC.predict. Once we have our predictions, we can use our accuracy score, passing in the test values, the actual values, as well as our predicted values. One minus that accuracy score is our error. We append that to the error list we initialized at the top, before our for loop, so that at each step we append a series with the number of trees as well as the actual error. Then we can concatenate those all together to create the error data frame that we'll see, as we have in past notebooks as well.

So I'm going to run this, and this will take maybe a minute to run. As you see, it's moving through each of these different options. It did it fairly quickly for 15, 25, and 50 trees, and you can see it slowing down as it has to fit more and more trees. So I'm going to pause the video here, and we'll come back as soon as it's done running through each of these numbers of trees.
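The loop just described might be sketched as follows. This is a minimal sketch, not the notebook's exact code: synthetic data stands in for the 561-feature activity dataset, and the exact tree counts in the list are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's activity-recognition data
X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree_list = [15, 25, 50, 100, 200, 400]  # assumed tree counts
error_list = []

for n_trees in tree_list:
    print(f'Fitting model for {n_trees} trees')
    GBC = GradientBoostingClassifier(n_estimators=n_trees,
                                     max_features=5,
                                     random_state=42)
    GBC.fit(X_train, y_train)
    y_pred = GBC.predict(X_test)

    # error rate is 1 minus the accuracy score
    error = 1.0 - accuracy_score(y_test, y_pred)
    error_list.append(pd.Series({'n_trees': n_trees, 'error': error}))

# concatenate the per-step series into one frame, indexed by tree count
error_df = pd.concat(error_list, axis=1).T.set_index('n_trees')
```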
So hopefully that didn't take you too long; it didn't take too long on my end. Once we ran through each of those different numbers of trees, we were able to output that error data frame, which has the number of trees as its index and the actual error rate as its values. We see that as we increase the number of trees, we keep decreasing that error rate, and as we get toward higher numbers of trees, we see diminishing returns.

We're then going to plot out this error data frame. We call error_df.plot, so we're using the pandas functionality for plotting. We set our marker to the circle, we set our figsize to 12 by 8, and we connect the points with a line, setting linewidth equal to 5. We then set our x label to the number of trees, which references the number of trees in the index here, and the y label will be the actual error. And then we set our x limits from 0 to the maximum of the error_df index times 1.1; the index, again, is just the number of trees, so the maximum value of 400 times 1.1 gives 440. So we're setting it for values between 0 and 440 on that x axis. When we plot it, we see this diminishing error as we continue to increase the number of trees, and we see it somewhat plateau from 200 to 400; there's a slight uptick at 400, but the curve has largely leveled off.

Now for question number 4, we're going to use a grid search with cross validation to fit a new gradient boosted classifier. Here we're setting the number of trees equal to 400, so we're not going to loop through that, but in practice you may want to loop through the number of estimators as well. If you recall from lecture, we discussed how there's a relationship between the learning rate and the number of trees.
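The pandas plotting call walked through above might look like this. A small hand-made error frame stands in for the fitted results, and the non-interactive Agg backend is used so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; no display needed
import pandas as pd

# hand-made stand-in for the fitted error_df (values are illustrative)
error_df = pd.DataFrame({'error': [0.12, 0.09, 0.07, 0.06, 0.055, 0.057]},
                        index=[15, 25, 50, 100, 200, 400])
error_df.index.name = 'n_trees'

# circle markers, 12x8 figure, thick connecting line
ax = error_df.plot(marker='o', figsize=(12, 8), linewidth=5)

# x axis runs from 0 to 1.1x the largest tree count (400 * 1.1 = 440)
ax.set(xlabel='Number of Trees', ylabel='Error',
       xlim=(0, error_df.index.max() * 1.1))
```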
If you have a lower learning rate, you probably want more trees, and perhaps you'd be able to come up with a better model, but it would take a very long time to run. So we're keeping it at 400 just to ensure that it won't take too long, though this will still take quite some time given the parameters we're using. We're then going to try varying the learning rates, the subsample values, and the maximum number of features. Recall that the subsample value is just how much of the training set we actually learn on for each of the trees: if we say only 50%, then rather than using the full training set, we use a portion of it, and this adds a bit of regularization to our model. For max features, lower numbers mean more regularization, and the more features we allow, the more complex the model can be. We're then going to examine the best parameters for the best fit model, so what are the parameters that were output, and calculate the relevant error metrics on this model, all those error metrics we discussed in past lectures.

Again, I want to note that this will take some time to run. As before, I'm going to let it run and then pause, but I'll also show you how to save the model and bring it back into memory later, so that you don't have to keep rerunning this code if you lose your model or your kernel shuts down.

So, to create our model, first we import GridSearchCV. We then set a parameter grid: our learning_rate loops through the values 0.1, 0.01, and 0.001, our subsample values are 1 and 0.5, and our max_features values are 2, 3, and 4. We then pass in our actual classifier, and here we see that it's a GradientBoostingClassifier with the number of estimators set to 400.
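Putting that together, the grid-search setup might be sketched as below. The parameter values come from the transcript; the object name GV_GBC and the n_jobs value are assumptions. The fit call is left commented out because, as noted, it takes a long time.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# values stated in the walkthrough: three learning rates, two
# subsample fractions, three max_features settings
param_grid = {'learning_rate': [0.1, 0.01, 0.001],
              'subsample': [1.0, 0.5],
              'max_features': [2, 3, 4]}

# number of estimators held fixed at 400 for every grid point
GV_GBC = GridSearchCV(GradientBoostingClassifier(n_estimators=400),
                      param_grid=param_grid,
                      scoring='accuracy',
                      n_jobs=-1)  # assumed: use all available cores

# GV_GBC.fit(X_train, y_train)  # slow: 18 settings, each cross-validated
```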
So, no matter what we loop through for each of these parameters, we're keeping that number of estimators at 400. We then set our param_grid to the param_grid, we optimize on accuracy as our scoring metric, and we set n_jobs, which just says how many jobs we can run in parallel. Recall that with boosting, compared to bagging, we're not able to parallelize our models as much, so again, this will take some time to run. Now that we've initialized this object, we can fit it on our training set. I'm just going to run this here; it will take some time, so I'll let it run, and I'll see you back here when it's done.

All right, that may have taken quite some time to run, maybe ten minutes or so. Now, to ensure that you don't have to keep rerunning this if you mess up at some point or restart your kernel, something you can do is save that Python object. So I'm going to import something called pickle. What we're doing here is called pickling, which serializes our Python objects and saves them as bytes in a file, and then if we want to reintroduce them as Python objects, we can just pull them back out using this pickle functionality. The way it works is we call pickle.dump, which dumps our Python object into a pickle file. We want to save this GV_GBC object that we just fit, and then we just open up a file; this will create a new file if it doesn't exist. I'll call it gv_gbc.p, as a pickle file. Whenever you call open in Python, you have to say what you want to do with that file; here we want to write it as bytes, so we say 'wb'. And now we've saved this object as gv_gbc.p, and later on, or in another notebook, we can load it back up, assuming we're in the right directory, since this saves it in the directory we're working in.
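The pickling round trip described here and in the next step might look like this. A plain dict stands in for the fitted GV_GBC object so the sketch is self-contained; the filename follows the transcript.

```python
import pickle

gv_gbc = {'best_score': 0.99}  # stand-in for the fitted GridSearchCV object

# dump the object to a file as bytes ('wb' = write bytes);
# the file is created in the current working directory if it doesn't exist
with open('gv_gbc.p', 'wb') as f:
    pickle.dump(gv_gbc, f)

# later, or in another notebook, read it back ('rb' = read bytes)
with open('gv_gbc.p', 'rb') as f:
    gv_gbc_loaded = pickle.load(f)
```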
And we see here that our current working directory is this data directory. What we'd want to do is call pickle.load and pass in the file we want to load, which is gv_gbc.p, opened with 'rb' for read bytes. Then we'd be able to load that in; we'd probably want to save it as an object, which would be the same as the GV_GBC object we just created. So we can just say gv_gbc = pickle.load(...), and we'd load it back up.

So now let's see what our best estimator was for that model. We see that it shows max_features equal to 4; the number of estimators we had already set to 400. It actually took a subsample, so subsampling, which introduced a bit of regularization, actually helped our model. And for the learning rate, we can just call learning_rate and see that it shows 0.1. We're then going to use that best estimator to predict on our X_test and see the classification report across each of the different values we could be predicting here, to see what our precision, recall, and so on all are. So you run this, and we see really high scores across the board. For the 0 class, we got everything perfect; for 1, maybe not as well, but still 96, 97%, and we see very high values for precision, recall, and F1 score across each of these classes. When we look at the overall accuracy, we see we had 99% accuracy, and then we have the macro average and weighted average, which here don't mean as much since the support is pretty evenly distributed.

We're then going to check our confusion matrix to see, for those we got wrong, where we went wrong, and those should be somewhere around the classes of 1 and 2. So we run this, and we see that class 1 is getting confused with class 2, and class 2 confused with class 1, as we see with the 15 and 18 off the diagonal; everything along the diagonal was classified correctly.
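The evaluation step just described can be sketched like this, with tiny hand-made labels standing in for the notebook's predictions on the test set.

```python
from sklearn.metrics import classification_report, confusion_matrix

# hand-made labels standing in for y_test and the model's predictions
y_test = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2, 2]  # one true class-1 example predicted as class 2

# per-class precision, recall, F1, and support, plus overall accuracy
print(classification_report(y_test, y_pred))

# rows are true classes, columns are predicted classes: correct
# predictions sit on the diagonal, confusions sit off the diagonal
cm = confusion_matrix(y_test, y_pred)
print(cm)
```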
So it seems that classes 1 and 2 are getting mixed up, and if we go back up here, we see that those are sitting and standing. So if you have an Apple Watch telling you to stand, it may be confusing that sitting and standing.

Now finally, we're going to do the same thing for AdaBoost, and then compare the errors from AdaBoost and this gradient boosted classifier we just learned. Again, note that this will take a long time to run. We're going to loop through different numbers of estimators and different learning rates; that's all we're looping through here, and it will still take quite some time. As we discussed, if you want to change the max depth of your AdaBoost classifier, you could pass in different max depths here, but we're just setting max_depth equal to 1, which is actually the default value, so we're using a decision stump. Then we use the param_grid, with everything else the same as what we just did with gradient boosting.

All right, so I'm going to run this and fit it on our training set. Now we're fitting our AdaBoost classifier, and again, this will take some time to run, so I'll pause the video here and come right back as soon as it's done. So that may have taken some time to fit, around ten minutes or maybe even a bit more. Again, I'd suggest pickling, using that pickle.dump, to save this model; I've already done that. We then look at the best estimator and see that a learning rate of 0.01 and a number of estimators equal to 100 were our best hyperparameters. We can then create our prediction using that fitted model on our X_test, our test set, and then see the classification report for our predictions against our y_test.
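The AdaBoost grid search described above might be set up as below. The transcript only says we vary the number of estimators and the learning rate, so the exact value lists here are assumptions (chosen to include the best values it reports, 100 estimators and a 0.01 learning rate); max_depth=1 makes the decision-stump default explicit.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# hypothetical value lists for the two hyperparameters being varied
param_grid = {'n_estimators': [100, 150, 200],
              'learning_rate': [0.1, 0.01, 0.001]}

# max_depth=1 is a decision stump, AdaBoost's default base learner
ABC = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1))
GV_ABC = GridSearchCV(ABC, param_grid=param_grid,
                      scoring='accuracy', n_jobs=-1)

# GV_ABC.fit(X_train, y_train)  # slow; pickle the fitted result as before
```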
And we see here that we have much reduced recall, F1 score, and accuracy compared to what we had for gradient boosting. We see that for class 2 we do poorly on recall, and class 1's precision and F1 score are at 0. If we look at the confusion matrix, we see that we keep getting class 1 wrong, misclassifying it as a 2, and we're also starting to get some of the 3s and 5s wrong as well. This can be due to the fact that AdaBoost can be greatly skewed by outliers, as we discussed with the loss function used for AdaBoost. So in this example gradient boosting performed better, but that's not always the case, and this is why we check across different models.

Now, that's going to close out this section on the different boosting methods. In the next video, we'll start to discuss the stacking method using the voting classifier, and I'll see you there.