Welcome back for Part 3 of our notebook here on bagging. In this section, we're going to show how, using random forests, we can increase the number of trees and see the error decrease as we increase that number of trees up to a certain point, after which the error will eventually plateau, as we discussed in the lecture. Where it plateaus, we can say that is enough trees for our optimal model. Now, something to note: since the only thing changing for our model is the number of trees, the warm_start argument can be used so that the model just adds more trees to the existing model each time. We're going to use the set_params method to update the number of trees on that initialized classifier. We'll see that in just a second.

So we first import our RandomForestClassifier from sklearn.ensemble. We then initiate our object, and we want to be getting our out-of-bag score, so we set oob_score equal to true. If we recall, that out-of-bag score is computed, for each one of our decision trees, on the data that tree was not trained on. We also set a random state, we set warm_start equal to true as we just discussed, and since random forests can run things in parallel, we set the number of jobs to the maximum value by just setting n_jobs equal to negative one, as discussed in prior notebooks.

We want to track each one of our out-of-bag errors, so we create an empty list here, and then we loop through each one of these numbers of trees: starting with 15, then 20, 30, and so on through each number in our list up until 400 trees, to see where the error plateaus as we increase the number of trees. So we take our initialized object RF and call set_params, as we just discussed. The only thing we continually update is the number of estimators, n_estimators, and we set that equal to wherever we are in our for loop, that is, to the current number of trees. We can then fit it on our training set.
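The warm-start loop described above can be sketched roughly as follows. Note this is a sketch under assumptions: the notebook's own X_train/y_train and exact list of tree counts aren't shown, so a synthetic dataset from make_classification and an illustrative grid stand in for them here.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data -- the notebook uses its own X_train / y_train.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# oob_score=True exposes the out-of-bag score after fitting;
# warm_start=True means each refit only adds new trees to the
# existing ensemble instead of retraining from scratch.
RF = RandomForestClassifier(oob_score=True, warm_start=True,
                            random_state=42, n_jobs=-1)

oob_list = []
# Illustrative tree counts; the transcript's grid runs from 15 up to 400.
for n_trees in [15, 20, 30, 40, 50, 100, 150, 200, 300, 400]:
    RF.set_params(n_estimators=n_trees)   # only the tree count changes
    RF.fit(X_train, y_train)
    oob_error = 1 - RF.oob_score_         # available because oob_score=True
    oob_list.append(pd.Series({'n_trees': n_trees, 'oob': oob_error}))

rf_oob_df = pd.concat(oob_list, axis=1).T.set_index('n_trees')
print(rf_oob_df)
```

Because warm_start is on, the tree counts must be increasing; shrinking n_estimators between fits would raise an error.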
We can then get our out-of-bag error by taking one minus that out-of-bag score. The oob_score_ attribute is only available if you set oob_score equal to true, which we have. Then we store that by appending onto our empty list a pandas Series that has the number of trees as one of its indices and the out-of-bag error as the other. Finally, we concatenate all of that into a DataFrame and set the index equal to the number of trees. So we run this, and it'll take just a second. It should output, as we see here, a pandas DataFrame with the respective out-of-bag error for each number of trees. We see 15, then 20, and so on. The error is gradually decreasing, and then at around 100 to 150 it seems to plateau.

We're also going to look at a graph. Here we just call .plot on that DataFrame, set the marker equal to 'o', and connect the points with a line, with the line width equal to five. We set our y label to out-of-bag error, and when we run this, we can see that drop-off in error; it's really at its low at around 100, and beyond that it only goes down slightly. Something I do want to point out is to be careful whenever you are creating a plot. This plot goes from 0.048 to 0.056, so there's a really tiny range, and each one of these gridline steps is 0.002. So there's really a minimal difference between each one of these errors once we get down to this lower point. At that juncture, you want to keep in mind the raw numbers: these were all relatively close at 100, 150, 200, and so on.

Now we want to use that same practice for the ExtraTreesClassifier that we introduced in the lecture as well. Here it's going to be all the same steps, but something we need to do is set the bootstrap argument equal to true.
The reason why we do that is that we won't be able to get the out-of-bag error unless we're bootstrapping our model. Bootstrapping means each tree is trained on a sample of the data, and the out-of-sample rows are the out-of-bag rows. The default for ExtraTreesClassifier is bootstrap equals false, in which case each tree fits on the entire dataset; that is allowed because the ExtraTreesClassifier gets its randomness from coming up with random splits more than anything else. Then again, we set warm_start equal to true and oob_score equal to true; again, that will only work if bootstrap is true. Then the number of jobs is just as many jobs as your computer can run in parallel.

Then again, we go through that same syntax: set up an empty list for our out-of-bag errors, run a for loop through each one of these different numbers of trees, set our parameters with n_estimators equal to the number of trees, and fit on our X train and y train. We get our new out-of-bag error as one minus EF.oob_score_. Then we append on a pandas Series with the number of trees and its respective out-of-bag error, and in the same fashion, we dump that into a DataFrame. So we run that, and we get, again, our different errors.

We're going to combine these two DataFrames together so we can plot them together. All we have to call is pd.concat, putting them each in a list and saying we want to concatenate on the columns. When we run this, we see that we have our random forest column and our extra trees column with their respective errors. Then when we plot these out, we see for this specific example, and this isn't always the case, but it is generally most often the case, that random forest will perform a bit better. We see that random forest does perform better across each one of the numbers of estimators, with its error line consistently below that of extra trees.
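The extra-trees loop and the column-wise concatenation might be sketched like this. Again the dataset, the tree counts, and the column names are illustrative placeholders, not the notebook's actual values:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data; the notebook uses its own X_train / y_train.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, _, y_train, _ = train_test_split(X, y, random_state=42)

def oob_curve(model, name, tree_counts):
    """Grow a warm-started ensemble count by count, recording OOB error."""
    errors = []
    for n in tree_counts:
        model.set_params(n_estimators=n)
        model.fit(X_train, y_train)
        errors.append(pd.Series({'n_trees': n, name: 1 - model.oob_score_}))
    return pd.concat(errors, axis=1).T.set_index('n_trees')

tree_counts = [15, 20, 30, 50, 100, 200, 400]  # illustrative grid

# bootstrap=True is required here: ExtraTreesClassifier defaults to
# bootstrap=False (each tree sees the full dataset), and without
# bootstrap samples there are no out-of-bag rows to score.
EF = ExtraTreesClassifier(bootstrap=True, oob_score=True, warm_start=True,
                          random_state=42, n_jobs=-1)
RF = RandomForestClassifier(oob_score=True, warm_start=True,
                            random_state=42, n_jobs=-1)

# Concatenate the two error tables on the columns for a side-by-side plot.
oob_df = pd.concat([oob_curve(RF, 'RandomForest', tree_counts),
                    oob_curve(EF, 'ExtraTrees', tree_counts)], axis=1)
print(oob_df)
```

Calling oob_df.plot(...) then draws one error curve per column, which is what makes the random forest vs. extra trees comparison easy to read.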
So that closes out this section, where we walked through both random forests and extra trees and saw where the error plateaus as we add trees. With that, we're going to move into selecting the better of those models and looking at each one of the error metrics for that model. All right, I'll see you in the next video.