Alternatively, you can build your own bagging function in caret.
This is a fairly advanced use, so I recommend that you
read the documentation carefully if you're going to try to do that yourself.
The idea here, though, is that you're going to
take your predictor variable and put it into a data frame.
So I'm going to make the predictors be a data frame that contains the ozone data.
Then you have your outcome variable.
Here it's going to be just the temperature variable from the data set.
And I pass these to the bag function in the caret package.
So I tell it, I want to use the predictors
from that data frame, this is my outcome, and this
is the number of replications, that is, the number of
subsamples I'd like to take from the data set.
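
To make the "number of replications" idea concrete, here is a minimal sketch in plain Python rather than caret. It assumes each replication is a bootstrap sample, meaning rows are drawn with replacement and each predictor stays paired with its outcome. The function name `bootstrap_samples` and the data values are made up for illustration; they are not the ozone data set from the lecture.

```python
import random


def bootstrap_samples(predictors, outcome, B, seed=0):
    """Return B resampled (predictors, outcome) pairs, drawn with replacement."""
    rng = random.Random(seed)
    n = len(outcome)
    samples = []
    for _ in range(B):
        # Draw n row indices with replacement; some rows repeat, some are left out.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(([predictors[i] for i in idx], [outcome[i] for i in idx]))
    return samples


# Made-up illustrative values (hypothetical, not the lecture's data).
ozone = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
temperature = [55.0, 60.0, 68.0, 75.0, 80.0, 82.0]

# B = 10 mirrors the ten replications mentioned in the lecture.
samples = bootstrap_samples(ozone, temperature, B=10)
```

Each element of `samples` is one resampled data set of the same size as the original, and a separate model would be fit to each one.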
And then bagControl tells me something about how I'm going to fit the model.
So fit is the function that's going to be applied to fit the model every time.
This could be a call to the train function in the caret package.
Predict is the way that, given a particular
model fit, we'll be able to predict new values.
So this could be, for example, a call to the predict function from a trained model.
And then aggregate is the way that we'll put the predictions together.
So for example it could average the
predictions across all the different replicated samples.
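
The three pieces just described (fit, predict, aggregate) can be sketched as plug-in functions. This is a plain Python illustration of the structure, not caret's actual interface: the function `bag` below and its arguments are hypothetical stand-ins, the model is a simple nearest-neighbour lookup chosen only because it is short, and the data values are made up.

```python
import random
from statistics import mean


def bag(x, y, B, fit, predict, aggregate, seed=0):
    """Fit one model per bootstrap sample and return a bagged prediction function."""
    rng = random.Random(seed)
    n = len(y)
    fits = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        # "fit" is applied to every resampled data set.
        fits.append(fit([x[i] for i in idx], [y[i] for i in idx]))

    def bagged_predict(new_x):
        # "predict" turns one model fit into predictions for new values.
        per_model = [predict(model, new_x) for model in fits]
        # "aggregate" combines the B sets of predictions into one.
        return aggregate(per_model)

    return bagged_predict


def fit_nearest(x, y):
    """Hypothetical toy model: just remember the (x, y) pairs."""
    return list(zip(x, y))


def predict_nearest(model, new_x):
    """Predict the outcome of the stored point whose x is closest."""
    return [min(model, key=lambda p: abs(p[0] - xi))[1] for xi in new_x]


def aggregate_average(per_model):
    """Average the predictions across all replicated samples."""
    return [mean(column) for column in zip(*per_model)]


# Made-up illustrative values (hypothetical, not the lecture's data).
ozone = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
temperature = [55.0, 60.0, 68.0, 75.0, 80.0, 82.0]

bagged = bag(ozone, temperature, B=10,
             fit=fit_nearest, predict=predict_nearest,
             aggregate=aggregate_average)
preds = bagged([15.0, 55.0])
```

Swapping in a different model only means passing a different `fit` and `predict` pair; the resampling loop and the averaging step stay the same.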
If you look at this
custom bagged version of the conditional regression trees, you can
see that it gets some of the benefit that I
was showing you in the previous slide with bagged loess.
So the idea here is I'm plotting ozone
again on the x-axis versus temperature on the y-axis.
The little grey dots represent actual observed values.
The red dots represent the fit from a single conditional regression tree.
And so you can see that, for example, it doesn't capture the
trend that's going on down here very well. The red line is just flat,
even though there appears to be a trend upward in the data points here.
But when I average over ten different bagged
model fits with these conditional regression trees,
I see that there's an increase here in the values in
the blue fit, which is the fit from the bagged regression.
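
The effect described here, where one tree is flat over a range but the average of several bagged trees picks up the trend, can be illustrated with a small sketch. It uses a hand-written one-split regression stump as a stand-in for a conditional regression tree (a deliberate simplification, not the model from the lecture) and made-up data with a steady upward trend.

```python
import random
from statistics import mean


def fit_stump(x, y):
    """Fit a one-split regression stump by minimising squared error."""
    best_sse, model = float("inf"), (None, mean(y), mean(y))
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbouring x values
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = (sum((yi - lm) ** 2 for yi in left)
               + sum((yi - rm) ** 2 for yi in right))
        if sse < best_sse:
            best_sse, model = sse, (t, lm, rm)
    return model  # (threshold, left mean, right mean); threshold None if no split


def predict_stump(model, new_x):
    t, lm, rm = model
    return [lm if t is None or xi <= t else rm for xi in new_x]


# Made-up data with a steady upward trend (hypothetical values).
x = [float(i) for i in range(1, 21)]
y = [50.0 + 2.0 * xi for xi in x]

# A single stump fit to all the data: flat on each side of one split.
single = predict_stump(fit_stump(x, y), x)

# Ten stumps fit to bootstrap samples, then averaged point by point.
rng = random.Random(0)
per_model = []
for _ in range(10):
    idx = [rng.randrange(len(x)) for _ in range(len(x))]
    stump = fit_stump([x[i] for i in idx], [y[i] for i in idx])
    per_model.append(predict_stump(stump, x))
bagged = [mean(column) for column in zip(*per_model)]
```

The single stump can only output two values, so it is flat wherever the data fall on one side of its split. Because the bootstrap samples usually lead to different split points, the averaged prediction in `bagged` typically takes more levels and rises more gradually with `x`.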