0:06
Let's see how we can use grid search in H2O.
I'm going to be using the airlines data set that we saw earlier in the week.
And I'm going to be using GLM.
It is one of the quicker algorithms,
so it's perfect for learning grids with.
It will be a logistic regression.
Brace yourself. Question incoming.
0:37
Okay. So, I've already run the highlighted code on the left,
and we see the description of the data fields here.
And just as before, we specify our y, and then our x as all of the columns except the thing we want to learn.
Some other fields have been excluded, too.
I'm going to give you five seconds to look at this set of
fields and see if you can work out why, before a question comes up and asks you.
Pause the video if you need to.
1:16
So, as you hopefully realized,
we're not allowed to use any data that we
wouldn't have known at the time we wanted to make the prediction.
The most obvious is arrival delay,
which was used to directly make IsArrDelayed, the thing we're predicting.
Also, actual elapsed time wouldn't have been available.
And I've also removed all the ones about departure delay, on the assumption
we wouldn't have had them at the time.
If you wanted to know if a plane after having already taken off was going to arrive late,
then you could validly use these fields,
but I've chosen not to.
I've also made a list,
because I'm a super clever data scientist,
of just the fields I think will be important.
And we're going to come back to that later.
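As a rough sketch, that setup in Python (the language used later in this video) might look like the block below. The column names come from the public airlines sample data, but the exact ignore list and the "likely" list here are my approximations of what's on screen, not the course's exact code.

```python
import h2o

h2o.init()

# Public airlines sample data; the file used in the course may differ.
airlines = h2o.import_file(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

y = "IsArrDelayed"

# Fields we couldn't have known at prediction time
# (my approximation of the list shown in the video).
ignore = ["ArrDelay", "ArrTime", "ActualElapsedTime", "AirTime",
          "TaxiIn", "DepDelay", "IsDepDelayed", y]
x = [c for c in airlines.columns if c not in ignore]

# The hand-picked "likely important" fields (illustrative choices only).
likely = ["Origin", "Dest", "UniqueCarrier", "DayOfWeek",
          "CRSDepTime", "Distance"]
```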
So, first thing: make a baseline model, with all the default settings, because we need something to compare against.
It takes, on this machine,
about 1.2 seconds; 1.4 on this run.
And we're getting a log loss of 0.623385. The MSE is 0.219.
Log loss is what we're going to concentrate on.
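As a sketch, a Python equivalent of that baseline (assuming the x and y defined above, plus an illustrative train/validation split) would be something like:

```python
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# An illustrative 80/20 split; the course's actual ratios may differ.
train, valid = airlines.split_frame(ratios=[0.8], seed=123)

# Plain logistic regression with default settings, as a baseline.
baseline = H2OGeneralizedLinearEstimator(family="binomial")
baseline.train(x=x, y=y, training_frame=train, validation_frame=valid)

print(baseline.logloss(valid=True))  # roughly 0.623 in the video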
So, here's our first grid, and it's broken down into three or four blocks.
The very first line is which algorithm: GLM.
Then, a block of search criteria:
whether you want to use Cartesian search, in other words,
exhaustively trying every combination, or whether you want to just select a few randomly.
That's what we're going to do here: select eight random models
out of a hundred hyperparameter combinations.
I've only got a single hyperparameter,
alpha, with 100 values.
Just in case, I've set a maximum time limit of 30 seconds.
And I'm going to run this before I talk any more,
just in case it takes the full 30 seconds.
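In Python (the R call is structured the same way), the whole grid might be sketched like this; the grid_id and the exact alpha sequence are my assumptions:

```python
from h2o.grid.grid_search import H2OGridSearch

# One hyperparameter, alpha, with 100 candidate values.
hyper_params = {"alpha": [i / 100.0 for i in range(100)]}

# Random search: sample 8 of the 100 combinations, cap at 30 seconds.
search_criteria = {
    "strategy": "RandomDiscrete",
    "max_models": 8,
    "max_runtime_secs": 30,
}

grid = H2OGridSearch(
    model=H2OGeneralizedLinearEstimator(family="binomial", lambda_search=True),
    hyper_params=hyper_params,
    search_criteria=search_criteria,
    grid_id="glm_alpha_random",  # so we can find the grid again later
)
grid.train(x=x, y=y, training_frame=train, validation_frame=valid)
```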
I did try to add a second hyperparameter: the missing values handling.
I thought it would be interesting to compare mean imputation,
which is the default.
That means it will take the average of each column and replace any NAs with that value.
The alternative is skip.
Skip gave me an error, because it turns out
there's at least one NA in every row, in at least one of the columns.
So we ended up with no data.
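That attempt would have looked something like this sketch; "Skip" discards any row containing an NA, which on this data is every row, hence the error:

```python
# Adding missing-value handling as a second hyperparameter (the version
# that failed: "Skip" drops rows with NAs, leaving no training data here).
hyper_params = {
    "alpha": [i / 100.0 for i in range(100)],
    "missing_values_handling": ["MeanImputation", "Skip"],
}
```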
4:19
Okay. That finished quite quickly, making the eight models.
Good. In the third block of your h2o.grid call,
you specify a grid ID, just so you can find it again later,
the fields to learn from, what you want to learn,
the family, binomial, for logistic regression, the training data set,
and, this time, we're using lambda search, which means GLM tries a whole sequence of lambda regularisation strengths and keeps the best, rather than one fixed value.
So, if we look at the results,
it tried random values from 0.01 to 0.72,
and there's a definite pattern:
the higher the value of alpha, the better.
Only small changes in log loss, 0.592;
the top five values are all 0.59-something.
So not a huge difference,
but it's distinct, and I've seen it every time I've run this grid.
The default for alpha is 0.5,
so if we just have a look at that value, 0.595,
and compare it to our baseline model's 0.623,
we can see this difference is really quite significant, and
that's down to lambda search.
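To reproduce that comparison, you can sort the finished grid by log loss and put the best model next to the baseline; a sketch, using the hypothetical objects from earlier:

```python
# Models ordered from best (lowest) to worst log loss.
sorted_grid = grid.get_grid(sort_by="logloss", decreasing=False)
print(sorted_grid)

best = sorted_grid.models[0]
print(best.logloss(valid=True))      # around 0.59x in the video
print(baseline.logloss(valid=True))  # around 0.623
```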
The other thing I want to look at is
how well my likely columns will do.
And this seems like a good excuse to try out a Cartesian search as well.
It's just a matter of changing the search criteria strategy;
you could leave that block off completely, as Cartesian is the default,
as long as you don't specify too many combinations of hyperparameters.
Let's switch over to the Python side, though.
So, to do grids in Python,
you need to import H2OGridSearch in addition to importing h2o.
Otherwise, everything else is the same.
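That extra import is the grid-search class:

```python
import h2o
from h2o.grid.grid_search import H2OGridSearch  # the extra import for grids
```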
You've seen all this code before: import a file,
split the frame, define our ignored fields and exclude them from the training columns.
There's my likely list.
6:40
Here's our base model,
which again you've seen before:
binomial, validation, and train. All standard stuff.
We get the same 0.623 that we saw in R.
Here's how you make a random grid.
Things are slightly shuffled around from the R version, but the same data is there.
So the highlighted code is just like what we did up in R:
it's just like making a normal model. You create an object;
I've added lambda search.
Then I'm making a list of 100 values of alpha and
saying it should choose eight of them
randomly, or give up if it can't finish within 30 seconds.
And then, just like with normal models in the Python API, you call train.
So I run that: a very similar result
to what we saw in R. The higher the value of alpha, the better: 0.59 to 0.62.
So again, it outperformed our baseline model.
And here's the Cartesian version.
And as before, you could leave this off as it's optional.
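A Cartesian sketch in Python, using the hypothetical likely list from earlier (the alpha values shown are illustrative):

```python
# Cartesian is the default strategy, so search_criteria could be omitted,
# as long as the number of combinations stays manageable.
cart = H2OGridSearch(
    model=H2OGeneralizedLinearEstimator(family="binomial", lambda_search=True),
    hyper_params={"alpha": [0.0, 0.25, 0.5, 0.75, 1.0]},
    search_criteria={"strategy": "Cartesian"},
    grid_id="glm_alpha_cartesian",
)
cart.train(x=likely, y=y, training_frame=train, validation_frame=valid)
print(cart.get_grid(sort_by="logloss", decreasing=False))
```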
And here's how it did: much worse.
I'm sacked. There's very little variation based on alpha;
it's only in the fifth decimal place that you get a difference.
A middle alpha, perfectly balancing L1 and L2 regularisation, came out top, but by
hardly anything. What this is telling us is that it's significantly better to use
as many fields as we can, rather than just the ones that seem sensible. Lesson learned.