Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.

From the course by University of Houston System

Math behind Moneyball

From the lesson

Module 1

You will learn how to predict a team’s won-loss record from the number of runs, points, or goals scored by a team and its opponents. Then we will introduce you to multiple regression and show how multiple regression is used to evaluate baseball hitters. Excel data tables, VLOOKUP, MATCH, and INDEX functions will be discussed.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

Okay. Let's apply what we've

learned about linear regression to evaluating baseball hitters.

So, for linear weights, we've got, for

the years 2000 through 2006, the runs scored by each team,

the singles, doubles, triples, homers, walks plus hit by pitch, stolen bases,

and caught stealing.

So, it stands to reason that if we know the red columns,

which is sort of what the team's batting offense did,

we should be able to predict the yellow column.

So we could run a regression, we know how to do that.

We go to Data Analysis; the Y range is the runs column,

the X range is the red columns, and the results we get

are right here in the stolen base and caught stealing worksheet.
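
For readers following along outside Excel, here is a minimal sketch of what a multiple regression fit does, using only the Python standard library. It is not the Data Analysis tool's implementation; the function name `fit_linear_regression` and the toy numbers are assumptions made up for illustration, and the toy data is not real team data.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows    -- list of predictor lists, one list per observation
    targets -- list of observed outcomes (for example, runs scored)
    Returns [intercept, coef_1, ..., coef_k].
    """
    # Prepend a 1 to every row so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])

    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    # After elimination the last column holds the fitted coefficients.
    return [A[i][k] for i in range(k)]


# Made-up toy data (NOT real team statistics): the target is built exactly
# as 10 + 0.5*x1 + 1.5*x2, so the fit should recover roughly those numbers.
toy_rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
toy_targets = [10 + 0.5 * a + 1.5 * b for a, b in toy_rows]
coefficients = fit_linear_regression(toy_rows, toy_targets)
print([round(c, 6) for c in coefficients])
```

In the lecture's setting, each row would hold one team-season's batting statistics and each target would be that team's runs scored.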

Okay, so remember we look at P values.

Anything with a low P value, less than 0.1, is significant,

in other words, a useful predictor.

Anything higher,

a P value higher than 0.1, is not useful given what's in the equation.

So stolen bases have a 0.43 P value, which means there's a 43% chance they don't help

us after knowing the other variables in the equation.

Caught stealing has a P value of 0.95, which means it doesn't help us.

So we should run the model again without those independent variables.

So we would run the model simply using these columns.

So in other words, let's make this go black here.

So we would use the orange columns, excuse me, I'm going to sneeze.

[NOISE] Sorry, we're going to use the orange column

to predict the yellow column.

Okay, let's see what we get.

[NOISE] Okay, so

what we get is here,

in the worksheet where we've thrown out caught stealing and stolen bases.

We get the results of the regression.

So the R Squared is 91%.

We explain 91%

of variation in runs scored.

The standard error is 24.4,

which means 95%

of our forecasts

should be accurate within 49 runs, double that.

Anything else is an outlier.
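
The arithmetic behind that band can be sketched as follows. The 24.4 standard error is the figure quoted in the lecture; the helper names `forecast_band` and `is_outlier` are hypothetical, chosen only for this illustration.

```python
STANDARD_ERROR = 24.4  # standard error of the regression, as quoted in the lecture


def forecast_band(standard_error, multiplier=2.0):
    """Half-width of the rough 95% band: about two standard errors."""
    return multiplier * standard_error


def is_outlier(actual_runs, predicted_runs, standard_error=STANDARD_ERROR):
    """Flag a forecast that misses by more than two standard errors."""
    return abs(actual_runs - predicted_runs) > forecast_band(standard_error)


half_width = forecast_band(STANDARD_ERROR)
print(round(half_width, 1))  # 48.8, which the lecture rounds to 49
```

Under this rule of thumb, a team whose actual runs differ from the prediction by more than about 49 would be flagged as an outlier.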

And our predicted runs scored, and we'll be using this equation a lot,

if you look at what's in yellow here, would be -560 plus 0.63 times singles,

and these are called linear weights.

In other words, a single is worth 0.63 runs.

A double is worth 0.71 runs.

And a triple, 1.26 runs.

And a home run should be worth more than one run.

And then plus 0.35 times walks plus hit by pitch.

Okay, so you could use this equation to predict how many runs a team will score.

These are again called linear weights.
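
As a rough sketch, the linear weights equation can be written as a small function. The intercept and the weights for singles, doubles, triples, and walks plus hit by pitch are the values quoted above; the transcript does not state the home-run weight, so the function requires the caller to supply it. The name `predict_runs`, the dictionary keys, and the example team are all hypothetical.

```python
# Values quoted in the lecture. The home-run weight is not stated in the
# transcript, so it is deliberately left out and must be passed in.
INTERCEPT = -560.0
LECTURE_WEIGHTS = {
    "singles": 0.63,
    "doubles": 0.71,
    "triples": 1.26,
    "walks_plus_hbp": 0.35,
}


def predict_runs(team_stats, home_run_weight):
    """Intercept plus the weighted sum of a team's season batting totals."""
    weights = dict(LECTURE_WEIGHTS, home_runs=home_run_weight)
    missing = set(weights) - set(team_stats)
    if missing:
        raise KeyError(f"missing statistics: {sorted(missing)}")
    return INTERCEPT + sum(w * team_stats[name] for name, w in weights.items())


# Hypothetical season totals, invented for illustration only.
example_team = {
    "singles": 1000,
    "doubles": 300,
    "triples": 30,
    "home_runs": 180,
    "walks_plus_hbp": 600,
}
# 1.5 is a placeholder home-run weight, NOT the lecture's fitted value.
predicted = predict_runs(example_team, home_run_weight=1.5)
print(round(predicted, 1))
```

Keeping the home-run weight as an explicit argument avoids baking in a number the lecture does not give here.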

And we can use these, given hitter statistics that we will see

in a couple of videos, to evaluate how good a hitter is.

In other words, how many runs we would expect a team of maybe nine Barry Bonds

from 2004 to score will be an example that we use.

So now, that's a minus sign, I just want to make that clear.

Okay.

So now, how accurate are the linear weights?

Remember, runs created was off by an average of 28 runs.

But if we look at the linear weights model,

here I put the linear weights there, and then I, okay, I made a predicted runs.

Okay, you take the intercept, which is minus 560, and

then you multiply the linear weights times the statistics for the team, and

you're off by an average of 19 runs per team per season.

Okay, now runs created was off by 28 runs per team per season, so

this is a bit more accurate.

Okay, and just a little rule of thumb here if you have a standard error of

the regression:

this will usually be true,

if you take about 80% of the standard error of regression,

you come close to the mean absolute deviation.
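
One way to see where the 80% figure comes from: if forecast errors are roughly normally distributed (an assumption, not something the lecture states), the mean absolute deviation equals the standard deviation times the square root of 2/pi, which is about 0.8. The sketch below checks that with simulated errors; the function name and the simulation settings are illustrative choices.

```python
import math
import random


def mean_absolute_deviation(errors):
    """Average size of the errors, ignoring their sign."""
    return sum(abs(e) for e in errors) / len(errors)


# Theoretical MAD-to-standard-deviation ratio for normally distributed errors.
normal_ratio = math.sqrt(2 / math.pi)

# Simulated errors (not real forecast errors) with the lecture's 24.4 spread.
random.seed(0)
sigma = 24.4
simulated_errors = [random.gauss(0, sigma) for _ in range(100_000)]
simulated_ratio = mean_absolute_deviation(simulated_errors) / sigma

print(round(normal_ratio, 3))  # 0.798
print(round(0.8 * sigma, 1))   # 19.5, close to the 19 runs quoted in the lecture
```

The rule of thumb is only as good as the normality assumption, which is presumably why the lecture says it will usually, not always, be true.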

Okay.

So if you saw the movie Moneyball you probably remember they don't talk about

the linear weights for singles, doubles, triples, and homers.

They talk about OBPS, On Base Plus Slugging, and

that's on the back of the baseball card.

And so in the next video we'll learn how to use multiple linear regression

to try and sort of derive the importance of OBPS in rating hitters.

And we'll see that it's really an oversimplification.

Really, OBPS is on-base percentage plus slugging percentage.

We'll see that on-base percentage is more important than slugging percentage, and

regression shows it's almost twice as important,

on-base percentage, as slugging percentage.

And we'll see that in the next video.
