Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.

From the course by University of Houston System

Math behind Moneyball

24 ratings

From the lesson

Module 1

You will learn how to predict a team's won-loss record from the number of runs, points, or goals scored by a team and its opponents. Then we will introduce you to multiple regression and show how multiple regression is used to evaluate baseball hitters. Excel data tables and the VLOOKUP, MATCH, and INDEX functions will be discussed.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

So, in the following linear weights worksheet, for the years 2000 through 2006, we've got the runs scored by each team, plus the singles, doubles, triples, homers, walks plus hit by pitches, stolen bases, and caught stealing. It stands to reason that if we know the red columns, which is sort of what the team's batting offense did, we should be able to predict the yellow column. So we could run a regression; we know how to do that. We go to Data Analysis, the Y range is the runs column, the X range is the red columns, and the results we get are right here in the stolen base and caught stealing worksheet.

Okay, so remember we look at P values. Anything with a low P value, less than 0.1, is significant; in other words, a useful predictor. Anything with a P value higher than 0.1 is not useful given what's already in the equation. Stolen bases have a 0.43 P value, which means there's a 43% chance they don't help us after knowing the other variables in the equation. Caught stealing has a P value of 0.95, which means it doesn't help us. So we should run the model again without those two independent variables.

So we're going to use the orange columns to predict the yellow column. Okay, let's see what we get. We go here, to the worksheet where we've thrown out caught stealing and stolen bases, and we get the results of the regression. The R squared is 91%.
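For readers working outside Excel, here is a minimal sketch of a least-squares fit and its R squared in plain Python. It solves the normal equations directly, which is one common way to do this and is not necessarily what Excel's Data Analysis tool does internally. The function names and the tiny data set are made up for illustration; they are not the lecture's 2000 through 2006 team data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x_rows, y):
    """Return [intercept, coef_1, ..., coef_k] minimizing squared error."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(x_rows, y, coefs):
    """Share of the variation in y explained by the fitted equation."""
    preds = [coefs[0] + sum(c * x for c, x in zip(coefs[1:], row)) for row in x_rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up data: y is exactly 2 + 3*x1 + 0.5*x2, so the fit should recover that.
x_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [2 + 3 * x1 + 0.5 * x2 for x1, x2 in x_rows]
coefs = fit_ols(x_rows, y)
print([round(c, 6) for c in coefs], round(r_squared(x_rows, y, coefs), 6))
```

This sketch does not compute P values; for those, a statistics package or the Excel regression output is the practical route.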

Our predictions should be accurate within about 49 runs; anything else is an outlier. And here is our predicted runs scored equation, which we will be using a lot.

If you look at what's in yellow here, predicted runs would be -560 plus 0.63 times singles, plus the terms for the other statistics, and these coefficients are called linear weights. In other words, a single is worth 0.63 runs.
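The form of that equation can be sketched in a few lines of Python. Only the intercept (-560) and the singles weight (0.63) come from the lecture; every other weight and all the team totals below are hypothetical placeholders chosen just to make the sketch run, and the names `LINEAR_WEIGHTS` and `predict_runs` are illustrative.

```python
# Linear weights: predicted runs = intercept + sum(weight * count of event).
INTERCEPT = -560.0  # from the lecture
LINEAR_WEIGHTS = {
    "singles": 0.63,     # from the lecture
    "doubles": 0.7,      # HYPOTHETICAL placeholder, not from the lecture
    "triples": 1.3,      # HYPOTHETICAL placeholder
    "home_runs": 1.5,    # HYPOTHETICAL placeholder
    "walks_hbp": 0.35,   # HYPOTHETICAL placeholder
}


def predict_runs(team_stats, weights=LINEAR_WEIGHTS, intercept=INTERCEPT):
    """Return intercept plus weight * count for every event in `weights`."""
    return intercept + sum(w * team_stats[event] for event, w in weights.items())


# Made-up season totals for an imaginary team.
example_team = {
    "singles": 1000,
    "doubles": 300,
    "triples": 30,
    "home_runs": 180,
    "walks_hbp": 600,
}
print(round(predict_runs(example_team), 1))
```

To use the real model, replace the placeholder weights with the coefficients from the regression output.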

Okay, so you could use this equation to predict how many runs a team will score. These are, again, called linear weights. And given a hitter's statistics, we can use these weights, as we will see in a couple of videos, to evaluate how good a hitter is. In other words, how many runs would we expect a team of, say, nine copies of the 2004 Barry Bonds to score? That will be an example that we use. And that's a minus sign in front of the 560; I just want to make that clear.

Okay. So now, how accurate are the linear weights? Remember, Runs Created was off by an average of 28 runs. But if we look at the linear weights model, here I put the linear weights, and then I made a predicted runs column. You take the intercept, which is minus 560, and then you multiply the linear weights times the statistics for the team, and you're off by an average of 19 runs per team per season.
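The "off by an average of" figure is a mean absolute error: the average size of the gap between actual and predicted runs, ignoring sign. A minimal sketch, with made-up numbers rather than the lecture's data and an illustrative function name:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| across teams."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up season run totals for four imaginary teams.
actual_runs = [800, 750, 700, 850]
predicted_runs = [790, 770, 715, 835]
print(mean_absolute_error(actual_runs, predicted_runs))  # prints 15.0
```

Running the same function on two sets of predictions for the same teams gives a like-for-like accuracy comparison between models.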

Okay, now Runs Created was off by 28 runs, so this is a bit more accurate. Okay, and just a little rule of thumb here, if you have a standard error of the regression...

Okay. So if you saw the movie Moneyball, you probably remember they don't talk about the linear weights for singles, doubles, triples, and homers. They talk about OPS, On-base Plus Slugging, and that's on the back of the baseball card. And so in the next video we'll learn how to use multiple linear regression to try and sort of derive the importance of OPS in rating hitters. And we'll see that it's really an oversimplification. OPS is really on-base percentage plus slugging percentage. We'll see that on-base percentage is more important than slugging percentage, and the regression shows it's almost twice as important.
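As a quick illustration of what goes into that statistic, here is a small sketch that computes on-base percentage, slugging percentage, and their sum. The formulas are the standard baseball definitions rather than anything stated in this video, and the player totals are made up.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Times on base divided by the plate appearances that count for OBP."""
    return (hits + walks + hit_by_pitch) / (
        at_bats + walks + hit_by_pitch + sacrifice_flies
    )


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def on_base_plus_slugging(obp, slg):
    """OPS simply adds the two, weighting them equally."""
    return obp + slg


# Made-up season line for an imaginary hitter: 150 hits in 500 at bats.
obp = on_base_percentage(hits=150, walks=60, hit_by_pitch=5, at_bats=500, sacrifice_flies=5)
slg = slugging_percentage(singles=100, doubles=30, triples=5, home_runs=15, at_bats=500)
print(round(obp, 3), round(slg, 3), round(on_base_plus_slugging(obp, slg), 3))
```

Because the sum weights both parts equally, it cannot reflect one part mattering more than the other, which is the oversimplification mentioned above.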
