Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.

From the course by University of Houston System

Math behind Moneyball

From the lesson

Module 1

You will learn how to predict a team’s won-lost record from the number of runs, points, or goals scored by a team and its opponents. Then we will introduce you to multiple regression and show how it is used to evaluate baseball hitters. Excel data tables and the VLOOKUP, MATCH, and INDEX functions will also be discussed.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

Okay, in this video, we'll introduce our study of evaluating baseball hitters.

And we'll start with the seminal concept of runs created,

which was developed by Bill James, who is considered the father of sabermetrics,

which is the math of baseball.

And why is it called sabermetrics?

SABR stands for the Society for American Baseball Research, so

people who study the math of baseball are often called sabermetricians.

Now, Bill James is often given credit for being the father of

mathematical baseball analysis, and he did a lot of great stuff, as we'll see.

A little background on Bill James:

there's a 60 Minutes segment that tells you a lot about Bill James,

and I've given you its YouTube URL in the notes for this video.

But some other people deserve credit as well.

Bill James was a night watchman in Lawrence, Kansas.

He attended the University of Kansas.

There wasn't much crime at the Stokely-Van Camp factory

where he worked as the night watchman, so he looked at baseball statistics.

I don't think he knew a ton of advanced statistics, but

it doesn't matter, because he was a genius at looking at problems in

a different way than people had looked at them for over a hundred years of studying baseball.

And we'll see his genius,

particularly when we start talking about evaluating major league fielders.

The concept we'll talk about in this video is

runs created, which, as you'll see, doesn't use any advanced math, but

it's an attempt to understand how many runs a hitter creates.

For example, is a singles hitter who hits .380 and

walks a fair amount a better hitter than a power hitter who hits .250,

hits 50 home runs, and walks 100 times a year?

And the answer's just not obvious, but before we get to the runs created concept,

let's look at some other important figures in the history of sabermetrics.

Well, George Lindsey is one of my heroes, because I got a PhD in operations

research, and George Lindsey published an important article in the journal

Operations Research in 1963, "An Investigation of Strategies in Baseball."

In it he showed, for example,

that bunting probably wasn't a great idea, and he gave a lot of other great analysis.

That was followed by Earnshaw Cook's book, Percentage Baseball,

which was featured in Sports Illustrated.

I was a kid then and I loved math and loved baseball, and so

I read that book cover to cover.

He had many interesting conclusions, such as

that you should bat your best hitter leadoff

instead of batting him cleanup or third, which is what most managers do today.

And then Pete Palmer and John Thorn wrote The Hidden Game of Baseball;

their first edition came out in 1984, and it was revised in 2015.

It's a terrific book with lots of great insights.

There is also a book called The Hidden Game of Football,

which we'll discuss a little when we talk about expected points in football.

But let's get to the idea of runs created.

So basically, given a hitter's statistics, you want to know:

is one hitter better than another?

Well, this is hard. Suppose you compare a team of Prince Fielders

(we'll look at Prince Fielder in the next video)

with a team of Rogers Hornsbys; Hornsby was a great hitter in the 1920s.

You don't really know how many runs a team of nine of those guys would have scored,

because there's no team made up of nine of those guys.

So all you really have to give you insight into what's going on are team statistics.

So I put together, for each team from 2002 through 2006, how many runs it scored,

in column B, along with at bats, hits, singles, doubles, triples,

home runs, walks, and hit by pitch.

The orange columns are good things.

If your team does more of those in a season, it should obviously score more runs.

And so there should be a way to use the orange columns to predict the yellow one.

Now those of you who know what multiple regression is probably would think of that

instantly and we'll talk about multiple regression starting in the next video, but

I don't think Bill James knew multiple regression or

he would have used it on this data.

But he came up with something really quite brilliant.

To figure out how many runs a team would score,

he came up with the following basic formula, called runs created.

There are more complex versions which are on Wikipedia, and

you're welcome to look at those, but this course is not a Ph.D.

in runs created, so we'll just look at his first simple formula which actually does

a pretty good job.

It's the principle of parsimony:

you want the simplest formula that gets you most of the way to where you want to go, in

most cases in business.

Okay, so basically Bill James said that to score runs, you've got to get on base,

and you've got to advance the runners.

So you take the number of times you get on base: hits plus walks plus hit by pitch.

Then, for advancing the runners, you multiply by a surrogate for

that, the total bases, which is singles plus twice doubles,

plus three times triples, plus four times home runs.

Then you divide by the number of opportunities you had, that is, how many times

you came up to the plate, which is roughly at bats plus walks plus hit by pitch.
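As a sketch of the basic formula just described, here is a small Python function computing (hits + walks + hit by pitch) × total bases ÷ (at bats + walks + hit by pitch). The function and parameter names are illustrative choices, and the season totals in the example are made up, not taken from the 2002 to 2006 team data.

```python
def total_bases(singles: int, doubles: int, triples: int, home_runs: int) -> int:
    """Total bases: 1B + 2*2B + 3*3B + 4*HR."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def runs_created(at_bats: int, hits: int, walks: int, hbp: int,
                 singles: int, doubles: int, triples: int, home_runs: int) -> float:
    """Basic runs created: (H + BB + HBP) * TB / (AB + BB + HBP)."""
    times_on_base = hits + walks + hbp       # getting on base
    opportunities = at_bats + walks + hbp    # rough count of trips to the plate
    tb = total_bases(singles, doubles, triples, home_runs)  # advancing runners
    return times_on_base * tb / opportunities


# Hypothetical team season totals (illustrative only).
rc = runs_created(at_bats=5500, hits=1450, walks=550, hbp=50,
                  singles=950, doubles=300, triples=30, home_runs=170)
print(round(rc, 1))  # 779.7
```

The same function works on a single player's season line, which is how the formula is used to rank hitters.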

Now, I don't know how Bill James came up with this,

but it's a logical formula.

And he said it would closely predict how many runs a team scores in a season.

And if we apply it at the player level,

we can rank players based on runs created.

It was really brilliant to

come up with this formula, because it works really well.

So how does it work on the data we've seen?

Okay, let's put the formulas back into the spreadsheet and work through them.

For runs created, we take D4 plus J4.

D4 is hits,

and J4 is walks plus hit by pitch,

so that's hits plus walks plus hit by pitch, the first part.

Then I multiply by singles plus two times doubles,

plus three times triples, plus four times home runs.

That's the total bases.

Then I divide by column C, which is at bats,

plus column J, which is walks plus hit by pitch, and that's runs created.

And how accurate is this?

Well, we'll look at the absolute error like we did in

our Pythagorean Theorem video.

We take the absolute value of actual runs minus runs created.

This is the steroid era, so the teams average about 775 runs per season.

That's about 4.8 runs per game.

So I copy that formula down,

again using the double-click trick.

And to average those absolute deviations, I type AVERAGE.

If I double-click on AVERAGE, Excel understands I want to complete that.

And now to select this column, I start here and

press Ctrl+Shift+Down Arrow to jump right to the bottom.
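The spreadsheet steps above (an absolute error in each team's row, then AVERAGE over the column) amount to a mean absolute error. A minimal Python sketch, with hypothetical numbers standing in for the actual-runs and runs-created columns:

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average of |actual - predicted|, like ABS in each row followed by AVERAGE."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual runs and runs-created predictions for three teams.
actual_runs = [800, 750, 700]
predicted_runs = [780, 770, 690]
print(round(mean_absolute_error(actual_runs, predicted_runs), 2))  # 16.67
```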

Okay, so I'm off by 28 runs per season out of 775, and

that's less than 4%, so that's pretty accurate.

It's hard to get much better than that no matter how hard you try,

though you can do somewhat better.
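The back-of-the-envelope numbers in this part of the video can be checked quickly. The 162-game season is the standard major league schedule; 775 and 28 are the figures quoted above.

```python
RUNS_PER_SEASON = 775   # approximate average team runs per season, 2002 to 2006
GAMES_PER_SEASON = 162  # standard major league regular season
MEAN_ABS_ERROR = 28     # average runs-created miss quoted in the video

runs_per_game = RUNS_PER_SEASON / GAMES_PER_SEASON
relative_error = MEAN_ABS_ERROR / RUNS_PER_SEASON

print(f"{runs_per_game:.2f} runs per game")        # 4.78 runs per game
print(f"{relative_error:.1%} of a season's runs")  # 3.6% of a season's runs
```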

Okay, so the problem with runs created, though,

is that it doesn't take into account the scarce resource a hitter uses up.

The scarce resource in baseball is an out, okay?

So if you use up outs, you want to account for that.

In football, the scarce resource is downs.

In basketball, the scarce resource is possession of the ball.

So what we would like to do, in the next video, is

adjust runs created to be runs created per game.

There are roughly 27 outs per game, so

if you made 108 outs, you've used up 4 games' worth of outs.

And so if you take your runs created and divide by the number of games' worth of outs

you've used up, you get a much better idea of how good a hitter is, and

we'll look at that in the next video.
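The per-game adjustment just described can be sketched as follows, assuming the number of outs a hitter made is already known. How outs are counted isn't specified here, so `outs_made` is simply an input, and the runs-created value in the example is hypothetical.

```python
OUTS_PER_GAME = 27  # three outs per inning, nine innings


def games_of_outs(outs_made: int) -> float:
    """How many games' worth of outs a hitter has used up."""
    return outs_made / OUTS_PER_GAME


def runs_created_per_game(runs_created: float, outs_made: int) -> float:
    """Runs created divided by games' worth of outs used up."""
    if outs_made <= 0:
        raise ValueError("outs_made must be positive")
    return runs_created / games_of_outs(outs_made)


print(games_of_outs(108))                # 4.0
print(runs_created_per_game(30.0, 108))  # 7.5
```

Dividing by games of outs, rather than by plate appearances, is what charges a hitter for the scarce resource discussed above.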
