Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.

From the course by University of Houston System

Math behind Moneyball

43 ratings

From the lesson

Module 3

You will learn how Monte Carlo simulation works and how it can be used to evaluate a baseball team’s offense and the famous DEFLATEGATE controversy.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

So we all know about the famous Deflategate. Maybe you think the Patriots

cheated, maybe you don't, and I don't think we really know for sure.

But of course, the idea is that Tom Brady and/or the rest of the Patriots

deflated the balls. We don't know for how long, but perhaps in that Colts-Patriots

game, which they didn't need to do, since they slaughtered the Colts, you know.

And the idea was that underinflating the balls would make them easier to catch.

I'm not sure how to prove that, but it would make it less likely you'd fumble

the ball, and that's what I want to address using resampling in this video.

Okay, so Brian Burke has a great site, which we'll talk about a lot later in this

class. I just love Brian and his site; he's done so much work.

On advancedfootballanalytics.com he did a post on Deflategate.

He looked at basically the last five seasons,

2010-2014, for all the teams that don't play in a domed stadium.

Why omit domed stadium teams?

Because you'll fumble less in a dome, since conditions are always perfect in a dome.

What he looked up was: how many plays per fumble?

So for example, in 2010, the Patriots went 89 plays between fumbles, on average.

And over the five year period they averaged 78 plays between fumbles,

which is far more than anybody else.

The Ravens were second at 58, and if you're a Redskins fan you're not surprised to find

out the Redskins only averaged 37 plays per fumble, which is fewer than anybody else.

Okay, it's like being a Cubs fan, although the Cubs are good this year, and

the Astros who I like are also pretty good this year and

they're both heavy into analytics which is nice.

Okay, so

first let's use conditional formatting color scales to see just a visualization

of how unusual the Patriots' lack of fumbling is over this five year period.

So I did Home > Conditional Formatting > Color Scales,

I'm in love with yellow, green, red.

Okay, so remember, the darkest green means it's a big number,

the darkest red means it's a small number.

So four of those five years are really dark green for the Patriots,

very unusual, and one is sort of in the middle.

Okay, and the Redskins have a lot of red there.

Why they fumble a lot, I don't know, except they've just been a bad team.

So how could we use resampling to try and

figure out how unusual the Patriots' lack of fumbling is?

We're not going to say what caused it, but it's extremely unusual.

So we've got how many numbers here?

We've got 23 rows, I think.

So I can select the one and two, double-click and it copies down.

So I've got 23 rows, to match the column to the right and I've got 5 columns.

I've got 115 numbers, so couldn't I use resampling to randomly generate five

years of fumbling, saying each year is equally likely to be one of these 115

numbers, and basically find the average number of plays per fumble?

Okay, and then see what's the chance it would be greater than or equal to 77.6.

Okay, so what we'll see in a minute is that there's almost

no chance, if you randomly pick five of these 115 numbers with replacement,

that you'd get a mean of at least 77.6. It's like one in 10,000.

So that would indicate the Patriots fumble a lot less,

significantly less, than an average NFL team that doesn't play in a dome.
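
In symbols, the resampling idea just described can be written roughly as follows. The notation here is an assumption introduced for illustration and does not appear in the video.

```latex
% X_i^* : one value drawn at random, with replacement, from the 115 table entries
% N     : number of resampling iterations (10,000 in the video)
\[
  \bar{X}^{*} = \frac{1}{5}\sum_{i=1}^{5} X_i^{*},
  \qquad
  \hat{p} = \frac{\#\{\, j : \bar{X}^{*}_{j} \ge 77.6 \,\}}{N}
\]
```

A small value of the estimated fraction means that a five-year average as high as 77.6 would rarely come up if each Patriots season were like a randomly chosen entry from the table.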

Now what caused this?

I don't know. Is Bill Belichick the fumble whisperer who can avoid fumbles?

Maybe, maybe not.

I know he should dress better, however.

Okay, so now let's find the row for the fumbles.

I'll use RANDBETWEEN from 1 to 23, which samples with replacement,

and for the column, since there are five years,

I'll do RANDBETWEEN from 1 through 5 right there.

Then I can use INDEX to randomly pick out one of the numbers.

I believe I named this range, what did I name it,

data, where all this stuff is, but I'll select it.

So the plays per fumble would be equals INDEX, and let's select the range.

And I've got the row and column, and then we'll verify this is working okay.

So that was called data, and the row number comes from the first RANDBETWEEN, which

gives everything the same chance, and the column number from the second RANDBETWEEN.

And so let's copy this down and let's just verify it's picking

out the right number. It's always a good idea to check your formula.

Okay, so row 10, column 2 of data is 37.

Row 10, column 2 is Jacksonville. They fumbled a lot that

year and did a lot of other things wrong too. Good luck to Jacksonville in the future.
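
The RANDBETWEEN and INDEX steps can be sketched in Python roughly as follows. The helper names and the small `toy_table` are hypothetical stand-ins for illustration; the real 23-by-5 table from the spreadsheet is not reproduced here.

```python
import random


def randbetween(low, high, rng):
    """Like Excel's RANDBETWEEN: a random integer from low to high, inclusive."""
    return rng.randint(low, high)


def index(table, row, col):
    """Like Excel's INDEX: look up a cell using 1-based row and column numbers."""
    return table[row - 1][col - 1]


def resampled_mean(table, n_draws, rng):
    """Average of n_draws cells, each picked at random with replacement."""
    n_rows, n_cols = len(table), len(table[0])
    picks = []
    for _ in range(n_draws):
        row = randbetween(1, n_rows, rng)  # every row equally likely
        col = randbetween(1, n_cols, rng)  # every column equally likely
        picks.append(index(table, row, col))
    return sum(picks) / n_draws


# Hypothetical 3-team by 2-season table of plays per fumble (NOT the real data).
toy_table = [
    [40.0, 50.0],
    [60.0, 45.0],
    [55.0, 70.0],
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample_mean = resampled_mean(toy_table, 5, rng)
print(sample_mean)
```

As in the video, it is worth checking a lookup by hand: `index(toy_table, 2, 1)` should return the first-season value of the second row, which is 60.0 in this toy table.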

Okay, so now I can take this resampled mean plays per fumble and

do it 10,000 times, let's say, and see how often we get at least 77.6.

So I hit F9 a couple of times. It's just very unlikely, as you'll see in a minute,

that I'll get at least 77.6, which is what the Patriots did.

So let's do 10,000 iterations, as they're called,

playing this spreadsheet out 10,000 times.

So I'll do Home > Fill > Series,

change it to Columns, and say 1 through 10,000.

I say OK, and the resampled mean here is this.

And so let's play it out 10,000 times.

So I go Data > What-If Analysis > Data Table > Column input cell.

Okay, now let's average those 10,000.

They're all the same. Why?

Because I set calculation to automatic except for data tables.

Remember, we said the data table won't recalculate. Easy to fix: hit F9.

Okay, so they change.

So what fraction of the time, when we resample five randomly

chosen numbers with replacement from those NFL plays per fumble, do we beat 77.6?

So I use COUNTIF on this range, with the criterion greater than or

equal to 77.6 in quotes because it's text, and I divide by 10,000.

And it just doesn't happen very often.

Here it's two in 10,000, there it's two in 10,000,

often it's one in 10,000, or it never happened.

We're doing 10,000 iterations, so let's say one or

two chances in 10,000.

So the Patriots' lack of fumbling during those years,

if you assume they're like

an average NFL team that didn't play in a dome,

had around one chance in 10,000 of occurring.
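
The data table and counting steps can be sketched in Python along these lines. The function name `tail_fraction` and the values in `toy_values` are hypothetical; the sketch assumes the table has been flattened into one list, which is an equivalent way to give every cell the same chance of being drawn.

```python
import random


def tail_fraction(values, n_draws, threshold, n_iter, seed=0):
    """Fraction of resampled means that are at least `threshold`.

    Each iteration draws n_draws values with replacement, averages them,
    and counts a hit when the average is >= threshold (the COUNTIF step).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        draws = rng.choices(values, k=n_draws)  # sampling with replacement
        if sum(draws) / n_draws >= threshold:
            hits += 1
    return hits / n_iter  # divide the count by the number of iterations


# Hypothetical pool of plays-per-fumble values (NOT the real 115 numbers).
toy_values = [35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 90.0]

fraction = tail_fraction(toy_values, n_draws=5, threshold=77.6, n_iter=10_000)
print(fraction)
```

With only 10,000 iterations, an event this rare shows up as zero, one, or two hits from run to run, which matches what pressing F9 repeatedly shows in the spreadsheet. Raising `n_iter` would give a steadier estimate.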

Well that's a pretty high level of proof that the Patriots fumbled a lot less than

you would expect if they were an average NFL team that didn't play in a dome.

And so why did this happen?

I really don't know. I'm not going to claim to have the answer, but

it certainly goes along with, I guess, the prevailing public

view that the Patriots did something to the footballs, and it actually seems to

indicate this may go back a lot longer than that Colts-Patriots game.

Okay, so that'll finish module three.

We'll have a homework problem and

a test question on resampling.

And then in module four we'll return to baseball and do a lot of interesting

things involving baseball, so we'll see you in the next video.
