Learn how probability, math, and statistics can be used to help baseball, football and basketball teams improve, player and lineup selection as well as in game strategy.

Loading...

En provenance du cours de University of Houston System

Math behind Moneyball

36 notes

Learn how probability, math, and statistics can be used to help baseball, football and basketball teams improve, player and lineup selection as well as in game strategy.

À partir de la leçon

Module 8

You will learn how to use game results to rate sports teams and set point spreads. Simulation of the NCAA basketball tournament will aid you in filling out your 2016 bracket. Final 4 is in Houston!

- Professor Wayne WinstonVisiting Professor

Bauer College of Business

Okay, in this video we're going to show you how to simulate the NCAA tournament,

figure out the odds of each team winning.

[INAUDIBLE] prospects that are even more complex.

Okay, first we've gotta review how to simulate a normal random variable.

Okay, like, suppose,

so the average U.S.

male is 69 inches tall.

And let's say standard deviation 3.5 inches.

Okay, so how would you simulate the height of the American male?

>> You do norm-inverse, a function we discussed, you put it in a random

variable, random number, then you put in the mean and standard deviation.

And actually let's put that rand in a different cell.

Okay, so we can put the random number here.

And then we could say their height would be norm-inverse,

and here's the rand, and I've got the mean and I've got the sigma.

Okay, so if the random number is 0.13 like, here I get the 13th percentile.

That should be 3.5, sorry about that.

Okay, so I get 0.98, I get the 98th percentile, that's a perk guy who is 6'4.

Okay, 32nd percentile for someone who's 5'7.

82nd percentile, about 1 standard deviation above average, and

get that a little over 6 feet, etc.

So what does this have to do with seeding the NCAA tournament?

Well, we know how to rate sports teams, so from my best friends Jeff Sagarin's

website, Sagarin.com, you can go on there you can get

going into the tournament selections after Selection Sunday.

You can get [COUGH], sorry, ratings for each team.

And you just have to copy those into the spreadsheet the way that I show you.

And then the spreadsheet will do the rest and play out the NCAA Tournament.

We can use a data table to basically play out as many times as we want.

The key is assuming no home-edge in any game, like yes,

if Duke plays in North Carolina like they always luck out and do,

you could give them half a home-edge.

The key is the outcome of a game will be pretty much a normal random variable

with the mean equal the difference in the power ratings,

and the standard deviation about these predictions is around 11 points.

Okay, so how do you put in the information on Selection Sunday,

and then we can figure out going to the 2015 tournament,

what was the chance that Kentucky would win?

Thankfully they didn't win, because remember the last undefeated team?

>> IU '76, don't forget that.

Okay, so here's what you do.

Okay, so the play-in games,

which drive me nuts, that that's round 1, it's not round 1.

Okay, round 1 is Thursday and Friday, I'm sorry NCAA.

Okay.

So now the play-in games were Hampton versus Manhattan, Mississippi-BYU,

Boise-Dayton, North Florida versus Robert Morris.

You put in their power ratings, and then whoever wins the game, I'll show you how

we figure this out, you throw them up into the bracket where they would belong,

like the winner of Hampton-Manhattan got the honor of playing Kentucky.

So how do I play out the game when I have the ratings in?

I do a norm inverse, I pull a rand, and

the mean would be the difference in the ratings.

I guess I'll use a standard deviation of 10 here.

Let me just check that throughout.

It's somewhere between 10 and 11.

Oops, I used 11 there.

I should switch these two.

I should switch the play-in game to 11 then if I want to be consistent.

So we'll do that rather quickly here.

So this will be a standard deviation of 11.

And then the next play-in game a standard deviation of 11.

And this game a standard deviation of 11.

And this game a standard deviation of 11.

Okay. So we can play out all these games.

Again, it's normed inverse using the rand.

Take the difference in the team's ratings.

Now how do you know who won?

If the number in the result column is positive, then the team on top won,

because I was doing the prediction from that standpoint.

So here, it's a positive, Hampton won.

Here it's a positive, Mississippi won.

Here it's a positive, Boise won, here it's a negative, Robert Morris won.

And for instance, where do I throw Hampton?

Well here's something we haven't taught you about Excel.

If you do formulas, trace dependents, okay, you can see

that the cells that depend on that and theirs were, see the Hampton play-in game.

The winner of that game, I just had

pointed right here to the cell that the winner of that game to Manhattan.

See if I remove the arrow and I go here and say trace precedents, you can see

that that cell, the winner of that game, was the cell that fed into that formula.

Okay, so we've got the play-in games decided, which removed the arrows.

By the way, that trick on the arrows is in chapter 15, it's called the auditing tool.

Chapter 15 of Data Analysis and Business Modeling.

Okay.

So now how do you put the stuff in once you [INAUDIBLE] Selection Sunday and

the spreadsheet will do the rest throughout the tournament.

You can figure out various props bets or complicated bets.

I actually have a spreadsheet that simulates the outcome of every game.

Which might interest you when you fill out your pool, because maybe you get,

if you have 12 seed wins you get 12 points.

If you pick a five seed, they win, they get five points.

You could do things like that.

Okay.

So, Kentucky, you fill in the teams that in

the first bracket listed on the left side of your bracket sheet.

You fill in the number one seed in the tournament and

then you fill in the team they play.

Just fill in the first 16 teams in the order that they're listed in your

first top-left bracket on your, let's say your USA Today.

Then you fill in the next 16 teams, but

you have to make sure you put the winners of the play-in games in.

Okay, like Hampton again was a play-in game winner.

The next bracket you put in, in order with the one seed

first in the order they're listed in your USA Today NCA bracket.

Basically, you put in the next bracket,

it should be the bracket that basically, the Kentucky bracket plays.

because in the semifinals,

it's going to play whoever wins this region versus that region.

Okay, and then you put the ratings of the teams in below.

So Kentucky has a rating of 100, which is really, really good.

Okay, and then you fill in the next two brackets.

Again, Villanova was a one seed.

That turned out to be a big mistake.

Okay, and then the next bracket you'd have Duke, who won the whole thing.

And then they would play the winner of that play-in game there, and

it would keep going like this.

Okay, all you've got to do is fill in this information.

In row two fill in the teams including pointing to the winner of the play-in

games after the play-in games, then the ratings, and

then the spreadsheet plays everything out.

So like you have, this isn't one and two seeds, but

you have the first team listed playing the second team listed.

Okay.

So then it does a norm inverse and it does the horizontal lookup.

So here it says Okay, A6 is team 1.

Okay. I look up their rating in the ratings row.

I name that, got lots of names here.

But the ratings row is going to be rows 3 and 4 of the ratings table.

So that's going to look up basically Kentucky's rating,

because they're the first team, and the second team it's going to look

up is who Kentucky plays, which doesn't have to be Hampton.

You can make this thing recalc by hitting the F9 key.

So it's, we're going to simulate the outcome of that Kentucky game as

number of points by which the team on the left wins by.

It's normal.

You pick a rand.

The mean is the rating of the team on the left minus the team on the right.

And then the standard deviation's 11.

Then you pick up the winner of that game with an if statement.

If this is positive, the team on the left won.

Otherwise, the team on the right won.

And then whoever won goes on to the next round.

Whoever won the next game, goes on to the next round.

You don't have to do this and then they play, because it looks up their ratings.

And then it just keeps going like that.

And then it comes down here.

Okay, it's got the winner of each region here.

For the winner of the regional semi-finals,

it would have Kansas winning and Oklahoma winning.

And then the winners of those would play in the finals.

Let's get, Iowa State would win.

Villanova would win.

Kentucky should win sometime.

There, Duke won, okay, Kentucky wins there.

Okay, so this should be my, I could do the winner in each region if I wanted to.

I'm not going to worry about that for each semifinal.

But let's just figure out what fraction of time Kentucky wins.

Okay, so I could run this guy, let's say 5,000 times.

I could do a data table.

So I can put a 1 there.

Remember home, fill series, 1 through 5000 in columns.

Okay, and then I would put the winner here.

You do this 5,000 times.

So you go Data > What-if Analysis > Data table, column input cell.

Hit F9 so this changes.

There, they change, and now if I name this range, let's say, outcomes,

okay, so I could do countif the teams name in the outcomes here.

Whoops, I've got that backwards.

It's gotta go countif, outcomes,

team's name, divided by 5,000.

So I've got Kentucky winning about 38%.

I can double-click and copy that down.

Hit F9 again, it would change.

Kentucky wins 39%.

So people thought Kentucky was a lock.

There's just no way.

You've gotta win six games in a row.

I mean basically 38, 39% chance to win the tournament would be right now.

How many times did Duke win?

Duke won 11% of my simulations.

Virginia also won 11%.

Wisconsin, how often did they win?

They won 9%, and, so, basically you can be sure that

the Las Vegas oddsmakers have something like this.

Okay, now in the next video, we'll talk about about a sample props bet and

try and figuring out the odds of that happening.

Okay, we'll see you in the next video.

Coursera propose un accès universel à la meilleure formation au monde,
en partenariat avec des universités et des organisations du plus haut niveau, pour proposer des cours en ligne.