Learn how probability, math, and statistics can be used to help baseball, football and basketball teams improve, player and lineup selection as well as in game strategy.

Loading...

En provenance du cours de University of Houston System

Math behind Moneyball

36 notes

Learn how probability, math, and statistics can be used to help baseball, football and basketball teams improve, player and lineup selection as well as in game strategy.

À partir de la leçon

Module 9

You will learn how to rate NASCAR drivers and get an introduction to sports betting concepts such as the Money line, Props Bets, and evaluation of gambling betting systems.

- Professor Wayne WinstonVisiting Professor

Bauer College of Business

Let's talk about rating NASCAR drivers.

We talked about rating sports teams but this is a bit different.

Okay, so what we've got here are the results of all the NASCAR races,

the 2015 season, through the end of.

And so I downloaded this from ESPN.com.

I just copied and pasted them.

So, in other words this is the order of finish on the February 14th race.

Matt Kenseth was the first car.

Martin Truex, the second car.

Brad Keselowski, the 25th car.

If I look at March 8th, Kevin Harvick has had a great year at the Spurs.

Martin Truex was second, Sam Hornish Jr. was 24th.

Okay, so how can we set up a solver model to rate these drivers?

Well, let's think about what the changing cell should be?

I think, okay so, there were row five, 62.

Some of them only very few races.

Okay.

So, how would I rate these guys?

Okay, well the changing cell should just be a number one through 62.

Notice the ranking of the driver.

It's the best driver should get a ranking of one,

the second best driver a ranking of two.

For instance I have shown you the ESPN rankings as of the first week in June and

they had Jimmy Johnson number one, he won the most races.

Kevin Harvick, number two,

you'll see I have Harvick number one with my system, and I've got,

and they've got Dale Earnhardt, Jr., number three.

I've got him number seven, okay?

And I've put in column C the ESPN rankings.

Okay.

So what you want to do is put a trial set of integers between one and

62, I think here.

Again it goes row five to row 62, so I think there's 61 draws.

Let me just check that.

So everybody should get a number one through 61.

Okay.

So the changing cells, it's an integer one through 61 for

each driver where everybody gets a different number.

Okay, and there is a way to solve this with the Excel solve roots, the all

different algorithm under evolutionary solver, which we have not used yet.

Okay. And whenever you have,

let's say 20 changing cells, you want the integers one through 20, and

be all different, then you can use the evolutionary solver all different.

Okay. The Data Analysis and

Business Modeling book talks about this method in Chapter 37.

It's usually applied in sequencing problems or

traveling salesperson problems.

But here we'll just apply it to rating NASCAR drivers.

Okay.

So these are our changing cells.

This happens to be what we got is the optimal results.

Okay.

And so each race.

What I did is I looked up the driver's rating and there was like name this range

and if there are a lot of worksheets called look rating.

Then I did a V lookup [INAUDIBLE] from column B.

I can look up the guy's rating.

See, no matter what worksheet I'm in,

look rating applies to the ratings I've got in the first worksheet.

Okay, so do your V look up false, so for instance,

Kevin Harvick, okay, you look up his rating and he turned out to be number one,

look up Jamie McMurry, his rating is currently number five.

Okay.

Now what you want to know is, how would you get a target cell?

Well the target cell is, you see,

based on your ratings, where does the driver rank in the race?

Like, if he has the lowest rating, he would rank first.

The second lowest rating of those in the race, he would rank second.

And you compare that to the actual finish and minimize squared error.

So for example in this first race, my trial value for,

the value I got for Matt Kensick, the drivers which is 13.

Okay and he was basically, he came in first in that race.

So the squared error would be, where did he rank among the drivers?

I used the rank function.

I said, where's the ranking of the rating among the drivers in the race among

all the rankings that we have for that race.

And I used the rank function with a one,

because a one makes the lowest number get a rank of one.

See, when you have a one there, okay.

When you say zero or omitted it ranks things in descending order,

otherwise it's ascending order if you'd use let's say a one there.

So you can see the guy who gets a rank of one

would be horrific because he has the lowest number here.

Jeff Gordon gets a rank at two because he had the second lowest number.

Jimmy Johnson gets a rank at three.

Now Austin Dillon is the 31st best driver when we're done, but

he gets a rank of 24 because that was the 24th smallest number

in the list of drivers for that race and so we minimized squared errors.

So we have to sum the squared errors in I1 and we put that in each race,

the sum of squared errors for that race in I1.

Okay, now how would I write the target cell?

I did it in A1, I used what's called a 3D formula,

which we haven't talked about.

And a 3D formula, you basically say you give a starting worksheet February 14,

the ending worksheet May 31st, and you say the cell you want to sum.

That's why I put the sum of the squared errors for each race in I1.

Now the way you can enter this formula, you could say =SUM(.

You can go to I1, and you can hold down the Shift key, and

you can go to the last worksheet, which is here.

And see, if then you put a right parentheses and that enters the formula.

Okay. So, I don't need that because I already

have it there.

But that's the way you can do this.

Okay. So, now,

what would the solver window look like?

Now let me reset all here because it's a bit different,

we'll put it in from scratch.

So the objective is to minimize the sum of the squared errors, minimize.

If you want to change the ratings, I think I basically gave that arrangement,

but I'll just highlight it.

Now, if you add, I think I had these as the ratings,

like if you use F3, paste in the ratings.

Now something we haven't used, we haven't used the all different.

We'll use the binary in a couple of lectures and talk about fantasy sports.

But the all different, now this has to be a variable cell.

Oh I see, the rating I used to.

I didn't rename that range, okay.

So we'll just point to that.

We'd say all different.

Okay.

And then, you should go to options, under evolutionary,

there's something called a mutation rate.

Use something called evolutionary algorithms to sort of swap

out the ordering of the drivers but again it'll put a different

integer between one and N if there are N changing cells, in this case N is 61 for

each driver and you check the required bounds on variables.

Let it run about five minutes or

so, I'm not going to bore you with doing it but this is the answer I got.

I got a sum of squared errors of 2514 and then here are my rankings for the drivers.

Maybe I should have eliminated drivers who weren't in many races.

Danica Patrick who's from Indiana who we're a big fan of,

I had her 17 at the ESPN.

I've had her 20th.

But we're fairly close on the rankings that I got and the ESPN rankings.

I think the ESPN rankings over emphasize top finishes and

forget about bad finishes.

Like Jimmie Johnson did have some bad finishes.

Here he was 15th out of 20.

I mean, so here's one.

Let's see where, he was 40th.

And, you know

when you're looking at the guy who won the most races, you ignore their bad finishes.

But I mean, you could have a different system.

In other words, you could basically say if you ranked in the top,

you could minimize the sum of the absolute differences because

Jimmy Johnson got a huge penalty here for that 40th place finish.

Matter of fact,

if I would take that out I bet Jimmy Johnson might jump up to number one.

Because I had a pick for third in this race and he came in 40th and

that's going to really kill my target zone.

So maybe absolute errors would be a more sensible way to do this.

I'm not going to worry about that we may give that as a test question and

see how things work.

Okay but that explains how you would rate NASCAR drivers,

I think that finishes our videos on ratings.

And I think we're going to start talking a little bit about gambling on sports.

Okay, we'll see you in the next video.

Coursera propose un accès universel à la meilleure formation au monde,
en partenariat avec des universités et des organisations du plus haut niveau, pour proposer des cours en ligne.