Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.


From the course by University of Houston System

Math behind Moneyball

40 ratings



From the lesson

Module 1

You will learn how to predict a team's won-loss record from the number of runs, points, or goals scored by a team and its opponents. Then we will introduce you to multiple regression and show how multiple regression is used to evaluate baseball hitters. Excel data tables and the VLOOKUP, MATCH, and INDEX functions will be discussed.

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

Okay let's continue with our study of multiple regression.

So you recall at the end of the last video,

we found the unemployment rate had a high P value, so we should throw it out.

Okay, so we've got Finland out right here, and we've got unemployment out right here.

So notice there's no Finland, and we've deleted the unemployment rate column.

So we can run this regression again, and

we'll see that this will be a decent regression that we can use to

gain some insights into predicting computer sales in Europe.

Okay.

So now what I'm going to do is I'm going to predict the same data set.

And the input range will be two columns now.

And let's call it Mukeregression2.

And we can click residuals here.

So we've got 20 data points.

Okay, and so this is looking good, I believe.

Okay, so let's interpret the equation we got.

Look at the P values.

Okay. So this says we can predict

computer spend per person right here.

We predict it to be -38.48 +

0.00172 * GNP per capita

+ 15.31 * EDPERCENTAGE.
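
To make the fitted equation concrete, here is a minimal sketch that plugs values into it. The three coefficients are the ones read out in the lecture; the function name and the example country (GNP per capita of $20,000, 5% of GNP spent on education) are hypothetical and are not taken from the lecture's data set. It also assumes EDPERCENTAGE is measured in percentage points.

```python
# Coefficients as read out from the regression output in the lecture.
INTERCEPT = -38.48
COEF_GNP_PER_CAPITA = 0.00172
COEF_ED_PERCENTAGE = 15.31


def predict_computer_spend(gnp_per_capita, ed_percentage):
    """Predicted computer spend per person (dollars per year).

    Assumes ed_percentage is in percentage points (5.0 means 5% of GNP).
    """
    return (INTERCEPT
            + COEF_GNP_PER_CAPITA * gnp_per_capita
            + COEF_ED_PERCENTAGE * ed_percentage)


# Hypothetical country, for illustration only.
example = predict_computer_spend(gnp_per_capita=20_000, ed_percentage=5.0)
print(round(example, 2))  # -38.48 + 34.40 + 76.55 = 72.47
```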

Okay, are these independent variables significant?

Yes. Look at those P-values.

We don't care about the intercept.

But the P-values for each independent variable are way less than 0.1.

We got 0.01 and 0.02, which are less than 0.1.

So they're both important.

There's one chance in a hundred, that's 0.001.

I got that wrong.

There's like 1 chance in 1000 that GNP

doesn't help you predict computer sales after knowing education spend.

And only 3% chance education spend doesn't help you after knowing GNP.

So the question is, how good is this equation for forecasting?

What use is it?

The two numbers that help you there are these two.

So, the R-squared of 74%, and then we'll look at outliers.

Okay, that basically says our equation explains 74%

of the variation in computer spend.

There's 26% not explained.
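
As a rough illustration of what "percent of variation explained" means, the sketch below computes R-squared as one minus the ratio of unexplained to total variation. The numbers are a made-up toy example, not the lecture's computer-sales data, and the function name is hypothetical.

```python
def r_squared(actual, predicted):
    """R-squared = 1 - (sum of squared errors) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    # Variation the equation fails to explain.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the actual values around their mean.
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


# Toy values for illustration only (not from the lecture).
toy_actual = [10.0, 20.0, 30.0, 40.0]
toy_predicted = [12.0, 18.0, 33.0, 37.0]
toy_r2 = r_squared(toy_actual, toy_predicted)
print(round(toy_r2, 3))  # SSE = 26, SST = 500, so 1 - 26/500 = 0.948
```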

Is that a good R-squared?

It's hard to tell.

Okay, it's hard to tell if it's a good or a bad R-squared.

Really, what's more important is the standard error.

Okay, so

the standard error is 29.

That means, again, 68% of forecasts

are accurate within $29 per year,

and 95% are accurate

within $58.
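
The rule of thumb here is one standard error for about 68% of forecasts and two standard errors for about 95%, which assumes the forecast errors are roughly normally distributed. A small sketch, using the lecture's standard error of 29 and a hypothetical prediction of $100:

```python
STANDARD_ERROR = 29.0  # standard error of the regression, from the lecture


def forecast_band(prediction, n_standard_errors):
    """Return (low, high) bounds around a prediction.

    Roughly 68% of actual values should fall within 1 standard error
    and roughly 95% within 2, assuming approximately normal errors.
    """
    half_width = n_standard_errors * STANDARD_ERROR
    return (prediction - half_width, prediction + half_width)


# Hypothetical prediction of $100 per person, for illustration only.
band_68 = forecast_band(100.0, 1)
band_95 = forecast_band(100.0, 2)
print(band_68)  # (71.0, 129.0)
print(band_95)  # (42.0, 158.0)
```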

So are there any outliers?

Well let's see if there's any of our observations where the error is off by

more than $58.

I think there's one.

I think it's Switzerland, 18th observation here.

So the 18th observation third from the bottom is Switzerland.

Now that's not a horrible outlier.

It's not more than three standard deviations.

So I'm not going to throw it out.

Again, that would be a judgment call there.

But we'll leave it in.
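
The outlier check described above can be sketched as follows: a residual more than two standard errors from zero is flagged, and one more than three is treated as extreme. The residual values and country labels below are hypothetical, not the lecture's data, and the two and three standard error cutoffs are the rule of thumb used in the video rather than a formal test.

```python
STANDARD_ERROR = 29.0  # standard error of the regression, from the lecture


def classify_residual(residual, standard_error=STANDARD_ERROR):
    """Label a residual by how many standard errors it is from zero."""
    size = abs(residual) / standard_error
    if size > 3:
        return "extreme outlier"
    if size > 2:
        return "outlier"
    return "ok"


# Hypothetical residuals (actual minus predicted), for illustration only.
hypothetical_residuals = {
    "Country A": 12.0,
    "Country B": -61.0,
    "Country C": 95.0,
}
labels = {name: classify_residual(r) for name, r in hypothetical_residuals.items()}
print(labels)
```

Whether to drop a flagged observation remains a judgment call, as the lecture notes.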

So how do you interpret these coefficients?

After adjusting for GNP,

1% more of GNP spent on education

yields about $15 more in computer sales.

And if I multiply the GNP coefficient by 1,000,

it will tell you a bit more.

A $1,000 increase in GNP per capita,

interpreted after adjusting for the other variable

(the phrase is ceteris paribus),

means, basically, about two

bucks more spent on computers per year.
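
The arithmetic behind these interpretations is just each coefficient times the change in its variable, with the other variable held fixed. A minimal sketch, using the coefficients from the lecture (the function name is hypothetical):

```python
COEF_GNP_PER_CAPITA = 0.00172  # dollars of spend per dollar of GNP per capita
COEF_ED_PERCENTAGE = 15.31     # dollars of spend per percentage point of GNP on education


def spend_change(delta_gnp_per_capita=0.0, delta_ed_percentage=0.0):
    """Change in predicted spend when the inputs change by the given amounts.

    The intercept cancels out, so only the coefficients matter.
    """
    return (COEF_GNP_PER_CAPITA * delta_gnp_per_capita
            + COEF_ED_PERCENTAGE * delta_ed_percentage)


per_thousand_gnp = spend_change(delta_gnp_per_capita=1000)
per_point_education = spend_change(delta_ed_percentage=1)
print(round(per_thousand_gnp, 2))     # 1.72, i.e. about two dollars
print(round(per_point_education, 2))  # 15.31, i.e. about fifteen dollars
```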

And we'd expect 95% of our forecasts to be accurate within $58.

Is that good enough?

Well, it's the best you can do.

Okay, so now we'll get back to sports in the next video.

Where we talk about linear weights,

which is a very important topic in really all sports.

Football, basketball, particularly basketball, but mostly baseball.

But we've learned about regression to understand linear weights and

we'll get back to that in the next video.
