We've seen that fair or uniform probabilities lead to geometry, to

counting, length, area and volume. But what happens when probability is not

fair? In this lesson, we'll define and describe

probability density functions. In our last lesson, we computed

probabilities under the assumption of fairness.

Namely, that any point is as likely as any other point to be chosen at random.

This is not always a good assumption. There are many instances where there is a

bias. Where certain outcomes are more likely

than others. This bias is encoded in the notion of a

probability density function, sometimes called a PDF.

This is a function on a domain that tells you what outcomes are more likely than

others such as, exam scores or heights. We define a probability density function

rho as a function that satisfies the following two criteria.

First, rho is non-negative. And second, the integral of rho is equal

to 1. We have to specify a little bit more.

Namely, a domain D on which we are discussing the PDF.

So, in particular the integral of rho over D equals 1.
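
Written out in symbols, the two criteria look as follows. This is only a restatement of the definition just given, with x standing for a point of the domain D (the letter x is a notational assumption here).

```latex
\rho(x) \ge 0 \ \text{ for all } x \in D,
\qquad
\int_D \rho(x)\, dx = 1 .
```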

Now, that's the definition but it's certainly not a very intuitive

definition. What does it mean?

Well, before answering that, let's consider a specific example in the

context of a collection of light bulbs. These light bulbs will eventually fail.

But the question is, when? It happens with some sort of randomness.

But how is this randomness regulated? Well, there's some underlying probability

density function. Let's assume that it is exponential.

That is, the light bulb is more likely to fail early

and less likely to fail later on. This would be a function rho of t, of the

form e to the minus alpha t, let's say, where t is time and alpha is

some positive constant. Is this a PDF?

Well, it certainly satisfies the first criterion.

It is non-negative. As for the second criterion, let's

specify a domain D for the time as 0 to infinity, then in this case, what would

the integral over this domain be? Well, integrating an exponential function

is easy enough. This gives e to the minus alpha t times

negative 1 over alpha. Evaluating from t equals 0 to infinity, we get 1 over alpha.

This is not going to work unless of course alpha is equal to 1.
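
As a quick numerical sanity check of that computation, here is a minimal Python sketch. The helper name `integrate`, the cutoff at t = 60 standing in for infinity, and the sample values of alpha are all illustrative assumptions, not part of the lecture.

```python
import math

def integrate(f, a, b, steps=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def total_mass(alpha, cutoff=60.0):
    """Integral of e^(-alpha t) over [0, cutoff]; the cutoff stands in for infinity."""
    return integrate(lambda t: math.exp(-alpha * t), 0.0, cutoff)

for alpha in (0.5, 1.0, 2.0):
    # Print alpha, the numerical integral, and the closed form 1/alpha side by side.
    print(alpha, round(total_mass(alpha), 6), 1 / alpha)
```

For each sample value, the numerical integral should agree closely with 1 over alpha, so the total comes out to 1 only when alpha is 1.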

So, what we could do is modify the PDF by putting a coefficient of alpha out in

front. If we do that, then the integral is going

to be equal to 1. Now, that's a good example of a PDF but

we still don't know quite what it means. Well, let's consider that meaning in the

context of fairness, which we already have some experience with.

Fairness connotes a uniform density function.

That means a PDF that is constant on the domain.

What would that constant be? Well, the integral of rho over

D equals rho, that is, this constant, times the volume of the domain.

Now, in order to be a PDF, that integral has to equal 1.

So, what does that tell us about rho? Rho, this constant, must be 1 over the volume

of the domain. Let's see what that looks like in the

context of the domain being an interval, let's say, from a to b.

In this case, rho is 1 over the length of this interval.

That is 1 over b minus a. What would it look like in the case of a

discrete or zero-dimensional domain? Well, let's say we had a die, single die.

Then, the domain consists of six points, the different outcomes for the faces.

The PDF would be one over the volume of this domain.

Volume in dimension zero being simply counting.

This means that rho is equal to the constant 1 6th. If we had a different

discrete set, let's say for flipping a coin, then since

we only have two points in that domain, heads and tails, then rho would be equal

to 1 over 2, or 1 half. Now, consider this one carefully because

what we have in general is that for a discrete set of n points, rho, a uniform

density, is the constant 1 over n. In the case of, say, flipping a coin,

notice that the value of rho is precisely the probability of getting that outcome.

You have a 50-50 chance of getting heads. If you roll a six-sided die, your

probability of landing on any one outcome is 1 6th.

Notice, also, what happens if we want to consider the probability of landing in a

collection of outcomes. Let's say, what's the probability of

getting four or five? Well, we would add up these values of

rho. 1 6th plus 1 6th is 1 3rd.
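
The discrete case is small enough to spell out exactly. The following is a minimal Python sketch using exact fractions; the function names `uniform_density` and `probability` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def uniform_density(outcomes):
    """Uniform density on a finite set: the constant 1/n at each of the n points."""
    n = len(outcomes)
    return {outcome: Fraction(1, n) for outcome in outcomes}

def probability(density, event):
    """Probability of landing in a collection of outcomes: add up the values of rho."""
    return sum(density[outcome] for outcome in event)

die = uniform_density([1, 2, 3, 4, 5, 6])
coin = uniform_density(["heads", "tails"])

print(die[3])                     # 1/6
print(coin["heads"])              # 1/2
print(probability(die, {4, 5}))   # 1/3
```

Summing the values of rho over all six faces gives exactly 1, which is the discrete counterpart of the integral condition in the definition.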

Now, does that intuition carry over into the continuous case?

No, the probability of landing at any single point in an interval is not one

over the length of that interval, not at all.

However, if we take a sub-interval, then we can make sense of the probability in

terms of lengths. If we consider, with what probability

does a randomly chosen point in the domain D lie within a subset A of D?

Then we have answered this question. In the case of a uniform probability

density function, we know that the probability of landing in A is the volume

fraction. That is the volume of A divided by the

volume of D. We could write that as the integral over

the domain, A, of 1 over the volume of D. But that is precisely the integral of the

uniform PDF rho, that constant 1 over the volume of D, but integrated over A, not

over all of D. This leads us to consider, more

generally, the formula that the probability capital P of landing in A

with a point chosen at random is the ratio of the integral of rho over A

to the integral of rho over D. This explains why we want the integral of the

PDF rho over all of D to be equal to 1, so that we can simply write the

probability of landing in A as the integral of the PDF over the sub-domain,

A. This holds in the uniform case, but it

also holds in general. If we have a non-constant PDF, and we

want to know what is the probability of lying or landing in subset A, we

integrate the probability element. That is, rho of x d x over the domain A.

Let us interpret these results in the simple case of the domain being the

interval from a to b, given our PDF rho. What is the probability that a randomly

chosen point in that domain lies between a and b?

Well, by our definition, this probability, P, is the integral of rho of x, dx, as x goes

from a to b. Well, that integral is by definition 1.

What does that mean? When you see a probability of 1, that

means yes, it will happen. Let's keep going.

What's the probability that a randomly chosen point is exactly a?

Well, that probability is the integral of rho of x, dx, as x goes from a to a.

From what we know about integrals, that is equal to 0.

When you have a probability of zero, this means no, it's not going to happen.

What's the probability that a randomly chosen point is closer to a than to b?

Well, we would simply integrate rho of x, dx, from the left endpoint a to the

midpoint of the domain. For concreteness, consider the example of

a company that advertises half of its customers are served within five minutes.

What are your odds of having to wait for more than ten?

Let's assume an exponential PDF: rho of t is alpha e to the minus alpha t over the

domain from zero to infinity. Our first problem is, we don't know alpha,

but we do know the probability of your serving time

being in the interval from zero to five. That is, by definition, the integral of

alpha e to the minus alpha t d t, as t goes from 0 to 5.

And we're told that that probability is one half.

Now, we can do that integral easily enough, evaluating at the limits, and

then doing a little bit of algebra to solve for alpha.

I'm going to leave it to you to follow the computations, and see that alpha is 1

5th times log of 2. With that in hand, we can now address the

question of the probability of having to wait for more than ten minutes.

Now, we would compute the probability of being in the interval from 10 to

infinity. Thus, we would perform the same integral

as before, but evaluated with limits t going from 10 to

infinity. This yields e to the minus 10 alpha. If alpha is 1 5th log 2, what is negative

10 alpha? That's negative 2 times log of 2, that is

log of 2 to the negative 2 power. When we exponentiate that, we get 1 over

2 squared or 1 4th. That means that you have a 25% chance

of having to wait for more than 10 minutes.
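
The computation for this example can be sketched in a few lines of Python. This assumes the exponential PDF from the lecture; the function names `rate_from_median` and `prob_wait_longer_than` are hypothetical, introduced only for this illustration.

```python
import math

# Assumed model from the lecture: rho(t) = alpha * e^(-alpha t) on [0, infinity).
# Integrating from 0 to T gives 1 - e^(-alpha T); from T to infinity gives e^(-alpha T).

def rate_from_median(median_minutes):
    """Solve 1 - e^(-alpha * median) = 1/2 for alpha, i.e. alpha = ln(2) / median."""
    return math.log(2) / median_minutes

def prob_wait_longer_than(minutes, alpha):
    """Probability of the interval [minutes, infinity): e^(-alpha * minutes)."""
    return math.exp(-alpha * minutes)

alpha = rate_from_median(5)               # half of customers served within 5 minutes
print(alpha)                              # ln(2)/5, roughly 0.1386
print(prob_wait_longer_than(10, alpha))   # 2^(-2) = 0.25, up to rounding
```

The same function handles any other waiting time by changing its first argument.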

That doesn't sound so good. But what are the odds of having to wait

for more than 30 minutes? Well, we would follow the same

computation, and need to compute negative 30 alpha.

That is, log of 2 to the negative 6th. Substituting that in would give us odds

of about 1.5%. There's one type of PDF that is of

crucial importance that you're going to see again and again.

This is called a Gaussian or sometimes a normal PDF.

This is the function rho of x equals 1 over the square root of 2 pi times e to

the minus x squared over 2. You've probably seen this before; this is

sometimes called a bell curve. It has a peak around x equals 0 and then

drops off. Now, there are a few things to observe.

First of all, in this case your domain is the entire real line.

That is, this is a setting of infinite extent, anything could happen.

Your PDF is certainly positive, in fact, it's strictly positive.

But the tricky thing is in verifying that it's a PDF.

That is, verifying that the integral over the entire real line is equal to 1.

You're going to have to trust me on that for now.

You don't quite have enough at your disposal to prove this.
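
While a proof has to wait, a numerical check is within reach. The sketch below is numerical evidence, not a proof. It assumes the standard form of the Gaussian, 1 over the square root of 2 pi times e to the minus x squared over 2, and it assumes that the interval from minus 10 to 10 is wide enough to stand in for the whole real line. The helper `integrate` is a hypothetical midpoint-rule routine.

```python
import math

def gaussian(x):
    """Standard Gaussian density, assumed here to be e^(-x^2/2) / sqrt(2 pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# The tails beyond |x| = 10 are far too small to matter here,
# so [-10, 10] is used in place of the entire real line.
print(integrate(gaussian, -10.0, 10.0))
```

The printed value should be 1 to within floating-point and discretization error.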

Now, you will often see Gaussians that are translated about some middle point or

mean. You'll often see them stretched out or

rescaled somehow. What I want you to know about Gaussians

for the moment is that they are everywhere and all about.

Gaussians come up in somewhat surprising places.

If you look at the binomial coefficients that you obtain from Pascal's triangle

and consider what the rows look like, you notice that the rows tend to go up in the

middle and then down at the sides in a manner reminiscent of a shifted Gaussian.

In fact, if you were to divide these binomial coefficients by 2 to the n,

where n is the row number, then you'd obtain something that, in the limit as

you go down, converges to something very much like a Gaussian.
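
This can be explored numerically. The following Python sketch compares row n of Pascal's triangle, divided by 2 to the n, against a Gaussian that has been shifted and rescaled. The choice of mean n over 2 and variance n over 4 is an assumption supplied here (it is the standard choice for fair coin flips, not something stated in the lecture), and the function names are hypothetical.

```python
import math

def normalized_row(n):
    """Row n of Pascal's triangle divided by 2^n (the entries then sum to 1)."""
    return [math.comb(n, k) / 2**n for k in range(n + 1)]

def shifted_gaussian(x, mean, variance):
    """Gaussian density translated to `mean` and rescaled to the given variance."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def max_gap(n):
    """Largest gap between the normalized row and a Gaussian with mean n/2, variance n/4."""
    row = normalized_row(n)
    return max(abs(row[k] - shifted_gaussian(k, n / 2, n / 4)) for k in range(n + 1))

for n in (10, 100, 1000):
    # Print the row number and the largest pointwise gap for that row.
    print(n, max_gap(n))
```

The printed gaps should shrink as n grows, which is consistent with the convergence described above.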

This is a hint at one of the deeper truths of mathematics, that Gaussians are

limits of individual decisions, left or right,

heads or tails, that compound upon one another to

converge to such distributions. Gaussians are indeed everywhere.

So, now we see not only what a probability density is, but also how to

compute probability by means of integration.

In our next lesson, we'll introduce a few of the main characters of probability

theory and see what role they have to play in our story of calculus.