0:22

Of the 60 spam emails, 35 contain the word free.

Of the rest,

only three contain the word free.

If an email contains the word free, what is the probability that it is spam?

So what we want to do first is to organize this information into a probability tree.

We're going to start by dividing our population, which in this case

is our inbox, into two, based on whether the email is spam or not spam.

So we have 60 emails that are spam, and 40 emails

that are not spam.

Now that we've done this branching, we can actually further

branch out from these and list how many of the spam

emails have the word free in them and how many of

them do not, and likewise for the non-spam emails.

Of the 60 spam emails, 35 have the word free in them,

and the remaining 25 do not. And of the 40 not spam emails, only three

of them have the word free in them, and 37 do not.

Now that we have organized the information that we're given

into a probability tree, what we want to do next

is to go back to the question and try to

figure out what it is exactly that we're being asked for.

The question is, if an email contains the word

free, what is the probability that it is spam?

So we know that the email contains the word free, so that's

going to be our given, and we're asked for the probability that it's spam.

So we can denote this as probability of spam

given that the word free is in the email.

Since we're saying that we know the word free is in the

email, we're basically saying we can ignore the rest of the emails.

So first what we want to do is figure out how

many emails in total have the word free in them.

35 of them come from the spam folder and

three of them come from the not spam folder for a total of 38 and of these,

only 35 of them are of interest to us because those are the spam emails.

So 35 out of 38 gives us roughly 92%.
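The counting argument above can be sketched in a few lines of Python. This is only an illustration using the counts from the example; the dictionary layout and the variable names are assumptions made for this sketch, not something from the lecture.

```python
# Counts from the example: 60 spam emails and 40 not spam emails.
counts = {
    ("spam", "free"): 35,
    ("spam", "no free"): 25,
    ("not spam", "free"): 3,
    ("not spam", "no free"): 37,
}

# Condition on "free": keep only the emails that contain the word.
free_total = sum(n for (label, word), n in counts.items() if word == "free")

# Among those, count the ones that are spam.
spam_and_free = counts[("spam", "free")]

p_spam_given_free = spam_and_free / free_total
print(f"P(spam | free) = {spam_and_free}/{free_total} = {p_spam_given_free:.3f}")
```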

Here we've implicitly made use of Bayes' theorem.

What we have in the numerator is our joint probability, spam

and free, and what we have in the denominator is the marginal

probability of what we're conditioning on, free.

Except instead of working with probabilities in this

case, to make things simple we've worked with counts.
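Written out, the counts version and the probability version give the same answer, because the total of 100 emails (60 spam plus 40 not spam) cancels. The notation below is one way to write it, not necessarily how it appears on the slide.

```latex
P(\text{spam} \mid \text{free})
  = \frac{P(\text{spam and free})}{P(\text{free})}
  = \frac{35/100}{38/100}
  = \frac{35}{38}
  \approx 0.92
```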

So what we're going to do next is actually

move onto a situation where we're working with probabilities

from the get go, and we don't know the

sample size or the population size that we're dealing with.

2:53

Swaziland has the highest HIV prevalence in the world.

25.9% of this country's

population is infected with HIV.

The ELISA test is one of the first and most accurate tests for HIV.

For those who carry HIV, the ELISA test is 99.7% accurate.

For those who do not carry HIV, the test is 92.6% accurate.

Note, by the way, that these probabilities are estimates.

If an individual from Swaziland has tested positive,

what is the probability that he carries HIV?

3:29

So, we're told that 25.9% of this country's population is infected with HIV.

So the probability of having HIV is 0.259.

We also know something about the accuracy of the test, which

seems to vary depending on whether the person has HIV or not.

This is very common for medical tests.

They tend to have different accuracy rates

depending on whether the patient has the disease or does not have

the disease.

This statement, for those who carry HIV, the ELISA test is

99.7% accurate, basically means that probability of testing positive,

because that's what an accurate result would be if a

person has HIV, so probability of positive given HIV is 0.997.

This statement, for those who do not carry HIV, the test

is 92.6% accurate, means probability of

testing negative, because that's what accurate would

mean in this case given that the patient does not have HIV, is 0.926.

The question says, if an individual

from Swaziland has tested positive, what is the probability that he carries HIV?

So what we know is that the person tested positive.

We're looking to see what is the probability

that they have HIV.

What we can see here is that we have a situation

where we're asked for a conditional probability, and the condition has

been reversed from one of the things that we are given,

and we should really think about a tree diagram in this case.

Tree diagrams tend to be the most effective method for getting to the answer.

There are definitely other ways that you can solve this

problem, and you can organize the information that's given to you.

But a tree diagram tends to be one where you can really

efficiently and effectively organize the information that you're given.

And you're going to get to the right answer if you do it the right way.

5:19

So, the first branch in the tree is always made up of marginal

probabilities, since we're dividing up our

population without conditioning on any other attributes.

Some people in the population have HIV.

That's the top branch. And others don't.

That's the bottom branch. So probability of having HIV,

as we saw, was 0.259 in Swaziland. And the probability of not having HIV

is the complement of that, 1 minus 0.259 is going to give us 0.741.

So about 74.1% of the population in Swaziland does not have HIV.

Note that probabilities on a set of branches always add up to 1.

Next, we move on to conditional probabilities.

Let's start with

the part of the population who has HIV, so

we're going to be working with the top branch here.

When these people take the test, they may get a

positive or a negative result, because the test isn't 100% accurate.

Therefore, we divide up the HIV population into two,

those who test positive, and those who test negative.

Based on information on the test that we

were provided earlier, we know that the probability

of testing positive, if someone has HIV is 0.997.

6:33

Then, probability of testing negative if someone has HIV, this would

be a false negative, would be the complement of that, 0.003.

Similarly, among those who don't have HIV,

some still test positive, and some test negative.

Probability of accurately testing negative if the patient

doesn't have HIV is 0.926.

6:56

And the probability of a false positive, that's testing positive even though

the patient does not have HIV is the complement of that, 0.074.
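As a rough sketch, the tree described so far can be written down as a nested dictionary, with a check that each set of branches adds up to 1. The structure and key names here are assumptions chosen for illustration; the numbers are the ones given in the problem.

```python
import math

# First level: marginal probabilities. Second level: probabilities of the
# test result, conditional on the first-level outcome.
p_hiv = 0.259
tree = {
    "HIV": {"marginal": p_hiv, "positive": 0.997, "negative": 1 - 0.997},
    "no HIV": {"marginal": 1 - p_hiv, "positive": 1 - 0.926, "negative": 0.926},
}

# Probabilities on a set of branches should always add up to 1.
assert math.isclose(sum(branch["marginal"] for branch in tree.values()), 1.0)
for status, branch in tree.items():
    assert math.isclose(branch["positive"] + branch["negative"], 1.0)
    print(status, {name: round(p, 3) for name, p in branch.items()})
```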

Remember, our goal is to find the probability of

having HIV, given that the patient has tested positive.

Based on Bayes' theorem, this should be the probability of HIV and

positive divided by the probability of testing positive.

Remember, the numerator is always the joint probability, and the

denominator is the marginal probability of what we're conditioning on.
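In symbols, using a plus sign for a positive test result (a notation chosen here for brevity), the quantity we are after is:

```latex
P(\text{HIV} \mid +) = \frac{P(\text{HIV and } +)}{P(+)}
```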

So far, we don't have the building blocks we

need to calculate the probability that we're interested in.

7:38

To get the joint probabilities, like the

one in the numerator, using the probability

tree, all we need to do is multiply across the branches.

This is why a probability tree is useful.

Because it organizes the information for you in a way where

you no longer have to think, what should I multiply with what.

All you need to do is follow along

the branches and pick up the building blocks along the way.

8:02

We start with the marginal probability of having HIV

and we multiply it by the probability of testing positive,

given that the patient has HIV.

So I'm following the first, the very top branch here, which is

going to yield us the joint probability of having HIV and testing positive.

So what we get is 0.259 from the first branch

times 0.997 from the second branch, which gives us 0.2582.

So, there's a 25.82% chance that a randomly drawn person from the Swaziland

population has HIV and tests positive.

8:39

Similarly, probability of HIV and negative is going to be the probability of

HIV, 0.259, times the probability of negative given HIV, 0.003.

That's a really tiny probability, 0.0008.

We can keep going and calculate similar probabilities for the lower branch,

the no HIV population as well.

Probability of no HIV and positive comes out to be 5.48% and probability of

no HIV and negative comes out to be 68.62%.
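Multiplying across the branches can be checked with a short Python sketch. The variable names are assumptions made for this illustration; the inputs are the three probabilities given in the problem.

```python
p_hiv = 0.259               # marginal probability of having HIV
p_pos_given_hiv = 0.997     # test accuracy for those who carry HIV
p_neg_given_no_hiv = 0.926  # test accuracy for those who do not carry HIV

# Joint probability = first-branch probability times second-branch probability.
joint = {
    ("HIV", "positive"): p_hiv * p_pos_given_hiv,
    ("HIV", "negative"): p_hiv * (1 - p_pos_given_hiv),
    ("no HIV", "positive"): (1 - p_hiv) * (1 - p_neg_given_no_hiv),
    ("no HIV", "negative"): (1 - p_hiv) * p_neg_given_no_hiv,
}

for outcome, p in joint.items():
    print(outcome, f"{p:.4f}")
```

The four joint probabilities cover every possible outcome, so they should add up to 1, which is a quick way to catch an arithmetic slip.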

We've done a bunch of calculations so far, but let's go back to the task at hand.

We're only interested in those who test

positive, because that's what we're given.

And among these, we're especially interested

in those who actually have HIV.

9:31

So, the probability of HIV and positive

is 0.2582, that's the numerator, the joint probability.

And the denominator is comprised of two

segments of the population who test positive.

So the overall probability of testing positive is the sum

of these, so a person can test positive because they

have HIV, or even though they don't have HIV.

Since we're saying or, and these are disjoint outcomes, to get

the overall probability of testing positive,

we actually add the two probabilities, 0.2582 plus 0.0548, which gives 0.3130.

Dividing 0.2582 by 0.3130, the result comes out to roughly 0.82.

So, to recap, we were asked if an individual from Swaziland

has tested positive, what is the probability that he carries HIV?

And the result we found was, probability of HIV given positive is 0.82.

What this means is that there is an 82% chance

that an individual from Swaziland who tested positive actually has HIV.
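The whole calculation can be wrapped in a small function as a sanity check. This is a sketch, not part of the lecture: the function name `posterior` and the parameter names `prior`, `sensitivity`, and `specificity` are labels chosen here for the three given probabilities.

```python
def posterior(prior, sensitivity, specificity):
    """Probability of having the condition given a positive test result."""
    true_pos = prior * sensitivity                # joint: condition and positive
    false_pos = (1 - prior) * (1 - specificity)   # joint: no condition and positive
    return true_pos / (true_pos + false_pos)      # joint over marginal of positive


p = posterior(prior=0.259, sensitivity=0.997, specificity=0.926)
print(f"P(HIV | positive) = {p:.2f}")
```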