[MUSIC] American Cancer Society estimates that about 1.7% of women have breast cancer. Susan G Komen for the Cure Foundation states that mammography correctly identifies about 78% of women who truly have breast cancer. An article published in 2003 suggests that up to 10% of all mammograms are false positive. These probabilities are of course estimates, as they're very difficult to calculate pro, precisely. But we're going to take these as givens for this example. As usual, let's first parse through the percentages we're given. The probability of having breast cancer is estimated to be.017. The probability of testing positive given that somebody has breast cancer is .78 and the probability of testing positive even though somebody does not have breast cancer is .10. Prior to any testing, and any information exchange between the patient and the doctor, what probability should a doctor assign to a female patient having breast cancer? Since we don't know anything about this patient's medical history, our best bet is to treat them like a randomly chosen person from the population. Hence, we would set this probability at 0.017. This is the prior probability we assigned to a patient having breast cancer before we collect data on them. In other words, before we test them. When a patient goes through breast cancer screening, there are 2 competing claims. Patient has cancer or, and patient doesn't have cancer. If a mammogram yields a positive result what is the probability that patient has cancer? If we think about this in probability notation, we're being asked to find what is the probability of breast cancer given that the patient tested positive. And earlier on, we were provided the, the reverse of this probability Positive, given breast cancer. And when we have the conditions reversed, we know that a probability tree might be useful in our calculations. So, let's start building that. There are two competing claims. Patient has breast cancer, or patient does not have breast cancer. The probability of having breast cancer a prior is 0.017 So the probability of not having breast cancer is going to be the compliment of that, 1 minus 0.017, 0.983. If a patient has breast cancer, there are 2 possible outcomes; they might test positive, or they might test negative. The probability of testing positive when the patient has breast cancer is 0.78. The probability then of testing negative when the patient has breast cancer is going to be the complement of 0.78, 0.22. Similarly, when a patient does not have breast cancer, there are two possible outcomes, testing positive or negative. Probability of testing positive even though the patient does not have breast cancer is the .10 we were given earlier so the accuracy r, rate of the test when the patient does not have breast cancer or in other words the probability of testing negative given no breast cancer is going to be the compliment of .10 0.90. We know that we're given that the patient has tested positive, so really, we're only interested in the first and the third branch here, so we don't even have to worry about the other branches, because we know the patient we're interested in doesn't come from the sect of the population that tested negative. First, we're going to want to find the joint probabilities. The probability of breast cancer and positive is the product of 0.017 and 0.78, that's 0.01326. And the probability of no breast cancer and, and positive is a product of the probabilities in the two branches leading up to that. 0.0983. We are asked for the probability of breast cancer given positive. Using base theorem, this is going to be the probability of breast cancer and positive, divided by the probability of testing positive. The numerator is simply coming from our top branch. And the denominator. The probability of testing positive is going to be the sum of breast cancer and positive or no breast cancer and positive. Since these are two disjoint outcomes. We add the two probabilities when we're saying or. This gives us about a 12% rate. This, remember, is what we called our posterior probability. Initially, we had given a 1.7% chance to this patient having breast cancer because we knew nothing about them. Then, we tested them. And they tested positive. So, now we have this additional information about the patient. Now, after data collection, the probability that we're assigning to this patient having breast cancer is slightly higher. It's at 12%. Since a positive mammogram doesn't necessarily mean that the patient actually has breast cancer, the doctor might decide to retest the patient. What is the probability of having breast cancer if the second mammogram also yields a positive result? Once again we run through our probability tree: two competing claims, breast cancer, and no breast cancer. However what has changed now is that this patient is no longer a nobody from the population. We've tested them once and they tested positive. So we have some additional information about them, and we should update our prior with this additional information. In other words we plug in the posterior from the previous iteration, the previous test. To be our new prior. And therefore, the probability of not having breast cancer is updated to be the complement of this, 88%. Next, we can run through our tree again. Remember, nothing about the test has changed. So the probability of testing positive given a patient has breast cancer is still 78%. And the probability of testing negative given the patient has breast cancer is still 22%. Similarly with the lower branch, nothing about the test has changed so the conditional probabilities of testing positive or negative given the patient has, does not have breast cancer is still 10% and 90% respectively. Once again, we're only interested interested in the branches where the patient has tested positive Because we're saying that the second mammogram also yielded a positive result, we can multiply through the branches to find our joint probabilities. And these are going to change a little bit, because our starting probabilities, the probabilities in the first branch has changed. And this time, our probability of having breast cancer and testing positive Is higher at 0.0936. And our probability of no breast cancer in positive is 0.088. In this example, we have reviewed a Bayesian approach to statistical inference. Which involves setting a prior, collecting data, obtaining a posterior, and updating the prior with the posterior from the previous step. In addition we got some practice working with conditional probabilities, probability trees, and the base theorem in general. [MUSIC]