So, now we're going to say that the magic has happened: you've decided to put this intervention into place, you've built it, and now you need to test whether or not you should go ahead and use it. Again, we'll get to how you actually put it all together later on, but we want to stick to the bottom line today. So, testing is the first step in finding out whether it works. There are three levels of testing: doing it in the lab with some test cases, silently running it behind the scenes but in practice, and finally putting it into practice.
So, again, we'll use our friend, the bilirubin light decision. The first question is, how do you test whether or not the computer code that you developed accurately reflects the knowledge in that graph? You need a bunch of test cases to apply to this widget. So, the next question is, which test cases should you use? The answer is you want the minimum that tests the entire range of possibilities. If you look at that graph, there are five regions where the baby starts, and there are five regions where the baby is at the second point in time. So, it's a minimum of 5 times 5, or 25 cases. If you want to add a third point in time, now you're up to 125. For each case, you need to create the case in a way that the machine can see it, in a data format it can read. You have to get the machine to give you an answer, and then you need a human being to review whether or not the machine gave the right answer. Of course, if things get more complicated, you're going to get a lot more test cases.
So, there are a lot of explicit rules that are in the form of IF-THEN phrases. Here's a real case about kidney function, and you can see that there will be something like: if the patient is female, and the patient is African American, and the creatinine level is at one level, and the age is at another level, then the creatinine clearance equals such and such. Then the second rule is about the creatinine level being above a certain score.
And if the creatinine clearance is below a certain level, then it's something else again. That's the second rule. Let's just stick with the first one. How many cases would I need to test whether or not this is correct? Well, female gender is at least two options. There could be more, but at least two. African American: at least two races. Creatinine level, well, maybe it is above or below a certain threshold, and age, again, is above or below a certain threshold. So, you have four factors with two options each, which means you have 2 times 2 times 2 times 2. That's 16 cases minimum to test this rule, and you can imagine that if you have more complexity on the left-hand side of the IF-THEN, you have a lot more cases to evaluate. So, this is a semi-formal way of thinking about it. The more formal way is called a decision table, which does exactly what I just showed you. You can fill in a decision table with the 16, 32, or 64 cases and see what the logic ought to be, then put those cases into the computer and see if it gives you that behavior.
So, now you've passed in the laboratory. You've done your 25 or 64 or however many cases. You've done all the checks to make sure it all works. So, now you put it into place, but again, you don't make it public. You want to see how it would have behaved in different cases. So, you put it behind the scenes. You have it fire, but you just don't make the firing visible to the user. That means that later, let's say after a week or so, you sit down, you get a report, you see how many times it fired, and you see whether each firing was appropriate. You also look to see if there were other times it should have fired and did not fire. So, this is what we mean by running it in the background.
So, then we get into practice. Now, when you think about practice, this two-by-two table is the core of how you think about decision support. It's all about the bottom line again.
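Going back to the lab-testing step for a moment, the decision-table counting can be sketched in a few lines of Python. This is only an illustration under assumptions: the factor names, the two-way splits, and the helper names `build_decision_table` and `find_mismatches` are hypothetical, and the talk does not give the actual thresholds or the expected answers, so those are left for a human reviewer to supply.

```python
from itertools import product

# Hypothetical two-level splits for the four factors on the left-hand
# side of the kidney-function rule. Real thresholds are not given.
FACTORS = {
    "sex": ["female", "not female"],
    "race": ["African American", "not African American"],
    "creatinine_level": ["below threshold", "at or above threshold"],
    "age": ["below threshold", "at or above threshold"],
}


def build_decision_table(factors):
    """Return one test case (a dict) per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def find_mismatches(cases, expected, actual):
    """Return the cases where the implemented logic (actual) disagrees
    with the answer the human reviewer says it should give (expected)."""
    return [case for case in cases if expected(case) != actual(case)]


table = build_decision_table(FACTORS)
print(len(table))  # 2 * 2 * 2 * 2 = 16 cases

# Same counting for the bilirubin graph: five regions per time point.
regions = range(5)
print(len(list(product(regions, repeat=2))))  # 25 cases for two time points
print(len(list(product(regions, repeat=3))))  # 125 cases for three time points
```

The point of `find_mismatches` is that the reviewer's expected answers and the machine's answers are kept as two separate functions, so every row of the table gets compared rather than a hand-picked few.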
So, first, in the upper left-hand corner, you have the case where the patient was in danger and, in fact, the system said the patient is in danger. So, that's a true positive, and that is a good thing. On the bottom right, the patient is not in danger and the system does not alert; again, everybody is happy. It's the two red guys that we've got a problem with. If the machine alerts and the patient is not in danger, that is a false alarm, and too many of those lead to the boy who cried wolf. On the other hand, in the lower left-hand corner, we have the red cases where the machine should have alerted and did not. That is a missed case. That is a bad thing. So, you can see that the real money is always in the trade-off between the false negatives and the false positives.
So, what does it mean to trade off the false negatives against the false positives? It means somebody has to think about how many false alarms it is worth for every case that would have been missed not to be missed. This is somewhat related to the number needed to treat, if you're familiar with epidemiology, or the number needed to harm, the converse. The meaning of it is that it's the number of false alarms for every patient saved who would otherwise not have been saved. In the drug allergy situation, a lot of doctors already know about the drug allergy. So, in fact, the alert isn't telling them anything they didn't know already, and therefore the number of false alarms you would need is very high. When people say that they've silenced the alarms for the mild types of drug allergies, they are precisely saying that the false-alarm-to-missed-case ratio is way over the line.
So, what does testing mean in practice? Well, it means that you review what gets fired and, if possible, what is not firing. You look at what is not firing because you want to make sure we are not having missed cases. Missed cases sometimes come through as sentinel events. A bad outcome happens, you go back and look, and you say, it should have been alerted, and it was not alerted. Why was that?
Now, it could be that it was alerted but the user decided to ignore it, because they've been so sick and tired of all these false alarms that they've been effectively trained not to listen to the alarm. Ideally, there'd be a dashboard for rules. The dashboard can either be based on the content of that decision tree I showed you before, or it could simply be numbers of hits and numbers of outcomes. Ideally, you'd want to know the number of sequelae, the number of bad outcomes resulting from either a missed case or a false positive, and that becomes hard to do.
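The "numbers of hits" version of such a dashboard can be sketched as a tally of the two-by-two table. This is a minimal illustration, not a real system: the function names and the review log are made up, and it assumes someone has already reviewed each event and decided whether the patient was truly in danger. The ratio at the end is only a rough proxy, because the ratio described in the talk counts patients actually saved by the alert, which a simple tally cannot tell you.

```python
from collections import Counter

CELLS = ("true_positive", "false_alarm", "missed_case", "true_negative")


def classify(alerted, in_danger):
    """Place one reviewed event into a cell of the two-by-two table."""
    if alerted and in_danger:
        return "true_positive"
    if alerted and not in_danger:
        return "false_alarm"
    if not alerted and in_danger:
        return "missed_case"
    return "true_negative"


def tally(events):
    """Count events per cell; events are (alerted, in_danger) pairs."""
    counts = Counter(classify(alerted, in_danger) for alerted, in_danger in events)
    return {cell: counts.get(cell, 0) for cell in CELLS}


def false_alarms_per_true_alert(counts):
    """Rough proxy for the trade-off: false alarms per appropriate alert.
    Returns None when there were no appropriate alerts at all."""
    true_alerts = counts["true_positive"]
    if true_alerts == 0:
        return None
    return counts["false_alarm"] / true_alerts


# Made-up review log for illustration only.
review_log = (
    [(True, True)] * 2
    + [(True, False)] * 6
    + [(False, True)] * 1
    + [(False, False)] * 11
)

counts = tally(review_log)
print(counts)
print(false_alarms_per_true_alert(counts))  # 6 false alarms / 2 true alerts = 3.0
```

Keeping the missed cases as their own cell matters, because those only show up if someone reviews the events where the rule did not fire.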