One of the most common applications of Bayesian networks, or rather one of the earliest ones that is still very much in use today, is diagnosis. And by diagnosis I mean both medical diagnosis and fault diagnosis. Now, this dates back to the early 90s, to the PhD thesis of David Heckerman, which won the ACM Doctoral Dissertation Award, and to a system called Pathfinder, which looked at a range of different pieces of evidence in order to help a doctor diagnose a set of diseases. Specifically, it was focused, initially at least, on lymph node pathology: about 60 different diseases and all sorts of different symptoms. And they tried out a bunch of different methods for solving this problem. The first one they tried, way back in the early days of artificial intelligence before Bayesian networks were in common use, was a rule-based system, and it didn't work very well. The second version of Pathfinder used the naive Bayes model, which assumes that all of the symptoms are independent given the disease, and even that really simple model got superior performance to the rule-based system they initially tried.

Pathfinder 3 still used naive Bayes, but naive Bayes with better knowledge engineering. That is, they actually understood some of the issues behind what makes a system like this work well, and they fixed them. Specifically, one thing that turns out to be really fundamental for the performance of any probabilistic modeling system is never to put in zero probabilities, except for things that are impossible by definition, because once you put in a zero, no matter how much evidence to the contrary you have, you will never be able to get rid of it: anything times zero is still zero. And in the initial Pathfinder tool they had put in some incorrect zero probabilities for things that were very unlikely, but not impossible.
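To make the zero-probability trap concrete, here is a minimal naive Bayes diagnosis sketch. The diseases, findings, and all the probability values are made up for illustration; the point is only that a hard zero in one conditional probability forces the posterior for that disease to zero no matter how strongly the other findings support it.

```python
def posterior(prior, likelihoods, findings):
    """P(disease | findings) under naive Bayes: prior times the product
    of P(finding | disease), renormalized across diseases."""
    scores = {}
    for disease, p in prior.items():
        score = p
        for f in findings:
            score *= likelihoods[disease][f]
        scores[disease] = score
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()}

prior = {"diseaseA": 0.5, "diseaseB": 0.5}
likelihoods = {
    # diseaseA: finding f1 was marked impossible -- the hard zero.
    "diseaseA": {"f1": 0.0, "f2": 0.9},
    "diseaseB": {"f1": 0.3, "f2": 0.1},
}

# Even though f2 strongly favors diseaseA, the zero on f1 forces the
# posterior of diseaseA all the way to zero.
post = posterior(prior, likelihoods, ["f1", "f2"])
print(post["diseaseA"])  # 0.0
```

Replacing that 0.0 with even a tiny positive number (say 0.01) lets the strong f2 evidence keep diseaseA in play, which is exactly the fix Pathfinder 3 made.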
And it turns out that that accounted for about 10% of the system's incorrect diagnoses. They also did better calibration of the conditional probabilities, which turns out to be important for the knowledge engineering of a Bayesian network. For example, it turns out that it's a lot easier for a physician to compare the probability of a single finding, a piece of evidence, between two diseases than to compare the probabilities of two different findings within a single disease. It's much easier to say, oh, this is much more likely in this context than in that context. And when they asked the physicians to calibrate this way, they got much better estimates of the probabilities. Mind you, this was way before they had learning, so it was all hand-constructed.

And then, finally, Pathfinder 4 was the full Bayesian network in all of its glory. It no longer made incorrect assumptions about independencies between, say, different symptoms given the disease. That both allowed them to make the model more correct, and it also turned out to have an unexpected side effect: by allowing, say, a symptom variable to have more parents than just a single disease variable, it gave rise to considerably more accurate estimation of the probabilities, because the doctor could think about the different cases separately and didn't have to average them all out in his or her head. And this is one of the, I think, really compelling aspects of Bayesian network models: the Bayesian network model actually turned out to agree with an expert panel of physicians in 50 out of 53 cases. And these were hard cases, ones where you really needed the experts' opinion; it wasn't something that just an average doctor could necessarily diagnose correctly. This is as compared to 47 out of 53 for the naive Bayes model, and significantly fewer for the rule-based system.
Mind you, and this is an interesting and important aspect, the Bayesian network actually outperformed the physician who designed the model. And I mean, it didn't just outperform an average doctor by a little bit; it outperformed the very expert who designed it, because it was better at putting together all these different numbers. A doctor just can't fit all of these different findings into his or her brain at the same time.

We also talked about the CPCS network. It's one of my favorite networks because it's kind of big and hairy and sort of scary to look at, but anyway, the actual number of variables in this network is about 500, and each of them has, on average, about four values. So the total number of entries, if you were to specify a full joint distribution, is about 4 to the 500, which is 2 to the power of 1000, more than the number of atoms in the universe. So obviously one couldn't specify this as a complete joint distribution. Not to mention that the probability of each and every one of those entries is as close to zero as makes no difference, because it's the probability of an event involving 500 different variables. Now, if you were to actually construct a full table CPD for each of these variables, the number of parameters would be about 133 million, which is considerably better than 2 to the 1000, but still much too large. So it turns out that they made additional simplifying assumptions, which we'll talk about later on, that allowed them to avoid a complete table representation of the CPDs and use a more compact one instead, and that gave rise to about 1,000 parameters. Which is still a lot, but is actually tractable to deal with.

So we already talked about the fact that these medical diagnosis systems have emerged from research, and Microsoft built a medical diagnosis system.
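The parameter counts above can be sanity-checked with a few lines of arithmetic. The 500 variables and 4 values per variable come from the lecture; the average number of parents per variable used below is a hypothetical choice, picked only to show that a handful of parents per node lands in the same order of magnitude as the quoted 133 million.

```python
n_vars = 500   # roughly 500 variables in the CPCS network
n_vals = 4     # each with about four values

# Full joint distribution: one entry per joint assignment.
joint_entries = n_vals ** n_vars
assert joint_entries == 2 ** 1000   # 4^500 == 2^1000

# Full-table CPDs: each variable needs (n_vals - 1) free parameters per
# joint assignment of its parents.  An average of about 8 four-valued
# parents (an assumption for illustration) already reaches ~10^8:
avg_parents = 8
table_cpd_params = n_vars * (n_vals - 1) * n_vals ** avg_parents
print(f"{table_cpd_params:,}")   # 98,304,000 -- same order as 133 million
```

The jump from that down to roughly 1,000 parameters is what the more compact, structured CPD representations mentioned in the lecture buy you.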
Various other people have built them as well. Uptake in the medical field has been a little bit slow, because this doesn't fit naturally into a physician's pipeline. Maybe now, with the advent of electronic health records, there will be more data entered into computers, and these systems will come into more common use. But until very recently, most doctors just wrote stuff down on paper, and so it was very difficult to put this into the standard production pipeline for diagnosis.

Fault diagnosis, on the other hand, has been a much more direct application of these systems, because here we don't have the issue of how it fits into the doctor's diagnostic pipeline. So within the Windows operating system there are thousands of these little troubleshooters that help you diagnose problems with your printer, with Excel, with your email, and each of these has a little Bayesian network inside that answers probability questions given observations about the model involved, for example the printer. And there's also a big website out there that does car repair: you put in the make, model, and year of the car and the main problems you're seeing with it, and it figures out what to look at and what the most likely cause is.

And the reason behind this, the benefit of this: people don't use Bayesian networks here just because Bayesian networks are cool, even though they are. They use them because they provide a very flexible user interface for the user. You instantiate the evidence in the Bayesian network, and out comes a probability. You don't want to answer a question right now? That's okay, you can answer it later; it just means that it's an observation that you didn't get to condition on. And then, for the designer, this type of system is really easy to design and maintain, because if, for example, something changes a little bit in your printer structure:
If you were to design a standard menu-based system, you would have to go and rebuild the entire tree that decides what the first question to ask is, what the second question is, and what the most likely diagnosis is. Here, in the Bayesian network, you change one probability, maybe add an edge, and everything just emerges from that in a very straightforward way. So it's much more modular and more maintainable than a hard-wired menu-based system. And that's what the people who use these systems will tell you: that's why they chose this path as opposed to the hard-wired methodology.
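The flexibility described above can be sketched in a toy troubleshooter. Everything here is hypothetical (a made-up printer network with made-up probabilities), but it shows the key property: a question the user skips is just an unobserved variable that gets summed out, and answering it later is just one more piece of evidence handed to the same query code.

```python
from itertools import product

# Boolean variables: causes F = hardware fault, Q = print queue jammed;
# findings NP = "nothing prints", EL = "error light on".
P_F = {True: 0.02, False: 0.98}
P_Q = {True: 0.10, False: 0.90}

def p_NP(np_val, f, q):
    # Nothing prints mostly when at least one cause is present.
    p = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.80, (False, False): 0.01}[(f, q)]
    return p if np_val else 1 - p

def p_EL(el, f):
    p = 0.95 if f else 0.05
    return p if el else 1 - p

def query_fault(evidence):
    """P(F=True | evidence) by enumeration, summing out anything
    the user hasn't observed (or hasn't answered yet)."""
    score = {True: 0.0, False: 0.0}
    for f, q, np_val, el in product([True, False], repeat=4):
        world = {"F": f, "Q": q, "NP": np_val, "EL": el}
        if any(world[v] != val for v, val in evidence.items()):
            continue
        score[f] += P_F[f] * P_Q[q] * p_NP(np_val, f, q) * p_EL(el, f)
    return score[True] / (score[True] + score[False])

# User reports that nothing prints but skips the error-light question:
print(query_fault({"NP": True}))
# Later they answer it -- same code, one more observation to condition on:
print(query_fault({"NP": True, "EL": True}))
```

Note how the maintainability point falls out too: changing one CPT entry or adding a parent edge changes only the local functions above, and every query automatically reflects it, with no decision tree to rebuild.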