0:00

So we've learned a lot of little bits and pieces of representation, that are used

to put together a graphical model. And now let's try and take a big step

back, and figure out how you might put these all together if you actually wanted

to build a graphical model for some application that you care about.

Now, let me start by saying that this is really not a science.

Just like any other design, it's much closer to an art, or even a black magic,

than a scientific endeavor. And, so the only thing that one can do

here is to provide hints about how one might go about doing this.

So let's first identify some important distinctions, and then we'll get concrete about particular examples. There are at least three main classes of design choices that one needs to make. The first is whether you have a template-based model versus a very specific model for a concrete, fixed set of random variables. The second is whether the model is directed or undirected. And the third is whether it's generative versus discriminative. These are all terms that we've seen

before, and we'll talk about them in just a moment.

But before we go into the trade-offs between each of these, let me emphasize this last point, which is probably the most critical thing to remember. It's often not the case that you just go in one direction or the other. That is, in many models you're going to have, for example, template-based pieces as well as some stuff that isn't at the template level. You might have directed as well as undirected components, and so on. So these are not sharp boundaries, and it's useful to keep that in mind: you don't have to go only one direction versus the other in a real problem.

Now, the first important distinction is template-based versus specific. And what are some examples of specific models? Medical diagnosis is usually a specific model: you have a particular set of symptoms, diseases, and so on that you want to encode in your model. So that's one example. On the template-based side, you have things like image segmentation.

3:02

Fault diagnosis you can think of as a specific model; that is, you can think about writing a diagnostic model for this particular type of printer. But really, if you're inside a company that's writing a diagnostic tool for your line of fifteen different printers, they're going to have shared components. And if you have a component inside printer one that also appears inside printer two, chances are that it's going to have the same fault model. And so you're going to have elements that are unique, and elements that are shared. And so once again, it's something that's going to sit at the intersection between the two.

That said, once you've decided where on this spectrum you sit, it really changes the way in which you tackle the knowledge engineering problem. Because template-based models usually, not always, but usually, have a fairly small number of variable types. So, for example, in our image segmentation setting, you have the class label; that is one variable type.

Nevertheless, we manage to construct very richly expressive models from this, because of interesting interactions between the class labels for adjacent pixels in the image. But it's a very small number of variable types, and most of the effort goes into figuring out things like which features are most predictive.
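To make "one variable type plus shared structure" concrete, here is a minimal sketch of a template-based grid model for binary segmentation labels. The potential values and the function name are made up for illustration; the point is that a single shared table is reused for every edge in the grid.

```python
# A template-based grid MRF sketch: every pixel has the same variable type
# (a binary class label), and every pair of adjacent pixels reuses ONE
# shared pairwise potential. The numbers are illustrative.

# Shared pairwise potential: adjacent pixels prefer to agree.
SHARED_POTENTIAL = {
    (0, 0): 2.0, (1, 1): 2.0,   # agreement is favored
    (0, 1): 0.5, (1, 0): 0.5,   # disagreement is penalized
}

def grid_score(labels):
    """Unnormalized score of a labeling of a 2-D grid.

    `labels` is a list of rows of {0, 1} class labels. The same
    SHARED_POTENTIAL is applied to every horizontal and vertical edge;
    that reuse is what makes the model template based.
    """
    score = 1.0
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:   # horizontal neighbor
                score *= SHARED_POTENTIAL[(labels[r][c], labels[r][c + 1])]
            if r + 1 < rows:   # vertical neighbor
                score *= SHARED_POTENTIAL[(labels[r][c], labels[r + 1][c])]
    return score
```

With these numbers, a uniform labeling such as `[[0, 0], [0, 0]]` scores higher than a checkerboard `[[0, 1], [1, 0]]`, since every one of the four edges favors agreement.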

6:13

a higher-performance model. So you might wonder, well, when would I use a generative model, given that a discriminative model gets high performance by using richly expressive features? And there are multiple answers to that. One answer is when I don't have a predetermined task, when the task shifts. So for example, when I have a medical diagnosis task, every patient presents differently.

In each patient's case, I have a different subset of things that I happen to know about that patient: the symptoms that they present with, and the tests that I happened to perform. And so I don't want to train a discriminative model that uses a predetermined set of variables as inputs and a predetermined set of diseases as outputs. Rather, I want something that gives me the flexibility to measure different variables and predict others. The second reason for using a generative

model, and this is looking way forward in the class, is that it turns out that generative models are easier to train in certain regimes. Specifically, just to say it out loud: in the case where the data is not fully labeled, it turns out that sometimes you can't train a discriminative model, but you can train a generative model. So we'll definitely see that when we get

to that part of the course. Okay, so having talked about these different regimes, now let's think about the key decisions that we have to make in the context of designing a graphical model. First of all, what variables are we going to include in the model? Regardless of whether we have a fixed or varying task at hand,

We have usually a set of variables that are the target variables.

These are the ones we care about. So, even in the medical diagnosis

setting, you have a set of disease variables.

Which are the ones that we care to predict.

You might not care to predict all of them, in any given setting.

But they're usually the targets. We have the set of observed variables.

Again, they might not always be observed, but you don't necessarily care about predicting them. So these might be, in the medical setting,

things like symptoms and test results. And then, the third category might be a

little bit surprising. So, we might have variables that are

latent or hidden. And these are variables that we don't observe.

8:49

Nor do we necessarily care about predicting, they are just there.

Why would we ever model variables that you neither observe nor care to ever look at? So, let's look at an example.

Let's consider an example. Imagine that I asked all of you in this class, what time does your watch show? So each of these Wi's is the time on the watch of each of you in the class. So we have W1 up to WK.

Now, these variables are all correlated with each other. But really, they're not directly correlated with each other, unless we all had, like, a watch-setting party just before class. Really, what they're all correlated with is Greenwich Mean Time.

So you have a model, in this case a naive Bayes model, where you have Greenwich Mean Time influencing a bunch of random variables that are conditionally independent given it. Now, Greenwich Mean Time is latent unless we actually end up calling Greenwich to find out what the current time is right now in Greenwich, which I don't think any of us really care about.

But why would we want to include Greenwich Mean Time in our model? Because if we don't include it, if we basically eliminate Greenwich Mean Time from our model, what happens to the dependency structure of our model? We end up with a model that is fully connected, where every watch depends on every other. And so sometimes latent variables can simplify our structure, and so they're useful to include even in cases where we don't really care about them, just because not including them gives us much more complicated models.
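To see this numerically, here is a minimal sketch of the watch example with made-up numbers: GMT is reduced to a binary "early/late" variable, and each of two watches copies it with probability 0.9. Marginalizing out the latent GMT leaves the watches dependent on each other.

```python
from itertools import product

# Made-up CPDs for the watch example: a binary GMT and two watches W1, W2
# that each copy GMT with probability 0.9.
P_GMT = {0: 0.5, 1: 0.5}
P_W_GIVEN_GMT = {(w, g): 0.9 if w == g else 0.1
                 for w, g in product((0, 1), repeat=2)}

def joint(w1, w2):
    """P(W1, W2) with the latent GMT summed out."""
    return sum(P_GMT[g] * P_W_GIVEN_GMT[(w1, g)] * P_W_GIVEN_GMT[(w2, g)]
               for g in (0, 1))

p_w1 = joint(1, 0) + joint(1, 1)   # P(W1 = 1) = 0.5
p_w2 = joint(0, 1) + joint(1, 1)   # P(W2 = 1) = 0.5

# P(W1=1, W2=1) = 0.41, but P(W1=1) * P(W2=1) = 0.25: once GMT is
# marginalized out, the watches are no longer independent, so a model
# without GMT needs edges directly between them.
```

With K watches instead of two, every pair becomes dependent in the same way, which is exactly the fully connected model described above.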

Which brings us to the topic of structure.

When we think about Bayesian networks specifically, the question that comes to mind is: do the arrows, given that they are directed, correspond to causality? That is, is an arrow from X to Y indicative of a causal connection from X to Y? The answer to that is yes and no. Very satisfactory. So what does "no" mean in this case?

Well, we've seen what it means. Consider a model where we have X pointing to Y; we'll just do the two-variable case. Any distribution that I can model on this graphical model, where X is a parent of Y, I can equally well model in a Bayes net where I invert that edge and have Y pointing to X. So, in this example, as well as in many others, I can reverse the edges and have

a model that's equally expressive. And in fact, I can do this in general: you can give me any ordering that you want on the random variables, and I can build you a graphical model that can represent any distribution consistent with that ordering. So you want X1 to come before X2 to come before X3, and you want to represent the distribution P? That's fine, no problem, I can have a graphical model that will do that. But that model might be very nasty. And we've already seen an example of that

when we had a case where X1 and X2 were both parents of Y, a simple v-structure model. And suppose I want to invert the directionality of the edges and put Y as a parent of, say, X2.

12:39

Then, if I want to capture the distribution that I started out with, the one for which this was the graph, I end up having to add a direct edge between X1 and X2. And so what happens is that the causal directionality is often simpler. So to drive this home even further, let's

go back to our Greenwich Mean Time example, where Greenwich Mean Time is in some way the cause, or the parent, of the different watch times that we see on different individuals. And let's imagine that I force you to invert the edges. What's it going to look like? Well, now I'm going to force Greenwich Mean Time to be the child of all of these. And now what? Is this the correct model?

No, because this says that all of the watch times are marginally independent, which we know is not the case. And so what we're going to end up with as the model is the same horrific model that I showed before, where everything is connected to everything else. And so causal ordering, although it's not more correct than a non-causal ordering, is sparser.
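The v-structure point above can also be checked numerically. A sketch with made-up numbers: X1 and X2 are independent coin flips, and Y is a child that is likely to fire when either parent is on. Marginally the parents are independent, but conditioned on Y they are not ("explaining away"), which is exactly why reversing an edge forces a direct X1 to X2 edge.

```python
from itertools import product

P_Y1 = {(0, 0): 0.1, (0, 1): 0.8,        # made-up P(Y=1 | x1, x2)
        (1, 0): 0.8, (1, 1): 0.95}

def p(x1, x2, y):
    """Joint P(X1, X2, Y) for the v-structure X1 -> Y <- X2.

    X1 and X2 are independent fair coins, so P(x1, x2) = 0.25 for every pair.
    """
    py1 = P_Y1[(x1, x2)]
    return 0.25 * (py1 if y == 1 else 1.0 - py1)

# Marginally, X1 and X2 are independent: P(X1, X2) is 0.25 everywhere.
p_x1x2 = {(a, b): p(a, b, 0) + p(a, b, 1)
          for a, b in product((0, 1), repeat=2)}

# But conditioned on Y = 1, the joint over (X1, X2) no longer factorizes
# into the product of its marginals.
p_y1 = sum(p(a, b, 1) for a, b in product((0, 1), repeat=2))
cond = {(a, b): p(a, b, 1) / p_y1 for a, b in product((0, 1), repeat=2)}
```

So any network that puts Y before X2 in the ordering must model this induced dependence with an extra edge; the causal ordering avoids it.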

So generally, causal orderings are sparser, as well as more intuitive, as well as easier to parameterize

14:07

by a human. So again, you're not forced to use a causal ordering, and sometimes there are good reasons not to, but it's generally a good tip to follow. So how does one actually construct a

graphical model? Do we have in our minds some monolithic P over a set of variables X1 up to XN, and we just need to figure out how to encode that using a graph? Well, maybe implicitly, but certainly not in any explicit form. The way in which one typically constructs a graphical model in practice is by having some variable, or sometimes a set of variables, that we wish to reason about. So, for example, we might care about the

variable cancer or maybe even lung cancer.

Well, what influences whether somebody is going to get lung cancer? If we go and ask a doctor what is the probability for someone to get lung cancer, the doctor is going to say, well, you know, that depends. And you might say, what does it depend on? And the doctor will say, well, whether they smoke, for example. At which point you're likely to add the variable smoking as a parent of the lung cancer variable.

The doctor might say, well, that's not the only thing; the probability of cancer also depends, for example, on the kind of work that you do, because some kinds of work involve more dust particles getting into your lungs. And so again, here's another variable which you would add as a parent.

And I can even go and ask, whether the doctor or an expert in a different domain, what is the probability that somebody smokes? If they think about it, they're likely to say, that depends. And what does it depend on? Well, maybe their age, their gender, maybe the country that they live in, because different countries have different smoking frequencies. And so once again, we're going to extend the

conversation backward to include more variables, up to the point where we can stop. Because if you now ask, for example, what is the probability of gender being male versus female, well, anybody can answer that one. And at that point one can stop, because there's no need to extend the conversation backward. Is that enough?

Usually not, because we also need to consider, for example, factors that might indicate to us whether somebody has cancer or not. And so we might go and ask the doctor what are some pieces of evidence that might be indicative here, and the doctor would tell us, for example, coughing, or maybe bloody sputum, and various other things that would be potential indicators.

And at that point, one would say, well, okay, what is the probability of coughing given lung cancer? And again, one would now extend the conversation backward to say, well, other things may cause coughing, for example, having allergies. And so once again we would go from here and extend backward, to construct a graphical model that captures all the relevant factors for answering the queries that we care about.

So, that's the structure of a graphical model. Now let's talk a little bit about parameters: the values of these parameters, and what makes a difference here. Here are certain things that really do make a difference. Zeros make a big difference. And when we talked about diagnosis we saw

that many of the mistakes that were made in early medical expert systems derived from the fact that people gave probability zero to things that were unlikely, but not actually impossible. And so zeros are something to be very, very careful about: you should only give probability zero to something that is truly impossible, perhaps because it's definitional. Otherwise, things really shouldn't have probability zero.

Other things that make a difference are weaker versions of this. For example, order-of-magnitude differences: the difference between a probability of 1/10 versus 1/100 makes a big difference, whereas small differences, like 0.54 versus 0.57, are unlikely to make a difference to most queries.
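A tiny numeric sketch of the point about zeros, with made-up numbers: a hard zero anywhere in the product rules the hypothesis out no matter how strong the rest of the evidence is, while a small-but-nonzero value keeps it in play.

```python
def posterior_score(prior, likelihoods):
    """Unnormalized posterior: the prior times the product of likelihoods."""
    score = prior
    for l in likelihoods:
        score *= l
    return score

# Disease D explains symptoms S1 and S3 very well, but someone encoded
# P(S2 | D) = 0 for a symptom that is merely rare given D.
hard_zero = posterior_score(0.3, [0.9, 0.0, 0.9])    # D is ruled out forever
small_prob = posterior_score(0.3, [0.9, 0.01, 0.9])  # D stays in play
```

No amount of additional supporting evidence can ever recover `hard_zero` from 0, which is exactly the failure mode of those early expert systems.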

Finally, it turns out that relative values between conditional probabilities make a much bigger difference to the answer than the absolute values. That is, comparing different entries in the same CPD relative to each other is a very useful way of evaluating the graphical model, and of seeing whether the values that you used for those relative ratios really make sense.

Finally, full conditional probability tables are actually quite rare, except in small applications. In most cases, one would use structured CPDs of the forms that we've discussed, as well as a variety of other forms.

So let's talk a little bit about structured CPDs, because those are actually quite important. We can break up the types of CPDs that we've talked about along two axes. One is whether they're intended to deal primarily with discrete or with continuous variables. The other is whether the type of structure that they encode is context-specific, where a variable might make a difference in some circumstances and not in others, versus aggregating of multiple weak influences.

And so let's give an example of each of these categories. For discrete and context-specific, we had tree CPDs as an example. For discrete and aggregating, we had sigmoid CPDs, as well as noisy-OR, noisy-max, or any one of that family.
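As a reminder of what an aggregating CPD looks like, here is a minimal noisy-OR sketch; the parameter values in the usage comment are made up for illustration.

```python
def noisy_or(parent_probs, parent_values, leak=0.0):
    """P(Y=1 | parents) under a noisy-OR CPD.

    parent_probs[i] is the probability that parent i, when active, turns Y
    on by itself; leak is the probability that Y turns on with no active
    parents. Y stays off only if the leak and every active cause all fail,
    which is how many weak influences aggregate into one CPD.
    """
    p_off = 1.0 - leak
    for p, x in zip(parent_probs, parent_values):
        if x == 1:
            p_off *= 1.0 - p
    return 1.0 - p_off

# With both causes active: 1 - (1 - 0.9) * (1 - 0.8) = 0.98.
both_on = noisy_or([0.9, 0.8], [1, 1])
```

Note that the CPD is described by one parameter per parent (plus the leak) rather than a full table that is exponential in the number of parents.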

For continuous CPDs, we didn't actually talk about context-specific representations, but one can take the continuous version of a tree CPD, called a regression tree, where one breaks up the context based on thresholds on the continuous variables.

21:35

Finally, it's important to realize that a model is rarely done the first time you write it. Just like any code design, model design is an iterative process, where one starts out somewhere, tests it, and then improves it over time.

So, importantly, once one constructs a model, the first thing to do is to test the model: ask it queries and see whether the answers coming out are reasonable. There's also a suite of tools for doing what's called sensitivity analysis, which means that one can look at a given query and ask which parameters have the biggest effect on the value of that query. Those are probably the ones that we should fine-tune in order to get the best results for the queries that we care about. Finally, any iterative refinement process

usually depends extensively on a process of error analysis, where, once we have identified the errors that our model makes, we go back and try to see which improvements to the model are going to make those errors go away. It could be, for example, adding features: in some of the image segmentation work that we did, there are features that might help eliminate certain errors that we see in our segmentation results. Or maybe adding dependencies to the model that can capture the kind of structure that's in it.
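The sensitivity-analysis step mentioned above can be sketched numerically. For the two-node model X to Y, with made-up parameter values, we bump each parameter by a small amount and see how much the query P(Y=1) moves; the parameters with the biggest effect are the ones worth fine-tuning first.

```python
def query_p_y1(p_x, p_y_given_x1, p_y_given_x0):
    """The query P(Y=1) in the two-node network X -> Y."""
    return p_x * p_y_given_x1 + (1.0 - p_x) * p_y_given_x0

# Made-up baseline parameters for the sketch.
base = {"p_x": 0.3, "p_y_given_x1": 0.9, "p_y_given_x0": 0.2}
eps = 0.01

# Finite-difference sensitivity of the query to each parameter.
sensitivity = {}
for name in base:
    bumped = dict(base)
    bumped[name] += eps
    sensitivity[name] = (query_p_y1(**bumped) - query_p_y1(**base)) / eps

# Here P(Y=1) is most sensitive to p_x and p_y_given_x0 (both 0.7) and
# least sensitive to p_y_given_x1 (0.3), so tuning effort should go to
# the first two.
```

Real sensitivity-analysis tools do this more cleverly than brute-force bumping, but the idea is the same: rank parameters by how much the query of interest responds to them.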