Let's look at an example of an actual network and try to see what the CPDs look like,

what behavior we get and how we might augment the network to include additional things.

Now, let me warn you right up front that this is a baby network;

it's not a real network,

but it's one that's compact enough to look at,

but still interesting enough to get some non-trivial behaviors.

So to explore the network,

we're going to use a system called SamIam.

It was produced by Adnan Darwiche and his group at UCLA.

And it's nice because it actually works on all sorts of different platforms.

So it's usable by pretty much everyone.

So let's look at a particular problem.

Imagine that we're an insurance company and we're trying to

decide for a person who comes in the door whether to give them insurance or not.

So the operative factor in making that decision is how much the policy is going to cost us.

That is, how much we're going to have to pay over

the course of a year to insure this person.

So there is a variable called cost.

Let's click on that to see what properties that variable has.

And we can see that in this case,

we've decided to give the cost variable only two values: low and high.

This is clearly a very coarse-grained approximation

and not one that we would use in practice.

In reality, we would probably have this be a continuous

variable whose mean depends on various aspects of the model.

But for the purposes of illustration,

we're going to use this discrete distribution that only has two values,

low and high. Okay.

So now let's build up this network

using the technique of extending the conversation that we've discussed before.

And so what is the most important determining factor

as to the cost that the insurance company has to pay?

Well, probably whether the person has accidents and how severe they are.

So here we have a network that has two variables.

One is accident and one is cost.

And in this case,

we've decided to select three possible values for the accident variable:

none, mild and severe,

with the probabilities that you see listed.

And what you see down below is the cost variable.

Let's open the CPD of the cost variable given the accident variable.

And we can see that in this case,

we have a conditional probability table of cost given accident.

Note that this is actually inverted from

the notation that we've used in the class before because here,

the conditioning cases are columns;

whereas in the examples that we've given, they have been rows.

But that's okay. It's the same thing, just inverted.

And so we see, for example,

that if the person has no accident,

the costs are very likely to be very low;

mild accidents incur a different distribution over cost;

and severe accidents have a probability of 0.9

of having high costs and 0.1 of having a low cost.
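
To make the table concrete, here is a minimal Python sketch of a CPD stored with one column per conditioning case, as SamIam displays it. Only the severe column (0.9 high, 0.1 low) comes from the lecture; the accident prior and the other two columns are made-up numbers, and `marginal_cost` is a hypothetical helper name.

```python
# Prior over accident: hypothetical numbers, not the ones shown in SamIam.
accident_prior = {"none": 0.80, "mild": 0.15, "severe": 0.05}

# CPD of cost given accident, one column per conditioning case.
# Only the "severe" column is quoted in the lecture; the rest is assumed.
cost_given_accident = {
    "none":   {"low": 0.95, "high": 0.05},
    "mild":   {"low": 0.60, "high": 0.40},
    "severe": {"low": 0.10, "high": 0.90},
}

def marginal_cost(prior, cpd):
    """P(cost) = sum over accident of P(accident) * P(cost | accident)."""
    result = {}
    for accident, p_accident in prior.items():
        for cost, p_cost in cpd[accident].items():
            result[cost] = result.get(cost, 0.0) + p_accident * p_cost
    return result

cost_marginal = marginal_cost(accident_prior, cost_given_accident)
print(cost_marginal)
```

Each column sums to one because it is a distribution over cost for a fixed value of accident; whether the conditioning cases are drawn as rows or columns does not change that.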

So now let's continue extending the conversation and ask what accident depends on.

And it seems that one of the obvious factors

is whether the person is a good driver or not,

so we would expect driver quality to be a parent of accident.

But there are other things that also affect not just the presence of an accident,

but also the severity of the accident.

So for example, vehicle size would affect

the severity of an accident, because if you're driving a large SUV,

then chances are you're less likely to be in a severe accident;

but it might also perhaps increase the chance of having an accident overall

because maybe a large car is harder to handle.

And then the vehicle year might affect the chances of

an accident because of the presence or absence of certain safety features,

like anti-lock brakes and airbags.

So let's open the CPD of accidents and see what

that looks like now that we have all these parents for it.

And we can see here that we have, in this case,

eight conditioning cases, which

correspond to the three variables with two values each.

And so here, let's look at

just one example distribution.

So if this is a fairly new vehicle,

after 2000 and it's an SUV,

the probability of having a severe accident is quite low,

the probability of having a mild accident is moderate and

the probability of having no accidents is 0.85;

whereas if you compare that to the corresponding entry

when we keep everything fixed except that now it's a compact car,

we see that the probability of having a mild accident is lower,

but the probability of having no accidents is higher,

representing different driving patterns, for example.
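
As a rough illustration of where the eight conditioning cases come from, the sketch below enumerates every joint assignment to the three parents. The value labels `good`, `bad`, `suv`, `compact` and `after_2000` follow the lecture; `before_2000` is an assumed label, and the uniform columns are placeholders rather than the numbers in the actual network.

```python
from itertools import product

# Parent variables of accident, each with two values.
# "before_2000" is an assumed label for the other vehicle-year value.
parents = {
    "driver_quality": ["good", "bad"],
    "vehicle_size": ["suv", "compact"],
    "vehicle_year": ["after_2000", "before_2000"],
}
accident_values = ["none", "mild", "severe"]

# One conditioning case per joint assignment to the parents: 2 * 2 * 2 = 8.
conditioning_cases = list(product(*parents.values()))

# Placeholder CPD: a uniform column for every case (not the lecture's numbers).
accident_cpd = {
    case: {value: 1.0 / len(accident_values) for value in accident_values}
    for case in conditioning_cases
}

def is_valid_cpd(cpd, tolerance=1e-9):
    """Every column must be a distribution: non-negative entries that sum to 1."""
    return all(
        min(column.values()) >= 0.0 and abs(sum(column.values()) - 1.0) <= tolerance
        for column in cpd.values()
    )

# Each column has len(accident_values) - 1 free entries, since it must sum to 1.
free_parameters = len(conditioning_cases) * (len(accident_values) - 1)
print(len(conditioning_cases), free_parameters, is_valid_cpd(accident_cpd))
```

The count grows multiplicatively with each added parent, which is one reason to be selective about which parents a variable gets.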

Okay. So with this - with this network,

we can now start asking simple questions.

So here is an example of causal inference.

Let's instantiate, for example,

driver quality to be good, and then bad.

And we can see that for a bad driver,

the probability of low cost is 81 percent, and for a good driver,

the probability of low cost is 87 percent.

If we look at the accidents,

we can see that for a good driver,

there is a probability of 87-and-a-half percent

of no accidents and 10 percent of mild accidents.

And the probability of no accident goes down for

a bad driver, mild accidents go up, and severe accidents also go way up.

Now note that many of these differences are quite subtle.

There is a difference of a couple of percent one way or the other.

And you might think if you were designing a network that you'd like

these really sort of extreme probability changes when you instantiate values;

but in many cases,

that's not actually true and these subtle differences are actually quite

significant for an insurance company that insures hundreds of thousands of people.

A couple of percentage points in the probability of

an accident can make a very big difference to one's profitability.
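
A quick back-of-the-envelope calculation shows why such a small gap matters. The two low-cost probabilities are the ones read off the network above; the portfolio size of 100,000 is a hypothetical figure chosen only for illustration.

```python
# Probability of low cost for each driver quality, as quoted in the lecture.
p_low_cost = {"good": 0.87, "bad": 0.81}

# Hypothetical number of policies written for each kind of driver.
portfolio_size = 100_000

# Expected number of high-cost policyholders in each portfolio.
expected_high_cost = {
    quality: round(portfolio_size * (1.0 - p_low))
    for quality, p_low in p_low_cost.items()
}

extra_high_cost = expected_high_cost["bad"] - expected_high_cost["good"]
print(expected_high_cost, extra_high_cost)
```

Under these assumptions, a six-point difference in probability translates into thousands of additional high-cost customers.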

So now let's think about how we would expand this network even further.

Vehicle size and vehicle year are

things that we're likely to observe in the insurance form.

Driver quality, though, is something that's very difficult to observe.

You can't go ask somebody, "Are you a good driver?" because everyone's going to say, "Sure,

I'm the best driver ever."

And so that's not going to be a very useful question.

So what evidence do we have that we can observe

that might indicate to us the value of the driver quality?

Well, one obvious one is the person's driving record;

that is, whether they have had previous accidents or previous moving violations.

So let's think about adding a variable that represents driving history.

And so let's go ahead and produce that variable.

So you can click on this button that allows us to create a node.

The node is now called variable 1.

So we have to give it a name.

So for example, we're going to call it driving history.

And that's its identifier.

And we also have the other name of the variable, which is usually the same.

And let's make that two values;

say, previous accidents, no previous accidents.

Now where would we place this variable in the network?

One might initially think that the right thing to do

is to place driving history as

a parent of driver quality, because

driving history can influence our beliefs about driver quality.

Now it's true that observing driving history changes our probability of driver quality,

but if you think about the actual causal structure of the scenario,

what we actually have is that driver quality is a causal factor

of both a previous accident as well as a subsequent accident.

And so if we want to maintain the intuitive causal structure of the domain,

a more appropriate thing

is to add driving history as a child rather than a parent of driver quality.

You might ask why it matters.

And in this very simple example,

the two models are in some sense equivalent and we could have placed it either way except

that the CPD for driver quality

given driving history might get a little bit less intuitive.

But if we had other indicators of driver quality;

for example, previous moving violations,

then it actually makes a lot more sense to have all of these be

children of driver quality as opposed to parents of driver quality.
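
The sense in which the two-variable models are equivalent can be checked numerically. The sketch below starts from made-up CPDs for the causal direction (driver quality as the parent), derives the CPDs for the reversed direction with Bayes' rule, and compares the two joint distributions. All numbers and variable names here are assumptions for illustration.

```python
# Hypothetical numbers for the model with driver quality as the parent.
p_quality = {"good": 0.7, "bad": 0.3}
p_history_given_quality = {
    "good": {"previous_accidents": 0.1, "no_previous_accidents": 0.9},
    "bad":  {"previous_accidents": 0.4, "no_previous_accidents": 0.6},
}

# Joint distribution under the structure quality -> history.
joint_forward = {
    (q, h): p_quality[q] * p_history_given_quality[q][h]
    for q in p_quality
    for h in p_history_given_quality[q]
}

# Derive the CPDs of the reversed structure, history -> quality, via Bayes' rule.
p_history = {}
for (q, h), p in joint_forward.items():
    p_history[h] = p_history.get(h, 0.0) + p
p_quality_given_history = {
    h: {q: joint_forward[(q, h)] / p_history[h] for q in p_quality}
    for h in p_history
}

# Joint distribution under the structure history -> quality.
joint_reversed = {
    (q, h): p_history[h] * p_quality_given_history[h][q]
    for h in p_history
    for q in p_quality
}

max_difference = max(abs(joint_forward[k] - joint_reversed[k]) for k in joint_forward)
print(max_difference, p_quality_given_history)
```

Both structures encode the same joint distribution, but the reversed CPD has to be computed rather than written down directly, which is the loss of intuitiveness mentioned above. With these numbers, the probability of a good driver given previous accidents comes out at about 0.37.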

Okay.

So that shows us how we would add a variable into the network.

And now I'm going to open up

a much larger network that includes these variables as well as others.

So let's look now at this larger network.

And we can see that we've added several different variables in the network.

We've added attributes of the vehicles; for example,

whether the vehicle has anti-lock brakes and an airbag,

which is going to allow us to give more informative probabilities regarding the accident.

We've also introduced aspects of the driver;

for example, whether they've had extra tech training,

which is going to increase driving quality.

Whether they're young or old,

where the presumption is that younger people tend to be more reckless drivers.

And whether the driver is focused or more easily distracted,

which again, is going to affect driving quality.

Now, since personality type is hard to observe,

we've added another variable, good student,

which might indicate one's personality type.

So let's open a CPD for that one.

And so we can see here that, for example,

if you are a focused person who is young,

you're much more likely to be a good student,

much more so than if you are not a focused person who is young.

If you're old, the probability is very low, which basically says that

you're just not very likely to be

a student and therefore not likely to be a good student.

So now that we've added all these variables to the network,

let's go ahead and run a few queries to see what happens.

And let's start by looking at the prior probability

of accidents before we observe anything.

So we can see that the probability of no accident is about 79-and-a-half percent;

the probability of severe accidents is about three percent.

Now let's go ahead and tell the system that we have a good student in hand.

And so, we're going to observe

that the person is a good student and let's see what happens.

We can see surprisingly that even though we observe somebody is a good student,

the probability of no accidents went down from 79-and-a-half to

78 percent and the probability of

a severe accident went up from three-and-a-half to 3.67 percent.

You might say well, but I told you that it's a good student.

Shouldn't the probability of accidents go down?

So let's look at some active trails in this graph.

One active trail goes from good student to focused to driver quality to accidents.

And sure enough, that trail,

if we consider that trail in isolation,

is probably going to make the probability of no accident be higher.

But we have another active trail.

We have the active trail that goes from good student up to

age and then back down through driver quality.

So to see that, let's unclick good student and see what happens.

Note that the probability initially that the driver is young was 25 percent;

but then when I observed good student,

it went up to close to 95 percent.

And that was enough to counteract the influence

along this more obvious active trail.

So to demonstrate that this is indeed what's going on,

let's instantiate the fact that the driver is young.

And we can see that the probability of severe accident went up to 3.7

percent and no accident went down to a little bit shy of 77 percent.

And now let's observe good student and see what happens.

So now we observe good student and the probability of

no accidents went up to 78 percent, as opposed to before,

when it was 77 percent.

And the reason for that is that we've now blocked the trail

that goes from good student through age to

driver quality, because age is now observed.
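
The same qualitative pattern can be reproduced with a much smaller toy network and brute-force enumeration. This is only a sketch: the structure (age and focused as parents of both good student and driver quality, driver quality as the parent of accident) is a simplification of the network on screen, and every number except the 25 percent prior on young is invented. The function `prob_no_accident` is a hypothetical helper, not part of SamIam.

```python
from itertools import product

P_YOUNG = 0.25    # prior quoted in the lecture; everything else is hypothetical
P_FOCUSED = 0.5
# P(good_student = True | young, focused), keyed by (young, focused).
P_GOOD_STUDENT = {(True, True): 0.80, (True, False): 0.30,
                  (False, True): 0.01, (False, False): 0.005}
# P(good_driver = True | young, focused), keyed by (young, focused).
P_GOOD_DRIVER = {(True, True): 0.60, (True, False): 0.30,
                 (False, True): 0.90, (False, False): 0.70}
# P(no_accident = True | good_driver), keyed by good_driver.
P_NO_ACCIDENT = {True: 0.90, False: 0.70}

NAMES = ("young", "focused", "good_student", "good_driver", "no_accident")

def bernoulli(p_true, value):
    """Probability of a binary value given the probability that it is True."""
    return p_true if value else 1.0 - p_true

def joint(young, focused, good_student, good_driver, no_accident):
    """Joint probability from the chain rule for this toy network."""
    return (bernoulli(P_YOUNG, young)
            * bernoulli(P_FOCUSED, focused)
            * bernoulli(P_GOOD_STUDENT[(young, focused)], good_student)
            * bernoulli(P_GOOD_DRIVER[(young, focused)], good_driver)
            * bernoulli(P_NO_ACCIDENT[good_driver], no_accident))

def prob_no_accident(**evidence):
    """P(no_accident = True | evidence), summing the joint over all assignments."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        if any(assignment[name] != value for name, value in evidence.items()):
            continue
        p = joint(**assignment)
        denominator += p
        if assignment["no_accident"]:
            numerator += p
    return numerator / denominator

prior = prob_no_accident()
given_student = prob_no_accident(good_student=True)
given_young = prob_no_accident(young=True)
given_young_student = prob_no_accident(young=True, good_student=True)
print(prior, given_student, given_young, given_young_student)
```

With these invented numbers, observing good student alone lowers the probability of no accident from about 0.84 to about 0.81, because it makes young far more likely. Once young is already observed, adding good student raises it from 0.79 to about 0.80, since only the trail through focused remains active.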

So we can see the reasoning patterns in a Bayesian network are sometimes subtle

and there are different trails that can affect

things and interact with each other in different ways.

And so it's useful to take a model and play around with

different queries and different combinations of

evidence to understand the behavior of a network.

And especially if you're designing such a network for a particular application,

it's useful to try out these different queries and

see if the behavior that you get is the behavior that you want to get.

And if not, then you need to think about how to modify

this network to get behavior that's closer to the desired behavior.

This network is available for you to play with and

you can try out different things and see what behaviors you get.