0:02

So we've argued that table CPDs are problematic, because their size grows exponentially in the number of parents. One of the classes of structured CPDs that is most useful is the class of CPDs that encodes a dependence of a child on a parent, but a dependence that only happens in certain contexts. One method for encoding that is the class of what are called tree-structured CPDs.

So, to understand what tree-structured CPDs are, let's look at this simple example. Imagine that we have a student, and the student is applying for a job. The prospects of the student getting the job depend on three variables: the quality of the recommendation letter that they get from the faculty member, their SAT scores, and whether the student chooses to apply for the job in the first place.

So let's think about one possible CPD for this model. Here we have a tree structure that you can think of as a branching process, where the distribution over Job looks at some variables and then decides what the distribution might look like. So, for example, initially the dependence is on whether the student chooses to apply for the job or not. What happens if the student doesn't apply for the job? Well, you might say that in that case the student doesn't get the job, but that turns out not to be the case. In the heyday of Silicon Valley, for example, during the various internet bubbles, students were getting job offers without ever applying for jobs. So it might actually happen: the student's probability of getting a job is nonzero even in this case.

And notice that the student, not having applied for the job, didn't submit either a recommendation letter or SAT scores, which means that the student's job prospects don't depend, in this scenario, on either of these two variables. And so in all possible configurations of the S and L variables, the probability of the student getting a job is 0.2.

What if the student did choose to apply for the job? Well, in this case we can imagine a recruiter whose primary interest is in the student's SAT scores; they don't really believe recommendation letters all that much. And so the recruiter first looks at the student's SAT score. If the student got a good score on the SAT, s1, then, regardless of the recommendation letter, which the recruiter doesn't even choose to look at, the student's probability of getting a job is 0.9. Only in the case where the student's SAT scores are not as strong does the recruiter go back and look at the letter, in which case there is a certain probability, say 60%, of getting a job if the letter is strong, and 10% if the letter is weak.
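The branching process just described can be written as a simple lookup. This is a hypothetical encoding (the 0/1 value coding for a0/a1, s0/s1, l0/l1 is an assumption), using the probabilities from the example:

```python
# Tree-structured CPD for P(J = 1 | A, S, L), walked as a branching process.
# Internal decisions branch on a variable; leaves store a probability.
# Only four leaf values are needed instead of 2^3 = 8 table rows.

def p_job(a, s, l):
    """Return P(J = 1 | A=a, S=s, L=l) by walking the tree."""
    if a == 0:          # student did not apply: S and L are irrelevant
        return 0.2
    if s == 1:          # strong SAT score: recruiter ignores the letter
        return 0.9
    if l == 1:          # weak SAT, strong letter
        return 0.6
    return 0.1          # weak SAT, weak letter

# Context-specific independence (J independent of L in context a1, s1):
# the value of L does not change the probability.
assert p_job(1, 1, 0) == p_job(1, 1, 1) == 0.9
# In context a0, neither S nor L matters:
assert len({p_job(0, s, l) for s in (0, 1) for l in (0, 1)}) == 1
```

The assertions at the bottom are exactly the context-specific independence checks discussed next.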

So we can see that we have a CPD that in this case depends on three binary variables, and so in principle we would need to represent eight different probability distributions over the J variable. But we've only represented four, because in certain contexts some of the variables don't matter. In fact, this notion of a variable not mattering is related to the notion of context-specific independence, which we've defined previously. So one can formalize this as a context-specific independence.

Let's look at this tree and think about which context-specific independencies arise in the context of this tree-structured CPD. Looking at the first one: is J, the variable that we care about, independent of L in the context a1, s1? Well, we can see that in the context a1, s1, the recruiter never looks at the letter, so in fact J is independent of L in this context, and the answer to this one is yes. Okay, what about the next one?

J is independent of L given a1 alone. Well, in this case we're going down here, and now there are two scenarios: one in which S is equal to s1, but in the other S is equal to s0, and in that case the recruiter does look at the letter. So this one is in fact not true. What about the next one? J is independent of L and S given a0. So let's look at the a0 case. And sure enough, in the a0 case, there's no dependence on either L or S, so this one is also true. The last one is a little bit interesting, because it's a mix of context-specific and non-context-specific independence. So we're asking if J is independent of L in the context s1.

5:59

This, in fact, is a special case of this scenario, and so both of these are, in fact, true independence statements. And since both cases hold, we have another conditional independence statement that holds here.

Let's look at another example that turns out to be representative of a large class of examples in this context.

So here the student, when applying for the job, needs to submit a recommendation letter, but has a choice between two letters that they might elect to provide: one from one course and another from a second course. So, letter one and letter two.

Now, the student's job prospects depend on the quality of the letter that's actually provided, because of course the recruiter does not have access to the letter that was not provided. So if we look at this in the context of a tree CPD, it would look like this. The first variable at the top corresponds to the student's choice, and it has two branches, c1 and c2. In the c1 case, there is dependence only on the quality of letter one, and in the c2 case, there is dependence only on the quality of letter two.
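This choice-at-the-root tree can be sketched the same way. The probabilities here are illustrative placeholders, not values from the lecture, and the 0/1 value coding is an assumption:

```python
# Tree CPD for P(J = 1 | C, L1, L2): the choice variable C selects
# which letter the job outcome depends on.

def p_job_given_choice(c, l1, l2):
    """P(J = 1 | C=c, L1=l1, L2=l2); only the chosen letter matters."""
    letter = l1 if c == 1 else l2   # C acts as a selector
    return 0.8 if letter == 1 else 0.3   # placeholder probabilities

# In the context C = c1, J is independent of L2:
assert p_job_given_choice(1, 1, 0) == p_job_given_choice(1, 1, 1)
# And in the context C = c2, J is independent of L1:
assert p_job_given_choice(2, 0, 1) == p_job_given_choice(2, 1, 1)
```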

So this is related to something called a multiplexer CPD, because effectively the choice variable determines the dependence on one set of circumstances or another. Now, it turns out that this example has some interesting ramifications, because not only do we have context-specific independencies that arise because of the tree structure; it turns out that this also implies non-context-specific independencies that we will see are useful later in the course. Specifically, we have that letter one is independent of letter two given J and C.

Now, if you think about this purely from the perspective of the d-separation structure, that is, the flow of influence in this graph, we can see that the job actually activates the v-structure between letter one and letter two, so you wouldn't actually expect letter one and letter two to be conditionally independent; that is, we would have a flow of influence because of intercausal reasoning. But now let's think about this in more detail, and do the case analysis just like we did before. So we're now going to ask if letter one is independent of letter two, given J and c1.

But what happens in the context C = c1? Well, in this case, there's no longer a dependence between Job and letter two, because the recruiter is never given the second letter. And so in the context c1, the graph really looks like this, where there's no edge from L2 to J. Conversely, looking at the other case, where C equals c2, this other edge is going to disappear. And once again, there's no v-structure, and so there's no active trail between these two variables, L1 and L2. So effectively, in both of these cases, the active trail disappears, and that implies the independence assertion.
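This case analysis can also be checked numerically by brute-force enumeration of the joint distribution. A minimal sketch, assuming binary letters, a uniform choice prior, and illustrative placeholder probabilities (none of these numbers come from the lecture):

```python
from itertools import product

# Brute-force check that L1 is independent of L2 given J and C in the
# multiplexer model. Priors and the CPD for J are placeholders.

p_c = {1: 0.5, 2: 0.5}    # prior over which letter the student submits
p_l = {0: 0.4, 1: 0.6}    # same prior for the quality of each letter

def p_j(j, c, l1, l2):
    """CPD for J: only the letter selected by C matters."""
    chosen = l1 if c == 1 else l2
    p1 = 0.8 if chosen == 1 else 0.3
    return p1 if j == 1 else 1 - p1

def joint(c, l1, l2, j):
    return p_c[c] * p_l[l1] * p_l[l2] * p_j(j, c, l1, l2)

# For every context (j, c), P(l1, l2 | j, c) must factorize into
# P(l1 | j, c) * P(l2 | j, c).
for j, c in product((0, 1), (1, 2)):
    z = sum(joint(c, a, b, j) for a in (0, 1) for b in (0, 1))
    for l1, l2 in product((0, 1), (0, 1)):
        lhs = joint(c, l1, l2, j) / z
        m1 = sum(joint(c, l1, b, j) for b in (0, 1)) / z
        m2 = sum(joint(c, a, l2, j) for a in (0, 1)) / z
        assert abs(lhs - m1 * m2) < 1e-12
```

The assertions pass because, once C is fixed, J touches only one of the two letters, so the v-structure between L1 and L2 is never actually activated.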

10:04

The variable A over here is the multiplexer, the selector variable. The selector variable takes on values in the space 1, ..., k, and it selects which of the Z_i's the variable Y copies. And notice that Y here is deterministic, as we can see by the fact that we have these two lines surrounding it, which is our way of indicating deterministic dependencies.

So what is the CPD of the variable Y given the selector A and the parents Z1 up to Zk? Remember, we need to specify a probability distribution. This probability distribution is one if y is equal to z sub a, and zero otherwise. So what does that mean? It means that if A takes the value little a, then deterministically Y is equal to Z sub little a with probability one; that is just a formal way of saying it. So A tells us which of the variables Z the variable Y needs to copy.
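As a sketch, the deterministic multiplexer CPD is just an indicator function (value types and 1-based indexing here are assumptions chosen to match the notation):

```python
# Deterministic multiplexer CPD: P(Y = y | A = a, Z_1, ..., Z_k) is
# 1 if y equals z_a and 0 otherwise. Y simply copies the Z selected by A.

def multiplexer_cpd(y, a, z):
    """z is the list (z_1, ..., z_k); a is 1-based as in the lecture."""
    return 1.0 if y == z[a - 1] else 0.0

assert multiplexer_cpd(5, 2, [3, 5, 7]) == 1.0   # Y copies Z_2 = 5
assert multiplexer_cpd(3, 2, [3, 5, 7]) == 0.0   # any other y has probability 0
```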

This turns out to be an extremely useful concept in a variety of applications.

So, for example, when we have perceptual uncertainty, when you have noisy sensors: say we have a sensor observation of one of several airplanes, but we don't know which airplane it is that we're observing. The position of the observation represents the position of the airplane that we're observing, but the variable A here is the one that tells us which airplane it is, which we might also be uncertain about. This gives rise to a whole set of problems, known as registration or correspondence or data association problems, which are very common in many applications.

A different type of application for this type of structured CPD comes up in physical hardware configuration settings.

So this is an actual example from a troubleshooter for printers used at Microsoft. And it turns out that all of the troubleshooters that are part of the Microsoft operating system are built on top of Bayesian network technology.

So here the task is to try and figure out why a printer isn't printing.

So we have a variable here that tells us whether the printer is producing output, and that depends on a variety of factors, but one of the factors that it depends on is where the printer input is coming from. Is it coming from a local transport or a network transport? Depending on which of those it's coming from, there's a different set of failures that might occur. So the variable here that serves the role of the selector variable is this variable, Print Data Out, and that's at the root of the tree that's used here. Depending on whether the print location is local or not, you then depend either on properties of the local transport or on properties of the network transport.

And it turns out that even in this very, very simple network, the use of tree CPDs reduces the number of parameters from 145 to about 55, and makes the elicitation process much easier.

So to summarize: tree CPDs provide us with a compact representation that effectively captures this notion of dependence in a context-specific way. As we've mentioned, this is relevant in a broad range of applications, of which we've only given some examples: hardware configuration; medical settings, where depending on the kind of situation you're in, you might depend on one set of predisposing factors, say, or another; and dependence on an agent's action, as we've seen, for example, in the student's decision on whether to apply for a job or not, or which letter to submit. And we've also discussed perceptual ambiguity, where the value of a particular sensed observation depends on which real-world object that observation comes from.