0:00

Okay, so we're back and we're going to talk a little bit more about computing

the size of a giant component in a random graph and in particular,

we'll focus in on poisson or GMP random graph, [inaudible] random graphs.

And, so we're doing giant component calculation and we're going to do sort of

a heuristic calculation but one that turns out to be fairly accurate.

And the idea here is we'll just do the following kind of calculation.

Let's let q be the fraction of nodes in the largest component.

Now, let's look at a given node and if we just pick a node at random,

the chance it's in there is q.

But another way to figure out that chance that a given node

is in the giant component has to be that all of its neighbors are

outside of the giant component if it's outside the giant component or if it has

any neighbor inside the giant component it'll be inside the giant component.

And so that'll give us an ability to do a calculation.

So the probability that a given node is outside the giant component,

so it's not in the giant component,

is 1-q since q is the probability that it's in there.

But that's also equal to the probability that all of its neighbors are outside.

So if the node has degree d,

if it has d neighbors,

then the chance that each one of those is outside of

the giant component is (1-q) we raised that to

the dth power and we end up with

(1-q)^d is the chance that it's outside the giant component.

And so what we can do is just look at the expected degree,

take an expectation over this based on the probability of it having

different degrees and then find an equation based on that.

So we have this q,

we're going to try and solve for q.

So the probability that you're outside the giant component that's 1-q,

it's also the probability that all of your neighbors are outside the giant component.

You have d neighbors.

What's the probability that you have d neighbors?

Well, it's given by a degree distribution.

So p(d) here is our degree distribution,

so this is the chance that you have

d neighbors and once we've got a degree distribution then we can solve for q.

Now this is a calculation you could do.

It's a heuristic calculation because I'm not worrying

about correlations and some of these variables.

I'm treating this as if they're independent.

They're not quite independent.

You can do a more exact calculation using generating functions.

There's some of that in the book but this will give us

a fairly good approximation and work fairly well especially on large networks.

So what we want to do is solve for q.

Now, this is an equation so far that doesn't require

us to work with a specific degree distribution.

You could plug in any degree distribution you wanted into

this and then solve for what q is.

It's a non linear equation so it's going to be a

little difficult to solve for a closed form for q,

but you can plug in whatever degree distribution you want and then

solve that for q or get

an approximate solution for q as a function of your degree distribution.

So this is a nicely well-defined equation.

And basically, when we look at this

there's always going to be a solution of q equals zero.

So there is one solution to this equation which has q equals zero.

Right? So we're basically solving this equation.

One possibility is, well,

if we put in q equals zero here and q equals zero

here then we're just summing across all degrees for the probability of d,

we get one on both sides.

That's always going to be a solution.

The interesting solution to this is going to be the one where we have a non-zero q.

So we want to solve to a non-zero q.

So one possibility is we assume everybody is outside

the giant component and then everybody ends

up outside the giant component because all their neighbors are.

That's the nonsensical solution in a lot of situations so the one we want to solve

for is going to be the one where there's a positive solution to this equation.

So if we look at this and try and solve for

the positive solution it's going to be the point where 1-q is equal

to the ∑(1-q)^d P(d).

And so we're trying to solve for what's,

what 1-q is solve this simultaneously,

that gives us a solution to q. That's what we need.

You could solve this for different degree distributions.

Let's do it for the case of GNP or approximately a poisson random graph.

So we'll plug in a poisson distribution for PFD.

So we have the poisson distribution here.

So this is the probability that a given node that has degree d is given by this formula.

And when we plug that in then we end up with this equation.

So now we can solve this for q and this looks like a daunting equation.

It's got sum over factorials of d. So again this is sum over different ds.

We've got factorials here.

It looks like a fairly nasty equation.

But luckily there are some nice parts to this equation.

So when you look at things that look like sums of

some expression raised to the d over d factorial,

that actually has a nice exponential form.

So that sum is a nicely behaved sum.

And one useful approximation,

just to have in the back of your mind,

is that when you look at writing down a Taylor series approximation for e^x,

that looks like ∑e^x/d!.

So Taylor series approximation tells you that e raised to

some power of x is the same thing it's suming x to the d over d factorial.

So you've got a nice expression here and in particular when we go

back we look like we're summing something to the d over d factorials.

That's going to look like an exponential.

So when we plug in that expression now we say we can

rewrite this given our formula as exponents of what's inside here.

So we rewrite this in a nice form so that

now we've got 1-q is equal to this expression here.

Okay? And collecting terms,

sum of our P simplify here and we end up with e^(-q(n-1)p).

Remember that (n-1)p is the expected degree, right?

So P is the probability of any link,

multiply that (n-1) times,

that's the the expected degree.

So what we end up with is here 1-q is equal to e^(-qE(d)) and if we

take a log of both sides we get -log(1-q)/q= E(d).

So a very nice simple equation.

Now we've got a relationship between the q

on the one side and the expected degree on the other side.

So now we've got an expression that tells us how

large the giant component is as a function of the expected degree.

It's not quite a solution directly in q because it's

-log(1-q)/q but what we can do is at least plot this out and see what it looks like.

So if you plot that out as a function of the E(d)s and then see what the qs look like,

basically what happens is as q gets close to one there is no giant component, right,

so q is very small at one and then the giant component

grows and by the time you get up to three as an expected degree,

your q is actually very large.

It's already close to 95%.

So by a degree of three we've already hit close to 95% in

terms of the size of the giant component and at one we're close to zero.

So this is interesting and this is, you know,

somewhat characteristic of these rationally random graphs.

We get these Tyche phase transitions.

So if people have expected neighbors

less than one expected interactions that would transmit a diffusion of less than one,

we don't get diffusion.

Once you hit one we get a diffusion and then

it grows fairly rapidly and by the time you have

three interactions then we're already to the point

where you're very likely to hit most of the population.

So if you remember in our promo video I said tell at least two friends.

Well, if you tell just one friend about this then we don't get much diffusion.

Two friends is at least enough to start pushing us fairly far along this.

And you get to three friends and you're most of the way there.

So just in terms of giant component size here we are.

We see that there's a fairly tight transition from having

no giant component to having almost everybody being

connected and so most of the play is in this one region.

And indeed when people study epidemiology there's

this idea that having at least one contact that's going to transmit something

if people are doing more than one you're going to get diffusion or contagions and if

people are having less than one transmission then things are going to die out.

And that makes perfect sense. You need at least, you know,

more than one and it's growing,

fewer than one and it's shrinking.

That's the critical point.

And then here we find that it actually

transitions very rapidly so that by the time you've got

three interactions that are likely to lead to

contagion then our diffusion is quite extensive.

Okay, so given that,

we can also do calculations of what's the probability of being in a giant component.

Well, this was the probability that you weren't in the giant component.

Probability that you are is one minus that,

this is increasing in d,

it's an increasing function of d. M basically that just tells you more connected,

you're more likely to be infected.

And this is exponential M d. So

it tells you that increasing your interaction rate makes it much more likely,

exponentially more likely that you end up in the giant component.

So you're more likely to be infected at any point in time if we do a

dynamic and much more likely to end up infected.

And this gives us some insight behind that Coleman,

Katz, and Menzel finding,

that the people with more interactions we're adopting the antibiotic,

prescribing antibiotic at an earlier time.

So here we see this directly in terms of this model.

Okay, extensions.

Let's talk about some extensions quickly.

Immunity, well, that's easy.

If some nodes are immune then they can't catch something.

They can't transmit it so we can just drop those out of

our model to begin with and then study the giant component on the remaining nodes.

And what's that's going to do is actually decrease the degree, right?

So if we have some fraction of the nodes that disappear then that's going to decrease

the degree and that decreases the overall interaction.

In terms of probabilistic infections, you know,

we just have links fail and we can incorporate that directly into the link probability.

So we can think about, you know, some nodes initially exposed to infection.

We could have some fraction pi of the nodes that are

immune and we can think

also that maybe only some of the links are actually going to transmit.

Let's let a fraction f of the links transmit and then we ask what's the extent of

the infection and basically we can just start with our network on end nodes,

we delete some fraction of the nodes,

delete some fraction of the links and we look at what's left.

And if we get an infection in the giant component on what's left then we'll get

to reach whatever the giant component is

on what's left and otherwise we don't get an infection.

So it's a very simple extension to start to incorporate these things and, you know,

if we let q be the fraction of knowledge remaining on

the network and the remaining network in the giant component,

then q times 1-P is the probability of having a non-trivial contagion and that's also,

sort of, the extent of the infection.

So basically what we can do is just resolve

the equation where now what we've got is we're going to have

a lower expected degree for the individuals plus moving that down to lower given the fact

that they're not going to have as many neighbors because some of those neighbors are

immune and then also scaling down by the f. And

once we've done that we end up with basically the same exact equation

just taking these two modulations into effect.

And so what we end up with is a similar curve as we had before but now what

we have is on the x axis the expected degree.

The one that's, sort of, relevant here is now

modulo the fact that some

of the nodes are going to be missing and some fraction are going to transmit.

And so it has to still be at least one out of what's

remaining in this and then going up to three.So you can just do direct calculations,

this just scales things back.

So implications. The infection can fail now if pi is high enough or

if f or p are low enough so that basically puts you below the one.

So high pi, high immunization, low f,

low contagiousness of the process,

low p, low contact among the population.

So you can think of these different modulations

to what we had before and that's going to affect the eventual process.

Okay, so that takes us through

a quick look at component size and doing these kinds of calculations.

This has been for a process that just spreads through one network.

The next thing we're going to look at is an ongoing process where people

can change back and forth between state one and zero.

So maybe you support somebody, you don't,

you like a technology, you don't like it.

So you can change your mind over time or there are certain kinds

of diseases that people can catch and then re-catch.

And so, you know, these are ones that can last in the population.

So I can clean my computer of a virus,

get the virus back,

it can infect other people and so forth.

So I can go back and forth between the states zero

and one over time and we can look at that kind of model as well.

That will be the next thing we have a look at.