
I'm going to take you through

a threshold theorem for those people who are interested in seeing

a little more detail exactly how these thresholds

work and how some of these things are proven.

Just to give you a feel for the type of proof that works in this setting.

So the one that we're looking at is going to be one of

the more interesting theorems coming out of Erdos and Renyi.

So what we're looking at is that a threshold function for

the connectedness of one of these Poisson or G(n,p) random networks

is the point at which the probability of forming a link is proportional to log(n)/n.

And so this tells us that if P(n) is much

larger than log(n)/n, then we're going to have a connected network.

And if it's much smaller than this then the network is not going to be connected and

we're going to have different components in it.
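This threshold is easy to see numerically. Below is a minimal sketch, not part of the lecture, that samples G(n,p) graphs on both sides of log(n)/n and checks connectedness with a union-find; the function names and parameter choices are my own.

```python
import math
import random

def gnp_connected(n, p, rng):
    """Sample a G(n, p) graph and report whether it is connected (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:           # each link forms independently with prob p
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components == 1

rng = random.Random(0)
n = 500
t = math.log(n) / n                        # the threshold log(n)/n
above = sum(gnp_connected(n, 3 * t, rng) for _ in range(20))
below = sum(gnp_connected(n, t / 3, rng) for _ in range(20))
print(above, below)
```

With p well above the threshold, essentially every sampled graph is connected; well below it, essentially none are.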

Okay. So that's the theorem,

and what we're going to do is just sort of work through part of

the proof which basically carries the main intuitions of the proof.

And there's still some details we have to

work out and I'm not going to go through all the details,

but this will give the essential bits of the proof.

And so one thing we can do is show that if P(n) divided by this threshold goes to zero,

so if P is much smaller than this threshold,

then there are going to be isolated nodes with probability approaching one.

So if there's isolated nodes then certainly the network can't be connected.

Whereas if P(n) compared to t(n) goes to infinity,

then there can't be any components with less than half the nodes.

So all components have to have more than half the nodes

and basically that means there has to be one giant component.

So the idea here is going to be to show

that, you know, if you're much smaller than this you're going to get some isolated nodes,

and if you're much bigger than this you're not even going to get any isolated components.

And what I'll do is really concentrate on showing that the presence or

absence of isolated nodes changes at this threshold.

And showing that a small component can exist or

not exist is more or less just a small variation on this proof.

First of all, just to

remind you of some useful approximations of the exponential function.

So in case your calculus or your pre-calculus is a little bit rusty.

So there are two definitions of the exponential function

that will be very useful in this course in terms of understanding properties.

One is that if we look at e^x, we can also think of that as

the limit of a process where we

take one plus x over n, raised to the n-th power.

So think of this as continuous compounding of interest: it looks like e to the x.

So the limit as n becomes quite large of (1 + x/n)^n equals the exponential function.

Another is the Taylor series for the exponential function:

we can write e^x as the sum over n of x^n divided by n factorial.

So that's another way of getting that.

And you can go to your Taylor series approximation if you like.
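As a quick numerical sanity check (my own illustration, not from the lecture), both forms converge to the same value:

```python
import math

x = 2.0
n = 10**6

# Compound-interest limit: (1 + x/n)^n approaches e^x as n grows.
limit_form = (1 + x / n) ** n

# Taylor series: e^x = sum over k of x^k / k!
taylor = sum(x**k / math.factorial(k) for k in range(30))

print(limit_form, taylor, math.exp(x))
```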

But that will be a useful approximation, and here what we're going to

do is approximate probabilities of

these events happening, and then using these approximations we'll be able to get

things back in terms of exponentials and logarithms.

So what we're going to show now is

that the threshold is right at the point where your expected number of connections looks like log(n).

And so this is going to be the point where p is proportional to log(n)/n.

So the expected degree is just p multiplied up by n

minus one, or roughly n. This is the threshold

above which we expect each node to have some links

and below which we expect nodes to be isolated.

And in fact, this is the threshold above which we expect nodes to have many links and

once each node has many links the chance of having disconnected components vanishes.

So basically, the logic is we're going to show that

this part of it and the rest is a fairly simple variation. Okay.

So the first thing we're going to do is write the expected degree as

some function r(n) plus log(n), so the expected degree is

r(n) + log(n), and what we're going

to then do is work through a calculation of the probability that some node is isolated.

Okay. So how is that going to depend on this expected degree?

So the probability that some link is not present.

So I look at a given node and ask what's the probability that it's isolated?

Well, it's the probability that it has no links.

What's the probability that it doesn't have one of its links?

Well, that's one minus p. What's the probability that it has none of its links?

Well, that's the probability that none of them formed.

So one minus p times one minus p times one minus p, and so on, raised to the n minus one power.

So this is the probability that some node is isolated.
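The formula (1 - p)^(n - 1) can be checked directly by simulation. A small sketch, with parameter values chosen just for illustration:

```python
import random

n, p = 200, 0.01
rng = random.Random(1)

# Monte Carlo: a fixed node is isolated iff none of its n - 1 potential links form.
trials = 100_000
isolated = sum(
    all(rng.random() >= p for _ in range(n - 1)) for _ in range(trials)
)

exact = (1 - p) ** (n - 1)   # the probability derived in the lecture
print(isolated / trials, exact)
```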

And so what we want to do is then show that if p is small

enough, then you're going to have a high probability of some of these nodes being isolated,

and if p is large enough, then there's not going to be any probability of this happening.

Okay. So the probability of the isolate is one minus P to the n minus one power.

Okay. So probability that some node is isolated is this.

And now, from the way we wrote p,

remember we wrote that the expected degree was

r + log(n), so p is just that divided by n minus one.

So this is the p. One minus p to the n minus one

is just one minus (r + log(n)) over n minus one, raised to the n minus one. Okay.

So now, we've got an expression for this.

And now we're going to use the approximation that we went through before,

when I told you the approximation for the exponential function.

So recall that if we have some x, with n running off to

infinity, one minus x over n, raised to the n, approximates e to the minus x.

Okay. And that's true if x over n is small, so vanishing.

And in particular, what we're going to need is

r+log(n) to be small relative to n minus one.

Now, what I'm going to do is just assume that that's going to be true.

And here's the completion of the proof, for people who are really interested:

if r + log(n) is large relative to n, then since this is a monotone property,

if it holds when it's small relative to n,

it's going to hold even more easily when it's large.

We're just adding more links.

So that part of the proof is quite easy.

So the case that's really going to be tough is when it's small relative to

n. And so what we're going to do now is look at this.

So now we've got an approximation: we're approximating this by e to the minus.

Right? So we get e to the minus (r plus log of n) as an approximation for this.

So the probability that some node is isolated begins to look like this,

right, and now we've approximated that using our exponential function.

So it looks like e to the minus r minus log(n);

bring the log(n) out and we end up with the probability that

some node is isolated looks like e to the minus r

over n. But remember, r was how much the expected degree differed from log(n). Okay.

So now we've got a simple formula for the probability that some node is isolated

for large n. So if each one

is isolated with a probability e to the minus r over n then

the expected number of isolated nodes is

just n times this since nodes are identical in this world.

And so the n's cancel out and we get that

the expected number of isolated nodes is e to the minus r.
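Numerically, the exact expected count n(1 - p)^(n - 1) is already very close to e^(-r) for large n. A sketch (the function name and parameter values are my own):

```python
import math

def expected_isolated(n, r):
    """Exact expected number of isolated nodes when the expected degree is r + log(n)."""
    p = (r + math.log(n)) / (n - 1)
    return n * (1 - p) ** (n - 1)

n = 10**6
for r in (-2.0, 0.0, 2.0):
    print(r, expected_isolated(n, r), math.exp(-r))
```

Large positive r drives the expected count toward zero; large negative r blows it up.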

Now, we're pretty much all the way there.

If r is going to infinity,

if r is getting quite large then the expected number of isolated nodes goes to zero.

If r goes to minus infinity,

then that tells us that the expected number of isolated nodes becomes infinite.

So basically what we're getting here is, as a function of r: if it's

either getting very large and positive, meaning that the threshold is

exceeded by p, or if it's getting very negative, so that p is much smaller than

the threshold, then we're seeing the expected number of

isolated nodes going to zero or going to infinity, respectively.

Okay. And so basically the expected number of isolated nodes goes to

zero if r(n) tends to infinity, and to infinity if r(n) tends to minus infinity.

So if the expected number of isolated nodes tends to

zero, then the probability of having one has to tend to zero.

You can't have the expected number be zero and have

the probability of having one be positive.

So you know, significantly higher than zero.

So the probability of having one is going to have to tend to zero.

And so this tells us now that basically if we're sitting in a situation

where r(n) goes to minus infinity then-- Oh, sorry.

If r(n) goes to plus infinity then the expected number tends to zero,

the probability of having one tends to zero.

So basically if p is much bigger than this t(n) being log(n) over

n then we've got a situation where

the expected number of isolated nodes goes to

zero and the probability of having an isolated nodes goes to zero.

Vice versa, if it's going to infinity, then with an extra step you play with the variance.

You can't get the expected number going to infinity without

having the probability of having at least one go to one.

So it's impossible to have the expected number go to infinity and not

have the probability of actually having one go to one.

So that's an extra step that I'm not going to go through, but it's

a fairly easy one to do in terms of Chebyshev's inequality.
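For those curious, that extra step is a standard second-moment argument, not worked out in the lecture. Writing X for the number of isolated nodes, Chebyshev's inequality gives

```latex
P(X = 0) \;\le\; P\bigl(|X - \mathbb{E}X| \ge \mathbb{E}X\bigr)
        \;\le\; \frac{\operatorname{Var}(X)}{(\mathbb{E}X)^2},
```

and one can check that for the count of isolated nodes, Var(X)/(E X)^2 goes to zero whenever E X goes to infinity, so the probability of having at least one isolated node goes to one.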

So this was just one example.

I don't expect you to digest all of this on the first pass.

You can go through Barabasi's book in much more detail if you want to

see one of these proofs worked out in full detail.

But what I wanted to communicate to you through

this derivation is just the type of reasoning it goes through.

So what you do is you want a certain probability.

You can get that probability in terms of approximations based

on exponential functions, often an approximation in n. Now,

we know that for large n it becomes fairly accurate, and that gives us bounds on probabilities.

So we could say that the probability of something either

has to be tending to one or has to be tending to zero,

based on what these functions have to be converging to as n becomes very large.

And those threshold functions come from

identifying where it is that these expressions either go to one or to zero.

Whether we're talking about

isolated nodes or we're talking about the first link appearing, etc.,

it's going to be a different threshold.

And so the proofs tend to have the same kind of structure to them.

They're just going to differ in the particular expressions, depending on whether we're

talking about one type of threshold and

one property, or a different property and a different threshold.

But this type of analysis is what underlies a lot of

random graph theory and a lot of the proofs behind the scenes

in terms of establishing properties of random graphs in large settings.

Okay. So that's just one example.

Now we're going to take a look at some other types of

random network models and other kinds of

things that we might be able to establish using them.