Okay, so we're back and we're going to talk a little bit more about computing the size of a giant component in a random graph, and in particular we'll focus in on Poisson or G(n,p) random graphs, the Erdős–Rényi random graphs. So we're doing a giant component calculation, and we're going to do a heuristic calculation, but one that turns out to be fairly accurate. The idea is that we'll just do the following kind of calculation. Let q be the fraction of nodes in the largest component. Now, if we just pick a node at random, the chance it's in the giant component is q. But there's another way to figure out that chance: a node is outside the giant component exactly when all of its neighbors are outside the giant component, because if it has any neighbor inside the giant component, it'll be inside the giant component too. And that gives us the ability to do a calculation. So the probability that a given node is outside the giant component, that it's not in the giant component, is 1-q, since q is the probability that it's in there. But that's also equal to the probability that all of its neighbors are outside. So if the node has degree d, if it has d neighbors, then the chance that each one of those is outside of the giant component is (1-q); we raise that to the dth power, and we end up with (1-q)^d as the chance that the node is outside the giant component. And so what we can do is take an expectation over this, based on the probability of the node having different degrees, and then find an equation from that. So we have this q, and we're going to try and solve for q. The probability that you're outside the giant component, that's 1-q; it's also the probability that all of your neighbors are outside the giant component. You have d neighbors. What's the probability that you have d neighbors? Well, it's given by a degree distribution.
So P(d) here is our degree distribution, the chance that you have d neighbors, and once we've got a degree distribution we can solve for q. Now, this is a calculation you could do. It's a heuristic calculation because I'm not worrying about correlations among some of these variables; I'm treating them as if they're independent, and they're not quite independent. You can do a more exact calculation using generating functions; there's some of that in the book, but this will give us a fairly good approximation and works fairly well, especially on large networks. So what we want to do is solve for q. This equation, 1-q = ∑_d (1-q)^d P(d), doesn't require us to work with a specific degree distribution: you could plug in any degree distribution you want and then solve for what q is. It's a nonlinear equation, so it's going to be a little difficult to get a closed form for q, but you can plug in whatever degree distribution you want and then solve, or get an approximate solution for q as a function of your degree distribution. So this is a nicely well-defined equation. And basically, when we look at this, there's always going to be a solution with q equal to zero. Right? If we put q equal to zero on both sides, then we're just summing the probabilities P(d) across all degrees, and we get one on both sides. That's always going to be a solution. The interesting solution is going to be the one where we have a non-zero q. So one possibility is that we assume everybody is outside the giant component, and then everybody ends up outside the giant component because all their neighbors are. That's the nonsensical solution in a lot of situations, so the one we want to solve for is the positive solution to this equation.
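The equation above can be solved numerically for any degree distribution by fixed-point iteration. Here's a minimal sketch; the example distribution at the bottom is made up for illustration, not from the lecture.

```python
# Fixed-point iteration on the giant-component equation
#   1 - q = sum_d (1 - q)^d * P(d)
# for an arbitrary degree distribution (a heuristic sketch, as in the lecture).

def giant_component_fraction(pdist, iterations=1000):
    """pdist: dict mapping degree d -> probability P(d), summing to 1."""
    q = 0.5  # start away from the trivial solution q = 0
    for _ in range(iterations):
        q = 1.0 - sum(p * (1.0 - q) ** d for d, p in pdist.items())
    return q

# Hypothetical example: 20% isolated nodes, 30% degree-1, 50% degree-4.
pdist = {0: 0.2, 1: 0.3, 4: 0.5}
q = giant_component_fraction(pdist)  # roughly 0.71 for this distribution
```

Starting from q = 0.5 keeps the iteration away from the trivial root q = 0, so it converges to the positive solution whenever one exists.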
So if we look at this and try to solve for the positive solution, we're solving 1-q = ∑_d (1-q)^d P(d), and solving that gives us our q. That's what we need. You could solve this for different degree distributions. Let's do it for the case of G(n,p), or approximately a Poisson random graph. So we'll plug in a Poisson distribution for P(d): the probability that a given node has degree d is P(d) = e^(-(n-1)p) ((n-1)p)^d / d!. And when we plug that in, we end up with 1-q = ∑_d (1-q)^d e^(-(n-1)p) ((n-1)p)^d / d!. Now we can solve this for q, and it looks like a daunting equation: it's a sum over different d's with factorials in it. It looks like a fairly nasty equation. But luckily there are some nice parts to it. When you look at things that look like sums of some expression raised to the d over d factorial, that actually has a nice exponential form, so that sum is a nicely behaved sum. One useful fact, just to have in the back of your mind, is the Taylor series for e^x: e^x = ∑_d x^d / d!. So the Taylor series tells you that e raised to some power x is the same thing as summing x to the d over d factorial. And in particular, when we go back, we're summing something to the d over d factorials, so that's going to look like an exponential. When we plug in that expression, we can rewrite the sum as the exponential of what's inside: 1-q = e^(-(n-1)p) e^((1-q)(n-1)p). And collecting terms, things simplify and we end up with 1-q = e^(-q(n-1)p). Remember that (n-1)p is the expected degree, right?
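The Taylor-series step above is easy to check numerically: summing (1-q)^d times the Poisson probabilities really does collapse to e^(-qc), where c = (n-1)p. The values of c and q below are just illustrative.

```python
import math

# Check the Taylor-series step: with c = (n-1)p,
#   sum_d (1-q)^d * e^{-c} * c^d / d!  =  e^{-c} * e^{(1-q)c}  =  e^{-q c}.
c, q = 3.0, 0.4  # illustrative values
lhs = sum((1 - q) ** d * math.exp(-c) * c ** d / math.factorial(d)
          for d in range(100))
rhs = math.exp(-q * c)
# lhs and rhs agree to floating-point precision
```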
So p is the probability of any link; multiply that by (n-1), and that's the expected degree. What we end up with is 1-q = e^(-qE(d)), and if we take a log of both sides we get -log(1-q)/q = E(d). So a very nice, simple equation. Now we've got a relationship between q on the one side and the expected degree on the other side, an expression that tells us how large the giant component is as a function of the expected degree. It's not quite a solution directly for q, because it's -log(1-q)/q, but what we can do is at least plot it out and see what it looks like. So if you plot q as a function of E(d), basically what happens is that when the expected degree is close to one there is essentially no giant component, q is very small at one, and then the giant component grows, and by the time you get up to three as an expected degree, your q is actually very large. It's already close to 95%. So by an expected degree of three we've already hit close to 95% in terms of the size of the giant component, and at one we're close to zero. So this is interesting, and it's somewhat characteristic of these Poisson random graphs: we get these tight phase transitions. So if people have fewer than one expected neighbor, fewer than one expected interaction that would transmit a diffusion, we don't get diffusion. Once you hit one, we get diffusion, and then it grows fairly rapidly, and by the time you have three interactions we're already at the point where you're very likely to hit most of the population. So if you remember, in our promo video I said tell at least two friends. Well, if you tell just one friend about this, then we don't get much diffusion. Two friends is at least enough to start pushing us fairly far along this curve, and by the time you get to three friends you're most of the way there. So just in terms of giant component size, here we are.
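The phase transition described above can be seen directly by iterating q = 1 - e^(-qE(d)) for expected degrees below and above one; a minimal sketch:

```python
import math

# Solve q = 1 - e^{-q*E[d]}  (equivalently  -log(1-q)/q = E[d])
# by fixed-point iteration.

def giant_component_q(expected_degree, iterations=500):
    q = 0.5                                    # start away from q = 0
    for _ in range(iterations):
        q = -math.expm1(-expected_degree * q)  # 1 - e^{-q*E[d]}
    return q

q_low  = giant_component_q(0.5)  # below the threshold: q collapses to ~0
q_high = giant_component_q(3.0)  # expected degree 3: q is roughly 0.94
```

Below an expected degree of one the only solution is q = 0, and at an expected degree of three the giant component already covers about 94% of the nodes, matching the "close to 95%" figure in the lecture.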
We see that there's a fairly tight transition from having no giant component to having almost everybody connected, and so most of the action is in this one region. And indeed, when people study epidemiology there's this idea that what matters is having at least one contact that's going to transmit something: if people are generating more than one transmission, you're going to get diffusion, a contagion, and if people are generating fewer than one transmission, then things are going to die out. And that makes perfect sense. More than one and it's growing, fewer than one and it's shrinking; that's the critical point. And here we find that it actually transitions very rapidly, so that by the time you've got three interactions that are likely to lead to contagion, the diffusion is quite extensive. Okay, so given that, we can also calculate the probability of being in the giant component. Well, (1-q)^d was the probability that you weren't in the giant component; the probability that you are is one minus that, and it's an increasing function of d. Basically that just tells you that the more connected you are, the more likely you are to be infected. And this is exponential in d, so it tells you that increasing your interaction rate makes it much more likely, exponentially more likely, that you end up in the giant component. So you're more likely to be infected at any point in time if we do a dynamic version, and much more likely to end up infected. And this gives us some insight behind that Coleman, Katz, and Menzel finding that the doctors with more interactions were adopting the antibiotic, prescribing the antibiotic, at an earlier time. So here we see that directly in terms of this model. Okay, extensions. Let's talk about some extensions quickly. Immunity, well, that's easy. If some nodes are immune, then they can't catch something and they can't transmit it, so we can just drop those out of our model to begin with and then study the giant component on the remaining nodes.
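The degree dependence just described, that the probability 1-(1-q)^d of being in the giant component rises quickly with d, is a one-liner to tabulate; the value of q here is illustrative.

```python
# Probability that a node of degree d is in the giant component:
#   1 - (1 - q)^d, which is increasing in d.
q = 0.5  # illustrative giant-component fraction, not from the lecture
in_giant = [1 - (1 - q) ** d for d in range(1, 6)]
# rises toward 1 as d grows: 0.5, 0.75, 0.875, 0.9375, 0.96875
```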
And what that's going to do is actually decrease the degree, right? So if we have some fraction of the nodes disappear, then that's going to decrease the degree, and that decreases the overall interaction. In terms of probabilistic infections, we just have links fail, and we can incorporate that directly into the link probability. So we can think about some nodes initially exposed to an infection; we could have some fraction π of the nodes be immune, and we can also think that maybe only some of the links are actually going to transmit. Let's let a fraction f of the links transmit, and then we ask: what's the extent of the infection? Basically, we start with our network on n nodes, we delete some fraction of the nodes, delete some fraction of the links, and we look at what's left. And if the infection starts in the giant component of what's left, then it reaches whatever the giant component is on what's left, and otherwise we don't get an infection. So it's a very simple extension to start to incorporate these things. And if we let q be the fraction of nodes remaining on the network that are in the giant component of the remaining network, then q(1-π) is the probability of having a non-trivial contagion, and that's also, sort of, the extent of the infection. So basically what we can do is just re-solve the equation, where now we've got a lower expected degree for the individuals: it moves down because they're not going to have as many neighbors, since some of those neighbors are immune, and it also scales down by f. And once we've done that, we end up with basically the same exact equation, just taking these two modulations into effect. And so what we end up with is a similar curve to what we had before, but now what we have on the x axis is the expected degree.
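This extension amounts to rescaling the expected degree before solving the same fixed-point equation. A sketch under the assumptions just stated (fraction π of nodes immune, fraction f of links transmitting; the parameter values are illustrative):

```python
import math

# Extension: a fraction pi of nodes is immune (deleted) and only a fraction f
# of links transmit, so on what's left the expected degree scales down to
# (1-pi)*f*E[d]. Solve the same fixed-point equation on the remaining network;
# the reach in the whole population is then q*(1-pi).

def infection_extent(expected_degree, pi, f, iterations=500):
    eff = (1 - pi) * f * expected_degree  # expected degree after deletions
    q = 0.5
    for _ in range(iterations):
        q = -math.expm1(-eff * q)         # q = 1 - e^{-q*eff}
    return (1 - pi) * q                   # fraction of all nodes reached

full    = infection_extent(4.0, pi=0.0, f=1.0)  # no immunity: large outbreak
reduced = infection_extent(4.0, pi=0.5, f=0.4)  # effective degree 0.8 < 1: dies out
```

Note how immunization and low transmissibility can push the effective expected degree below one, killing the contagion even though the underlying network is well above the threshold.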
The one that's, sort of, relevant here is now modulo the fact that some of the nodes are going to be missing and some fraction of the links are going to transmit. And so the expected degree still has to be at least one out of what's remaining, and things saturate as it goes up to three. So you can just do direct calculations; this just scales things back. So, implications. The infection can fail now if π is high enough, or if f or p are low enough, so that you're basically below the one: high π, high immunization; low f, low contagiousness of the process; low p, low contact among the population. So you can think of these different modulations to what we had before, and they're going to affect the eventual process. Okay, so that takes us through a quick look at component size and doing these kinds of calculations. This has been for a process that just spreads through a network once. The next thing we're going to look at is an ongoing process where people can change back and forth between states one and zero. So maybe you support somebody or you don't, you like a technology or you don't like it, so you can change your mind over time; or there are certain kinds of diseases that people can catch and then re-catch. And so, you know, these are ones that can last in the population. So I can clean my computer of a virus, get the virus back, it can infect other people, and so forth. So people can go back and forth between states zero and one over time, and we can look at that kind of model as well. That will be the next thing we have a look at.