Hi, in this set of lectures we are talking about networks. In particular, we are focusing on three things, we're talking about the logic of networks, which is how they form, going to talk about the structure of networks, which is how connected are nodes from one another, how far is it from one another to another node and finally we're going to talk about the functions of network. What functionality does a network have? Maybe one built-in, like sort of emerged from the structure. Now in this lecture, we're gonna talk about the structure of the network. So we're gonna define a bunch of terms, find where the network is in terms of nodes and edges and we'll just define and describe a bunch of measures in network that help us. For a better understanding of how networks differ from one another. So when you think of a network, you can think of a bunch of things that we're gonna call nodes, which are points, and then lines connecting those, and those are gonna be edges. So a network is really just a set of nodes and a set of edges. Now those edges can be either directed, and that means there's like a little arrow, or they can be undirected where it's just a straight line. So an undirected node. The network would be something like. [sound]. So an undirected network would be something like the 50 United States. So each node would be a state, and then an edge would just be whether those two states share a boundary with a common border. So there's no direction there. If Ohio's connected to Michigan, then Michigan is connected to Ohio. In a directed network, then that little arrow that you see, that means that. One node points to another node, but it might not necessarily go in the other direction. So if you take a college campus, and you look, you ask whether one student looks to another student for fashion advice. It may very well be that person A looks to person B, but person B doesn't look to A. So in a directed network, you could have an arrow going in one direction, or you could have arrows going in both directions. So a lot of social networks are directed. But many networks, the networks were gonna focus on are gonna be undirected networks, just because they're easier to work with. So then you've got structure we're gonna focus on four measures: degree, which is how many edges each node has on edges, path length, which is how far it is from each node to another node. Connectedness, which is whether the entire graph is connected to itself. And then, finally, clustering coefficient, which will give us some understanding of, like, how tightly clustered are the, are the edges? So if A is connected to B, and B is connected to C, is A connected to C? Okay, so let's get started. Degree of the easiest one. If you take a node, the degree is just the number of edges connected to that node. So if a node looks like this, its degree is one. If a node looks like this, it's degree is four. So it's just how many, if you're a social network, it's how many friends do you have? If you're a business, and you're looking at, and each edge represents working [inaudible]. Another firm and it's a case sort of how many other firms do you work with, so degree is just really a sense of how connected you are to other parts of the node. The degree of a network is the average degree across all nodes. So if I take the entire network and I add, count up the degree of every single node and than average by the number of nodes, that's what we'll think of as the degree of the network. So a high degree network will be one where lots of people are connected to lots of other people. A low degree network will be one where there's very few connections between people. So here's a network, right here. And if I look at this green node, I can ask what is it's degree. And you see that it's connected to one, two, three people. So the degree of the green node. Is equal to three. Now, if I wanna figure out the degree of the entire graph you can [inaudible] I'm gonna go through and count up each person. So, this person has a degree of one. This person has a degree of one. This person has a degree of three. And I can go through and count all the way up. Now, that seems like that takes a lot of time. There should be a better way to. Figure out the average degree for a network, and in fact there is. Cause all I have to do is count the nodes. So, in this case, there's one, two, three, four, five, six, seven, eight, nine, ten nodes, and then I can just count the number of edges. There's one, two, three, four, five, six, seven, eight, nine edges. So, ten nodes, nine edges. Now each edge connects two nodes. So, what I've gotta do is just multiply those edges by two. So if I have two, two, two times the number of edges divided by the number of nodes, that'll give me my average degree. So, in this case two times nine is eighteen, divided by ten gives me an average degree of. 1.8, now when you think of moving along those edges, you can call the people you're connected to, your neighbors. So if this is a node here, and he's connected to these three people, then these three people here are the neighbors of the node. And what we can think about then is. People who are connected have more neighbors, and later on, we're gonna extend this concept to neighbors from sort of your immediate neighbors, to neighbors who are further away on the graph. Well, here's an interesting result. So we only introduced one measure, which is degree. And now we're gonna get a really, sort of fascinating result. And that is, the average degree of neighbors of nodes will be at least as large as the average degree of the network. So, typically, it'll be larger, but it has to be at least as large. What does this mean in less technical terms? It mean, most people's friends are more popular than they are. So literally, it means that. But if you look at your friends, they, on average, have more friends than you have. But what does C have [inaudible]. So let's look at this network. A. Has a degree of two. B has a degree of three. C has a degree of two. And D has a degree of one. So if we add all that up, we get eight, divided by four gives us two. Now we also could have done this by just counting up the edges which are four, multiplying that by two, and dividing by the number of nodes, and that would also give us two. So what we get is we get on average, people have two friends. Well now we can ask what about friends of friends? Well, let's start with D. D has one friend. And that's B, and B has three friends. Well let's, let's walk through the whole thing and see how this works. Let's start, again we start with D, they have one friend, which is B, who has three friends. C has two friends, A and B. A has two friends, B has three friends, that's an average of 2.5. If we look at B, B has three friends. A, which has two, C, which has two, D has one, that's 1.66. And if we look at A, A has two friends, B and C and they also have an average of 2.5. So, if I add all this up, what I end up getting is that this is five, eight, 9.66. Divided by four, which is, you know, approximately 2.4. So, people's friends of friends have 2.4 friends, whereas people on them, by, on themselves only have two friends. What you see is your friends have more friends than you do. That's just a fact about networks. Next measurement we want to get to. Path length. The path lengths from A to B is just the minimal number of edges you have to traverse to get from A, person A to person B, from node A to node B. So, I've gotta node here and a node here, but there's no direct path. Get to those two, you have to take one, two edges, and their path length is two. So, in this particular graph, here's this green node. To get to this green node, I've gotta take one, two paths, two, two edges. So the path length is two. We can also then compute the average path length for an entire graph. So now you take all pairs of nodes, and figure out the path length. So let's take this graph. Let's think of, there's, what are there? There's A and B, there's A to C. A to D, B to C. B to D, and C to D. Those are the different we could get. So, A and B have a distance of one, A and C have a distance of one, A and D have a distance of one too. B and C have a distance of one. B and D have a distance of one. And C and D have a distance of two. So when I add all this up, I get one, two, four, five, six, eight, and there's six total. So that means the average path length is one and one-third. That's a very short average path length. In that particular graph, it was connected, which is our next measure. A connected is just a graph where you can get from any node to any other by walking along edges. So this would be, again, a picture of a connected graph, 'cause you can get from any node to any other node just by walking across the edges. There's also disconnected graphs. So here's a, a graph of, where there's six nodes and you can't get from this set of nodes to this set of nodes. So it's disconnected. So in, in a graph like this, it's interesting because if it's not connected, any information that's over here with A, D, B, and E, might not get over to C and F. So disconnected nodes, information's not gonna flow between them, which could be a bad thing. Now if you think about disease, this could be a good thing because if there's a new disease in this population, it's not gonna cross this barrier and get over to this population. So depending on what you're concerned with, connectedness could be good or bad. Here's an interesting assignment. We've talked about mark-off processes a couple times. We can think of a mark-off process like a network. In fact our pictures sort of look like networks. So you can think of there being here's four states of a mark-off process and then I can just draw little arrows showing the probability of movement from one state to another. Now this is a directed graph, and you could have, you could be that it's close as you can get from B to A with property 0.3 and we can just throw that in. So, one way to represent a mark-off process is with a network because you can just write down the states and then draw little directed arrows showing the probability from moving from one. One state to the next. So again, you start seeing all of our models get related to one another in interesting ways. Last measure, clustering coefficient. This is gonna be the percentage of triples of nodes that have edges between all three nodes. So if I think of three nodes A, B, and C, it could be that they're connected like this, and then we don't have the full triple. Or it could be, that A is connected to B. B's connected to C and C's connected to A. You get the full triple so we can fill in the triangle. You can ask, what percentage in the possible triangles can you fill in? Let's go back and think of a simple graph. So here's a graph, and you can ask, as I start drawing lines, what happens to the clustering coefficient? So if I draw these three lines, I can ask, how many triangles could there possibly be? And there could be four. There could be this triangle one, this triangle two, this triangle three, and this triangle four. But what we see when we look at this graph is that none of those triangles are filled in. So this is gonna have a clustering coefficient of zero. What about this graph? Same number of edges but now we've got one triangle to fill in so this is going to have a clustering coefficient of one over four. What about this graph? I've got one triangle here, and one triangle here, so this is going to have a clustering coefficient of two over four. And then finally, if I have a completely connected graph, I've got, this triangle's filled in, this triangle's filled in, this triangle's filled in, this triangle's filled in, so I have four over four, which is a clustering coefficient of one. So, if you. Think about that, alright? You can ask all sorts of really interesting questions. So here's a simple quiz question you might ask. Draw two Connected graphs each that has six nodes and six edges, but have this different cluster coefficients. Now the reason we ask an exercise like this is to get across the idea that, number of edges sort of tell you how connected the graph is but it doesn't tell you how clustered it is. So it's possible to have, two different graphs with the same numbers of nodes same number of edges but different clustering coefficients. So here is graph one, and if we look at it there's only one triangle filled in. So this clustering coefficient is gonna be one. Over the number of triangles. Well how many triangles are that we get. One, two, three, four, six nodes and we ask a triangle consists of three. That means there's six nodes that we can pick first, five we can pick second, four we can pick third, but the order doesn't matter. Three times two times one doesn't matter. So that means we get twenty triangles. So this [inaudible] one over twenty. Well here's another graph with six nodes and six edges but none of the triangles [inaudible]. So this is going to have a coefficient of zero over twenty. So two graphs. Same number of nodes, same number of edges, different clustering coefficient. So what have we learned? We've learned that there's structure to graphs. We could measure degree which is on average how many nodes is another node connected to. We could talk about path length, which is how far is it from one node to another node. We could talk about connectedness, is the whole graph connected. And we can talk about clustering coefficient, which is how many triangles, of the possible triangles, how many of those are filled in. Now these are statistical measures. So let's talk about, why would we care about these? Why might these be important? Well, think about degree, what is it telling you? It's telling you the density of connections, in a way. It's telling you just how connected is this graph? So you can think of this as some sort of proxy, possibly, for social capital. So if you looked at a community, and there were very few connections between people, then not many people know other people. So remember we talked about Bob Solo saying social capital can't be just this abstract thing. We need some measures. One measure could be, how connected are people to one another? What's the average degree of the social network? You can also think about speed of diffusion. If there's a piece of information up there, how fast is it going to spread in this network? Well, one thing that'll determine that is how connected people are. How many nodes people are connected to. So, somebody's connected to 100 people and you tell them something and they tell all 100, it's gonna spread really fast. If I'm only connected to two people and you tell me something, then that information is gonna spread fairly slowly. What about path length? Why do we care about path length graph? Well think of an airline graph. It's gonna tell you the number of flights your gonna need on average. So if you wanted to fly from Miami to L.A. And the average path length was 1.1 for the airline network graph, you'd know that you?re probably gonna get a direct flight. If the average path length was seven then you'd know oh my goodness this is gonna take me a lot of flights to get from point a to point b. It also tells you social distance. So suppose you run an organization, and you've got a network. And you know that the average path link between employees is six. That means randomly picking two employees, it's gonna take six other employees, or five other employees to get from employee A to employee B. That means that information knowledge, ideas, are not gonna spread very fast within your organization. And you may wanna try and figure out ways to reduce that path link. And last, the likelihood of information spreading. If the path length is really long, it's just not that likely that information is gonna make it nine steps, ten steps, eleven steps, if people are that disconnected. What about connectedness, our next thing? Well, connectedness for Markov processes, it matters. [inaudible] Markov process, we had to be able to get from any state to any other state. If the nodes are the states, and it's not connected, then we can't apply the Markov convergence theorem, because that was one of our assumptions. You can get from any state to any other. If you think about, the abilities of a terrorist group. You might still think all these people are really upset, with, the world in some way. And, and that sort of a terrorist organization is people who wanna do, curiosum tasks that [inaudible] we don't want them to carry out. If their network is not connected. Then it's going to be difficult for them to carry out these activities. If it is connected, then they can pass information and can possibly carry things out. So things like path-link degree in connectedness are going to determine the functionalities of organizations that do good as well as organizations that do bad. Now, also connecting [inaudible] measures if you think about things like the internet or power failures, the electric grid. If the electric [inaudible] becomes disconnected, you disconnect from the grid and you don't have any power. And finally, living in terms of information, if people are disconnected from other people there gonna be isolated in terms of information. So they may not learn things, they may not figure things out. And then finally the clustering coefficient. Class stream coefficient. Can be used to figure out how redundant or how robust a network is. So if you pulled one person out. So think about it, if you've got a little triangle here between A, B, and C. If A gets wiped out for some reason, B and C are still connected. So, it gives you a sense of how robust or how redundant the network is. This is also sometimes used as a measure of social capital, cause it means there's these, these tight links between people. And one last really interesting thing about The custom co-efficient can be used to capture how likely an innovation is to be adopted, in particular possibly a new word. So if I've got three people that are all connected, A, B and C. And A uses some new word. And B hears it and then tells it to C. And then C uses the same new word, and A hears it. You get this sort of autocatalytic set where the information get passed through the loop, and it now seems part of the new normal. A says, well, I said this, now I heard C say it, so this must be something that people say. Whereas, if A wasn't connected to C, when A said something new, it could just sort of fan out, and it may never come back to A. And so A may be less likely to maintain the innovation. So these clusters [inaudible] creating feedbacks can allow things to be maintained within a network. One last thing. [laugh] And I've used all these measures there's still a sense that networks that a picture is worth a thousand words. And one reason I think for the rise of networks so much is because of the fact that we now can graph these networks in really interesting ways. So let me show you the power of pictures. Here are three graphs. The first one is the 50 US states, and there's an edge, those are the nodes. And there's an edge if they're adjacent. The second one is a group of 48 universities. And then it shows, there's a, an edge between them if they played football against each other, [inaudible] American universities. And then the last one is an airline, so each node is an airport. And [inaudible], a network edge is just gonna be whether the airline flies from that one network to another one. Here are the statistics on those three graphs. So graph one has a degree of 10.8, a path length of four. Graph two has a degree of nine, a path length of six. Graph three has a degree of ten and a pass length of four. So they're all about the same. So look at these statistics. There's not a huge difference between the three networks. Let me show you the pictures. Here's graph one. What do you think this is? Well, this is clearly an airline network. There's a couple hubs here, and there's all these cities that the airline flies into. Here's graph two, what is this? Well, this is the 50 United States. Up here are Alaska and Hawaii, who are disconnected. And it looks, in fact, a lot like the United States. And then here is graph three. These are the football conferences, one, two, and three. And what you can see is the teams all playing each other within conference, with an occasional game outside the conference. So what we see is we see very different structures between the airline network, the states. And the University's playing football, even though the statistics, the first two statistics, degree and path link were about the same. Now, where we would see a difference statistically is in clustering coefficient. So, if we look at this graph, the clustering coefficient is going to be very low, in fact it's going to be zero almost. There's very few triangles in this graph. In graph two, there's a little bit of a clustering coefficient, but in graph three the clustering coefficient is really high. So the sense in which, maybe instead of saying a picture's worth a thousand words, we might want to say a picture's worth at least three statistics, or maybe four statistics. Cuz you need quite a few statistics. Statistics before you can really start distinguishing between graphs. So degree won't be enough, path link won't be enough, but if you throw in clustering coefficient, then maybe you start making more sense of the graph. Okay, so that's structure. We now understand, we can think of networks as having nodes, edges, degrees, path length, and connected in a [inaudible] cluster coefficient. So this is a language for describing different network structures. What we're gonna do next, is start with the logic which these networks form. Alright? Thanks.