The final type of analytics we'll discuss in this module is called Centrality Analysis. We call something central if it's in the middle of a larger entity. We also call something central if it's important and vital. For graphs, we'll identify important notes by looking at how central they are. In the graph. Look at the the six node graph, you don't have to know a lot to figure out that the orange node is pretty important, Why? Well, one reason could be it's the most vulnerable node. If you remove it the graph will fall apart. because this is clearly one way to look at the centrality of the node. But it's not the only way. Another way of looking at the orange node, is that it can reach all other nodes quicker, than any other node. So this is the idea behind influences. People in a social network who are connected enough to reach out and possibly influence a lot more people than others will be able to. If the little graph represents a transportation network. And you were asked to build a restaurant somewhere around this network. You'll possibly choose an area near the central node. Because more traffic will flow through this node than other nodes. And some of them will get out at the station and bring business to your restaurant. We can give many more examples. In biological networks, house-keeping genes are genes needed for the maintenance of basic cellular function, and are expressed in all cells of an organism under normal and abnormal conditions, these genes are central because they're vital and they're connected to many other nodes in the biological network. So much so that they need to be taken out of the network so that the rest of the network can be studied. Out of all these examples, we'll focus on a kind of problem researchers have called the key player problem. The problem comes in two flavors. Now take a little time to read the two examples. The first is sort of a negative piece. We have a network and we are trying to find a small subset of people who's removal will maximally disrupt the network. The key operative word here is maximally. So if there is a node which removal breaks the network into two parts, it is not what we want if there's another two nodes whose removal will break the network into ten parts. The second case, however, is a lot more conventional. The goal is to find a small set of nodes with maximal combined reachability to other nodes. That means, taken together, these nodes should reach almost all other nodes. You should know that people use two terms to characterize a general concept of network centrality. The first one, centrality, is about a node. It measures how central the node is with respect to the network. So the orange node in the left graph has high centrality, and the blue nodes have low centrality. The second term, centralization, is about the network. Now look at the graph on the right, where the dark orange node is still very central for the light orange node, and between the other nodes, and therefore has reasonable centrality. As more nodes start having higher centrality, the centralization of the network drops because there is less variation in the centrality values of the network. So for each type of centrality that we discuss next we can compute the network centralization by considering the sum of the difference between the maximum centrality and the centrality of the node divided by the maximum centrality. There are thirty different measures of centrality. We will consider only a few of them and explore the conceptual principles supported here. The first and the most intuitive measure is degree centrality. Quick measures, the degree of a node divided by the possible edges it could have if it connected to each of the other N- 1 nodes in the graph. Now, we have seen this before. This measure gives us a sense of the hubness of the node. The higher the number is, the more hub-like the node is. Thinking of our second key player problem. One way to approach this is to find a hub-like measure for a group with multiple nodes. Instead of measuring the degree centrality of individual nodes, the measure simply counts the number of edges coming into the group as a whole compared to the number of outsiders. In our example two red nodes here together connect to all blue nodes. Although there is just one node neighboring both of them. Closeness centrality takes a different approach to the centrality problem. It acts the shortest distance of a node to all other nodes and divides it by a minus one. So in our graph, a node like I, which is on the periphery of the graph, will be quite far from all nodes in general, and therefore will have a very high closeness centrality value. On the other hand, nodes like F, C and H are much closer to all other nodes. Know that we define this measure in terms of shortest paths. So if a moving object, like information, flows to the shortest path in the network, F is more likely to receive them earlier than other nodes. And therefore can process into other nodes more quickly. Therefore. If we look to inject a new piece of information into the network with the idea that it should read every other node quickly, I should possibly inject it at F. Recall however, that node information flow is two shortest routes. An example is gossip, that tends to travel through centrality nodes. Closeness centrality does not work well for these types of information nodes. Another very popular centrality measure is called betweenness centrality. For any node it measures the fraction of shortest paths flowing through that node. Compared to the number of shortest paths in the graph. Since B is at one end, its between a centrality is 0. But let's look at A. A is in the path from B to E. So is D. Therefore, A's score for the B to E path is 0.5. Similarly, its score for the C to E path is also 0.5, making the total score 1. E is in the path from A to D, for there is one more path from A to do through C, so E's score is 0.5. Now can you verify why C's score is 3.5? Betweenness centrality is typically used for problem where a commodity is flowing through the network. As in the case of closeness centrality. Any quantity that does not flow in shortest path channels like infection or rumor on the internet doesn't work well with betweeness centrality.