Hello. Today we're going to talk about the concept of distance in social networks.
So the idea here is that sometimes we'd like
to know how far nodes are away from each other.
So for example in this network that you see here how far is node A from node H?
Or we'd like to know, for example,
are some nodes far away from
each other and other nodes close to each other in general in this network?
And if so, which nodes are closest and
which nodes are the furthest away from each other in the network?
To answer all these questions,
we need to develop a concept of distance between nodes.
And that's what we're going to do in this video today.
So to answer this question,
the first concept we need is the concept of a path.
A path is simply a sequence of nodes that are connected by an edge.
So for example, we can find paths that go from the node G to the node C.
Here's the path G-F-C.
And you can find different paths so for example here's the path G-F-E-C.
So how far is node A from node H?
Well, we will look at different paths that go from A to H.
So for example, we can find a path A-B-C-E-H.
And notice that this path it takes four hops to get
from A to H. And there are other paths so for example there is
the path A-B-C-F-E-H which takes five hops.
And so for a path,
we're going to define the path length to be the number
of steps that it contains from the beginning to the end.
So in this case path one has length four and path two has length five.
And to define the distance between two nodes,
we're going to define it to be the length of
the shortest possible path between the two nodes.
So going back to the question of what is the distance between node A to node H,
the answer is four, because the shortest path between
A and H has four hops or has length 4.
In network X, you can use the function shortest_path
to find a distance from any node to any other node.
So here in this example finding the distance between
node A and H in the graph G which is the graph that you see here.
And so here you get the shortest path between A
and H. If you're interested in just the length
of this path then you can use the function shortest_path_length,
and this gives you the length of this path which is four.
Okay, so sometimes what we would like to do
when we have real social networks is to find a distance
from a single node to all the other nodes in the network to figure out how
far away are other nodes from this specific route to node, in this case A.
Let's say we're interested in figuring out what distance
from node A to all the other nodes in the network is.
Well, this is something that you can easily
do manually in the network that's very small like this,
but this could become very tedious to do
manually if you have a very large social network.
And so what you can do is you can program a computer to do this.
And so what we're going to do is we're going to talk about one of
the efficient ways that we
have to compute the distances from a given node to all the other nodes.
And this one is called breadth-first search and what it basically does is that you
start at a node and you start kind of discovering different nodes or different layers,
and at each given layer,
you discover the set of nodes that are one
distance away from the nodes that were in the previous layer.
So, I'm going to walk you through an example of how breadth-first search works.
So here we have the network and we're interested in figuring out
the distance from node A to all the other nodes in the network.
So what we're going to do is we're going to start at A and we're going to start
discovering new nodes as we kind of walk through this network.
And we're going to be writing down all the nodes that we discover.
So we start at A and we sort of process the node A by looking at who is connected to A.
In this case, K and B are connected to A and
so those are going to be a distance one away because
they're the shortest path from each one of those nodes to A it's just one hop, right?
A path of length one.
Okay, so now we're going to process each one of the newly discovered nodes and
ask which nodes are connected to
this newly discovered node that we haven't discovered yet?
And those nodes are going to be assigned to the next layer.
So let's say we process node B.
Node B is connected to K, A and C.
But we've already discovered nodes A and K,
so the only node that we discover here is node C. Now we're going to process node K,
and node K is connected to node A and B,
but we've already discovered both of those.
So the only newly discovered node is node C and it's a distance two away from A.
Now we process node C which is connected to B, F, and E.
And here we've already discovered B so the only two nodes that we discover are
F and E and those are a distance three away from A.
Okay, now we're going to process node E.
Okay, node E has five connections and out of those five,
C and F we already discovered.
So the only new ones are the other three which are D,
I and H. So we assign those to the next layer.
Now we process node F which is connected to three nodes G,
C and E. But the only one we haven't discovered yet
out of all those is G so I want this to get assigned to the next layer.
And all of those nodes are a distance four away from A.
Okay, now we have to process each one of those newly discovered nodes.
And by now you can see that we're already almost done here.
So let's process node D which is only
connected to E. But we've already discovered E so D does not discover any new nodes.
Now let's go with I. I is connected to E and J.
And we haven't discovered J yet,
so this one it's assigned to the next layer.
Next we process H which is only connected to E but we already discovered E. And finally,
we process G which is connected to F which you've already discovered.
So, J is a distance five away. All right?
We have to process J, but J is only connected to
I which we already discovered and now we're done.
We've processed all the nodes.
There are no new nodes to discover.
And so here you can see how we efficiently
figure out the distance between A and all the other nodes in the network,
and this is something that you can
program using a computer to do this in an efficient way.
You can use network X to run the breadth-first search
algorithm by using the function bfs_tree.
And what it does is it gives you the tree that we've built.
And this graph that we built here is called a tree,
and is the tree of the nodes that you discovered.
And so with this function,
you're able to obtain this tree.
So if you look at the edges of
the graph t here and you get that these are the edges of that tree.
So there is A-K and A-B and B-C and so on.
Now if you're interested in
not necessarily the tree in the order in which these nodes were discovered,
but just simply the actual distances between A and all the other nodes,
then you can use shortest_path_length and you give it the graph,
which in this case is G, and the root node,
which in this case is A.
And you get a dictionary of all the distances from the node to all the other nodes.
So, A is a distance zero from itself,
a distance one from B,
a distance two from C and so on.
Okay, so we've defined distance between any two nodes in a network,
but if we go back to the original questions we had, in the beginning,
we were interested in, we're characterizing the distances between
all pairs of nodes in the graph.
Our nodes in general close to each other,
far away from each other.
If some are close and some are far,
then how can we figure out, which are close and which are far and so on?
And so, now, we're going to try to answer these questions.
So, the first thing you can do, is you can simply take
the average distance between every pair of nodes in the network.
In network X, you can do that by using the function average shortest path length.
In G, and in this case,
this graph has an average path length of 2.53.
So, that means, that on average any pair of nodes in
this graph are a distance 2.53 from each other.
So, that tells you what sort of,
on average what happens.
Now, what is the maximum possible distance between any two nodes,
that sort of like, what are the two nodes that are furthest away from each other?
How long is that? How far away from each other are they?
This is called the diameter and is
simply the maximum distance between any two pair of nodes.
And in network X, you can use the function diameter to get it,
and in this case the diameter of the graph is
five and you can see that by looking at the distance from K to J,
which has a length of five.
The other thing that is useful to define is the eccentricity of a node.
The eccentricity of a node is the largest distance
between the node and all the other nodes in the network.
So, you take a node, measure the distance from the node to all the other nodes,
and figure out which one of those instances is the largest one of all.
In network X, you can use the function eccentricity to get all those distances,
and here, you can see for example,
that A has an eccentricity of five, as we had seen.
It has a distance five to some node,
which in this case that's J.
So, that's pretty large.
It's actually as big as it could get, right?
Because the diameter, the largest possible distance between two nodes was five.
But, if you think of a node like for example,
node E here, which looks like it's closer to all the other nodes,
then you see that it has a eccentricity of three,
which means that no node in this graph is a distance
larger than three from node E.
And so, now, that you have this eccentricities,
the radius of the graph is the minimum eccentricity in the network.
It basically asks which node or not which node,
but what is the maximum distance that a node has from all the other nodes,
that's eccentricity and the radius takes the smallest one of those.
And in network X, you can use the function radius to get the radius of
the network and as you could see from the dictionary of eccentricities,
the radius of this network is three.
The next question we had was,
now that we know how to calculate distances and now that we have
a sense for what is a large distance in the network, like, the diameter,
what is the short distance, like the radius,
we can try to identify which nodes are sort of far away
from all the other nodes and which nodes are close to all the other nodes.
So, for the first one, we have the periphery,
which is the set of nodes in a graph that have
an eccentricity equals to the diameter of the network,
which is sort of the largest eccentricity that you could have,
since the diameter is the largest distance between any two nodes in the network.
So, if you apply this definition to this graph,
you find that the periphery of this graph are the nodes A, K, and J.
You can get these set of nodes using networks,
network X function periphery.
And as you would sort of imagine these nodes tend to be
sort of on the outskirts of the network far away from all the other nodes.
The opposite concept is the concept of nodes that are central.
So, the center of the graph is a set of nodes that have eccentricity equal to the radius.
And when you check which nodes are in the center in
this graph using the center function network X, you find that C, F,
and E are nodes that are in the center of the graph,
and this is sort of like,
it make sense there in the towards the center of the graph,
they tend to be close to most of the nodes.
Okay, so with these tools, now,
you're able to take a look at a network,
a real social network,
and start to ask questions about how far are these nodes from each other?
Which nodes are central?
Which nodes are not and so on?
So, let's run an example in this using the Karate Club Network,
which we had seen in a previous video.
So, this is a network of friendship in a Karate Club.
And as you may remember, the story here,
is that node one is the instructor of the Karate Club,
and this node 34 is an assistant,
and they have some type of dispute,
they're not friends with each other,
and so the club actually splits into two groups,
and sort of, this is the separation of the two groups.
So one, this set of students on the left go with one of
the instructors or with the assistant and the other ones go with the original instructor.
So, if we take this network and apply
the definitions about distances that we just covered,
we can discover how far nodes are from each other and who's central and who's not.
So, let's begin by loading up this network.
This network is so famous that actually on network X you can
simply load it by using the function karate club graph.
So, that one returns this particular graph.
Now, I'm converting the nodes labels to be integers,
so that they match the figure I have here.
So, that's what I'm doing with that command there,
and then I could ask different questions about the network.
So, in this case,
the average shortest path between the nodes is about 2.41.
The radius of the network is three,
and the diameter is five.
So, meaning there's a pair of nodes here,
there are distance far away
from each other and that's the largest that a distance can be.
And then, we're going to ask who's at the center of this network?
So, here the nodes are in the center,
and here, I'm highlighting them in red.
So, as you can see the instructor is in the center and
all the other nodes are in the center also connected to the instructor,
and they also tend to have high degrees,
so they are easily connected to many other high degree nodes,
and they just have small distances from them to all the other nodes in the network.
Now, when you look at the periphery,
these are the peripheral nodes,
and I'm highlighting them here and in blue and as you can see,
they're kind of on the outside,
they tend to have small number of connections,
and none of them are actually connected to the instructor.
Now, you might look at this and say,
okay, this make sense.
But, for example, this node 34,
was the assistant here,
he seems pretty central.
He's connected to a bunch of nodes,
it seems like he could be close to all the other nodes in the graph as well.
Why is 34 not showing up in the center?
Well, it turns out that if you look carefully,
node 34 has a distance four to node 17, right?
To get from 34 to 17, you have to go 34, 32, 16, and 17, and so,
it couldn't be in the center because the radius of the graph is
three and this one has a node that is the distance four away from it.
Now, it turns out that actually if this node 17 was just a bit closer,
for example, if this node 17 was a distance three away from 34,
then 34 would actually be in the center,
because 34 is a distance at most three to every other node in the network.
And so, this shows that this definition of center
is pretty sensitive to just one node that happens to be far away, right?
So, if you make a very small change to a graph that
makes a particular node far further away from the other,
you can sort of make it or break it for the node to be in the center or not.
And in the future, we'll look at other measures of
centrality for nodes that are less sensitive to small changes like that.
So, to summarize, we've looked at a set of measures.
The first one is the distance between two nodes and that was defined to be the length of
the shortest path between the two nodes and the eccentricity of a node,
which is defined to be the largest distance between
the node and all the other nodes in the network.
Both of those definitions take place at the either node level
or at the pair of nodes level.
We've also looked at definitions that sort of take place at the whole graph level,
they try to characterize the distances in the whole network
and they were the average distance between any two pair of nodes.
The diameter which is the largest distance between any two pair of
nodes and then the radius which was the smallest eccentricity in the graph.
And then, we use those definitions to identify the periphery and the center of the graph,
which is the set of nodes,
the periphery is the sets of nodes with eccentricity equal to
the diameter and the center is a set of nodes with eccentricity equal to the radius.
And that's all for this video and I hope to see you in the next one.