Hi again. I'm Chris Fargo, and I'm here with our last conceptual network analysis lecture. I want to start with a general question: how do we determine where nodes go on a network graph? We've talked a little bit about centrality, and we've said that the most central people should probably sit at the center of the graph. I think that's a good intuition. But there are formal procedures for making these layouts happen. They're called layout techniques, and there are many different ways in which you can organize the layout of a network graph.
Layout algorithms are used to determine the position of nodes and edges in a graph. The most common layout algorithms are force-directed layouts, which use simulated physical forces to arrange nodes and edges. These algorithms are often used for visualizing networks with many nodes and many connections, so they'll be a good choice for you when you're using the big data that we're about to use. A force-directed algorithm is often called a spring-embedding algorithm because it tends to use springs to connect nodes together.
ForceAtlas2 is a force-directed algorithm that's used to draw graphs. It's based on the idea that using forces between nodes to draw them together can make a graph more evenly distributed.
OpenOrd is a layout algorithm that uses only a portion of all possible ties to visualize the network. Its idea is that for very dense networks, not all major ties are visualized and very few minor ties are visualized. By removing ties from the network, we can actually see what's most important. You can see here that tacobell is clearly the most important node in this network. And given that these are mentions of tacobell, that is the correct answer.
The Yifan Hu Proportional layout is a common network layout algorithm that is used to help us make sense of messy data. It's based on the concept of proportional connection between nodes.
The algorithm first determines node centrality, something we've talked about, and then maps the nodes in a way that minimizes the sum of distances between each node and its closest neighbors while maximizing their similarity. If I had to pick one way to lay out a network, I would go with this model, because it really makes sense to me how it operationalizes centrality. Each one of these methods has academic papers that you can read, and you can have your eyes bleed [LAUGH] with all the different math that they use to come up with these. A lot of visualization and network analysis comes down to sampling, because it's impossible to compute all of the possible distances between all of the possible nodes. That sampling methodology can be quite complex, but all of these layouts are typically available in most data viz and network analysis tools, and we're going to use a few of them in Python.
Just like you can correlate two variables, you can correlate two networks. Two separate network graphs can be correlated using the quadratic assignment procedure, or QAP. QAP compares the two networks tie by tie and then reshuffles the node labels of one network many times to judge how strong that correlation really is. Error here is operationalized as the difference between two arrangements for two different networks. There's an error between two networks when their arrangements, or their connections, are not similar. An error is the difference between a node's actual and desired assignment. Actual and desired really are kind of arbitrary here: take network A as the actual and network B as the desired. Correlate them, see where they differ, and that's the error. The stronger the correlation, the more the two networks are related.
So we have done a lot of theorizing up until this point, but we actually haven't looked at what network data looks like under the hood.
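Before moving on, the network correlation idea described above can be sketched in a few lines of plain Python. This is only a rough illustration and not the course's own code: it assumes each network is stored as a table of 0s and 1s with the same nodes in the same order, it assumes the permutation-based quadratic assignment procedure, and the two small networks and the function names (`pearson`, `off_diagonal`, `qap_correlation`) are made up for the example.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation; assumes neither list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def off_diagonal(matrix, order):
    """Read the cells of a matrix with rows and columns relabeled by `order`,
    skipping the diagonal because self ties are not recorded."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(n) if i != j]

def qap_correlation(net_a, net_b, permutations=1000, seed=0):
    """Correlate two networks, then compare against shuffled node labels."""
    rng = random.Random(seed)
    nodes = list(range(len(net_a)))
    cells_a = off_diagonal(net_a, nodes)
    observed = pearson(cells_a, off_diagonal(net_b, nodes))
    at_least_as_large = 0
    for _ in range(permutations):
        shuffled = nodes[:]
        rng.shuffle(shuffled)  # relabel rows and columns of net_b together
        if pearson(cells_a, off_diagonal(net_b, shuffled)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / permutations

# Hypothetical four-node networks: net_b is net_a with one tie removed.
net_a = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
net_b = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]

observed, share = qap_correlation(net_a, net_b)
print("observed correlation:", observed)
print("share of shuffles at least as correlated:", share)
```

The first number is the plain correlation between the two networks. The second is how often a random relabeling of network B does at least as well, so a small share suggests the two networks really are related.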
So let's take a look at some of the various formats that are available out there, and let's talk about how they differ. There are many different types of network analysis data structures, and they all have really cool names like edge lists, adjacency matrices, adjacency lists, and adjacency matrix graphs. There's no one right type of data for all network analysis applications, although if you record a network accurately, you can always transform your data from one type to another. The choice of data type depends on what you're going to do with the data, or on what the statistical or Python package that you're going to use accepts. We're going to use edge lists in this class because they're the simplest to conceptualize and all network analysis packages accept them.
Adjacency matrices are quite common in network analysis applications because they can be used to represent many different types of networks: directed or undirected, weighted or unweighted, valued or unvalued. By the way, weighted or unweighted and valued or unvalued are the same thing. Moreover, they have a number of properties that make them particularly useful for different types of analyses. Edge lists, like I said, are what we're going to go with this semester, and we'll get into those in a minute.
But first, I think it's important to visualize an adjacency matrix. If I were to think about one way to visualize a network, this makes the most sense. Think about each row and column, and the ID associated with that row and column, as an actor. So actor 0, or node 0 here, is represented in the first row and in the first column; actor 1 is in the second row and second column. And you can see here that when we go to the intersection of an individual user with itself, so 0-0 or 1-1, we always see 0s. That's because most networks don't allow self ties. I can't be friends with myself on Twitter. I've tried; it doesn't work. So we don't want to record that relationship, right? Because it's not really possible.
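As a rough sketch of that table, here is one way it might be built in plain Python. The five-person friendship list is made up for illustration and is not the example from the slide, and the variable names are hypothetical.

```python
# Hypothetical friendships among five people, identified by node IDs 0..4.
friendships = [(0, 1), (0, 2), (1, 2), (3, 4)]
node_count = 5

# Start with a table of all 0s: one row and one column per node.
matrix = [[0] * node_count for _ in range(node_count)]

# Friendship goes both ways, so mark both cells for every pair.
for person_a, person_b in friendships:
    matrix[person_a][person_b] = 1
    matrix[person_b][person_a] = 1

# The diagonal (0-0, 1-1, ...) stays 0 because self ties are not recorded.
for row in matrix:
    print(row)
```

Reading across row 0 tells you who node 0 is tied to: a 1 in column 1 and column 2, and 0s everywhere else.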
Adjacency matrices are symmetrical. Labels correspond to the IDs of nodes. Rows and columns correspond to the IDs of nodes. It's worth repeating: it's a symmetrical table where each node ID gets a column and a row. So I could be node number 5. If I was, I would be friends with node number 3, but not friends with nodes 0, 1 and 2. It's very easy and intuitive to prepare network data like this by hand, but it's hard to conceptualize code that generates it. So I tend to shy away from this in my computational work.
Weighted adjacency matrices are exactly the same, but they allow for values to be stored in their relationships. So if I take the above example, and we assume that the word Queso is associated with 1, Burrito is associated with 2, and Steak is associated with 3, then there are 15 articles in our data that mention both Queso and Burrito, and two articles that mention Queso and Steak.
Adjacency lists are simple and efficient ways to store graphs. They are lists of lists, where each inner list represents all of the connections that a node has in the graph. For instance, row one here corresponds to the first person in the network; we could say Person 0. If they're friends with Person 1, Person 6 and Person 8, then you'll see row 0 printed out just like you see it now. Person 1 is friends with Person 6, and Person 8 is friends with Person 3.
Edge lists are what we'll be working with in this course. An edge list is a simple list of all connections in the network. It's not sorted by user or ID; instead, it describes every single edge in the network one at a time, where it starts and where it lands. By default, edge lists are ordered. Whoever comes first on the list is sending the edge, and whoever comes second receives the edge. You can turn this off by just saying, hey, this edge list isn't ordered; it's just random who comes first and who comes second. But by default, they're typically ordered.
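To make the difference between these formats concrete, here is a minimal sketch that turns an edge list into an adjacency list. It reuses the Person 0, 1, 6, 8 and 3 ties mentioned above; the function name `edge_list_to_adjacency` and its `ordered` flag are assumptions made for illustration, not part of any particular package.

```python
def edge_list_to_adjacency(edges, ordered=True):
    """Build an adjacency list (node -> list of neighbors) from an edge list.

    With ordered=True, the first node in each pair sends the edge and the
    second receives it. With ordered=False, every edge counts both ways.
    """
    adjacency = {}
    for sender, receiver in edges:
        adjacency.setdefault(sender, []).append(receiver)
        adjacency.setdefault(receiver, [])  # make sure receivers appear too
        if not ordered:
            adjacency[receiver].append(sender)
    return adjacency

# One edge per entry: where it starts and where it lands.
edges = [(0, 1), (0, 6), (0, 8), (1, 6), (8, 3)]

print(edge_list_to_adjacency(edges))
# -> {0: [1, 6, 8], 1: [6], 6: [], 8: [3], 3: []}

print(edge_list_to_adjacency(edges, ordered=False))
# -> {0: [1, 6, 8], 1: [0, 6], 6: [0, 1], 8: [0, 3], 3: [8]}
```

The same five edges give two different pictures depending on whether order matters, which is why it is worth stating up front whether an edge list is ordered.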
Think of it as painting a network one edge, or one line, at a time until you have the whole network. It's kind of tedious, but you can actually get to a total network by doing nothing else but explaining the relationships. Edge lists can be valued as well. Just repeat the edge that you want as many times as the value calls for. If Node 7 was connected to Node 9 four times, we repeat that last entry, 7, 9, four times in our data.
Most network analysis tends to be performed by academics, and most academics tend to use the R statistical package. This means that there's a little more out there for network analysis in R, and since I'm not really studying R or using R in my day to day, I don't really have a lot to tell you, other than that igraph, tidygraph and ggraph are often used in the papers that I review. The most popular network analysis package for R is igraph. It's a very powerful package that can do everything from simple network analysis to really complex stuff with exponential random graph models. However, network analysis tools in Python are getting better all the time, and I'm so excited to show you what we're going to be able to do this semester.
There are a lot of other free tools that you might want to use, and there are a couple that cost some money too. Gephi is a wonderful network visualization tool that's a standalone application. It can be used to create networks and graphs. It's a beautiful data viz package that gives you these really clean networks, like this one right here. Now, this is a Java application. It technically runs on any OS, but boy, have we had trouble getting students to install it on their computers. That's due to Java: it needs a specific version of Java, and sometimes Java on a computer can be messed up. It can be frustrating as heck to install. But I recommend you take the time to do it. We're not going to make you do it for this class, because we can't support the install of it on everyone's computer.
But I highly recommend you give it a try and see if you can get it to work, because it is a beautiful data viz program and it is totally free. If you can't get Gephi to work, Polinode is a great second choice. It's really good at visualizing networks. It does cost money once the networks are of a certain size, so keep that in mind.
UCINET also costs money. It was the first network analysis tool used in the study of the internet. It was developed by people at Berkeley. It's affordable. It's fairly old school; when you open it up, it looks like Windows 95. Before I was [LAUGH] able to code in Python, I would use it to generate network statistics and to transform network data from one format to another, like an edge list to an adjacency matrix and so on. I still use Gephi nowadays because it's still the best data viz tool that I have access to, and I really think it's probably the best in the field. But I don't really use UCINET anymore, because it has now really been replaced by packages in Python, and most statistics that I could get out of that program, I can get from Python for free.
So now you know everything there is to know about network analysis. Of course, that's not true, but you do know a lot. So what I want you to do is go visit the project that I have put out there on Coursera for you. We're going to be working through that project just as we have in the other two courses. I want you to get familiar with the challenge so that when you get into that Python code, you're going to know exactly what we're trying to do. I'm so excited to show you network analysis in practice and in Python. Thanks for listening to these conceptual lectures all throughout the sequence. And let's get going one more time in Google Colab.