So, Google searches output, and what you see is really a composite of the relevance and importance score. And they both do play a part in the output that you see, and the order in there of. But the importance score specifically determines the top results. That determines the first page or so of results that you see, so the, the idea is that for the first ten or twenty results, they all have to clearly be very relevant, so all have to contain keywords. And the question is amongst those that contain keywords, how do we figure out which of them are the most most important. Now, Google's ranking algorithm, we should point out, is called PageRank. And it's a clever pun on Larry Page, was one of the founders of Google, so his last name there. And so an interesting Cornell study in 2008, just to show you why we really only need to look at the first pages, so that's why we focus on the important score. Here in this lecture is they, they, they searched typical nouns and they tried to see how many, they did a study on the number of clicks that each search results gets on average. So what fraction of the clicks, if you're someone searching how many, how many, times are you going to click on the first one? How many times are you going to click on the second on? And so on, so in all, if we took the proportion of people who click on the first link, the proportion of people who click on the second link, and so on, how does that decrease? And so the first link gets something like 56% of all the clicks, which is really remarkable. And it drops off pretty quickly after that, so we have 56%, 13%, 10%, that's up to almost 80% just from the first three results. So, the first three results right here, they saw see almost 80% of all of the clicks. And the first page itself, is something like, 98%. So, if we took the first page, that was about 98% and I believe I have those statistics here as well. So, the first three links receive about 80% and the entire first page actually receives greater than 98%. And so, these first few links in the entire page are really determined, so they're ordering amongst themselves. Once you have those relevant pages, those are determined by the importance score itself. And that's really what's going to play an impact here, so we will be focusing on the importance score in this analysis. So now, let's move to understanding how the importance score is Is used to rank these pages. So, the first thing we have to take a look at, is the idea of a Webgraph, and what that is. And we're going to see, how this can give the connectivity overview of the webpages and how they're interconnected with one another. So, let's start with some terminology. A Webpage, we've been using that term, but it's really, it's a document on the web. So it's a document, and you can go there, and you can access the page, and do whatever you want to do on that page. A Hyperlink, is a connection from a webpage to a piece of data. So, on the webpages themselves, you see some external links here, that you can click on, and when you click on them, that will bring you somewhere else. Now typically, it will bring you to another webpage, so for our purposes here and what we're seeing, a hyperlink is a connection from a webpage to another webpage. So, we have webpages, and then we have Hyperlinks connecting Webpages. All right. So now, before we talk about a webgraph, and you could probably already understand what a webgraph is. It's really just the Webpages and how Hyperlinks connect them together and we just draw that visually. But, so, a graph is a huge concept mathematically and we, we will not go into much detail on what graph theory is, but simple terminology here will suffice. And when we say graph route, we're referring to some like XY chart plot. That's, that's not what we're referring to something, a structure, a mathematical structure, that has nodes which are also known as bird seeds, and those represented typically as circles. So, we have these nodes, and we have link or edges, so there's, those are kind of interchangeable, it's either vertices and edges or notes and links. And the nodes or circles and the links connect the nodes together. So, we may have links like this. So, this is node A, this is node B, and this is node C, that I have a link between A and B, I have a link between B and C, and so on. I, you know, I may have a link also between A and C but maybe not, and it will depend upon how the graph is connected. And so now, we can take this. And well, we can apply that to a Webgraph, which is just a graph of Webpages, so in this sense, now, the nodes themselves become webpages, right. So, we have webpages as our nodes, which we will still draw as circles but webpages become our nodes. So, suppose we have just three webpages that we're looking at here, and the links become the hyperlinks, so the hyperlinks are connected to them. So now, one thing that we have to make note of is that, here we drew the links without any arrows or anything indicating a direction. Now, Graphs can have directed links, in which case, it's really only implying a one-way connectivity. So, A may connect to B, but may not connect to A, and C may connect to A, but A may not connect to C, and C may connect to B, for instance, B may not connect to C. So, when we're dealing with a graph of the Internet or a graph of the webpages, these links are directed. Because hyperlinks only go one way, they don't, just because A points to B doesn't mean B is going to have a hyperlink back to A. So, if this is page A, maybe page A has a hyperlink to page B, and then page C has a hyperlink to page B and so on, something like that. So, a simple example of that would be, if you are going, if you have a website and your favorite artist also has a website, right. So this, maybe this is your artist's website. This is your favorite artist right here, and this is you, and this is your website right here. Okay, so you may have a link on your page that goes to your artist's website, and there's a lot of other people that probably have links that go to this website as well. But it doesn't mean that he is going to take a link and put it back to your page for every single person that puts a link to his page. So it's not, they're not bidirectional, and otherwise, there would be a lot more people that were famous, because they would just feed off of that, be able to gain a lot of fame. That obviously wouldn't work too well. But, so now, directed, the way that we indicate a directed graph is by having the links have arrows. And as we said, a webgraph does have direct links. Another thing about a webgraph is that it's very sparse. By sparse, I mean that most webpages aren't connected to one another directly. So for every web page, a web page has a very few number of neighbors, which we call them, so If we look at a node A here, A B is A's neighbor because A is connect to B, and so on. And you can get to him in just one path jump. the number of neighbor's in each node has is way smaller than the total number of nodes. So I mean, we have something like 60 billion nodes in the internet, right. So there's like 60 billion webpages, but the Wikipedia page, which is on the higher end of the number of hyperlinks you see, may have like a hundred links out to different pages. So 100 versus 60 billion is much, much smaller, so by sparse, we mean that there's not a lot of lengths compared to the number of nodes, and the internet itself, as we said, has 60 billion nodes. So that, that gives use kind of the issue here, because there's no way we could ever draw the graph of the Internet and then do like a calculation on that. Because that would just take us, take us, you know, our entire lifetime and we'd still never finish. So, we're going to have to use examples here, that are of much smaller scale with four even five nodes maybe, and look at those. But really the ideas of the same, it's just on a much larger scale scales. The ideas are portable and they scale up very well. And Google has a lot of algorithms in place which scale up to very high data sets and very large data sets.