So now we need to actually quantify the similarity, right? Because we need to be able to use it somehow. And so we're going to look a little bit at how we do that. This slide and the next slide are going to be the most math that you will see in the course. We'll never show this much math again, so just bear with us. Okay, so now suppose you took each of these movies, movie one and movie two, and plotted them. Right, so we plot the ratings for the movies from different users: we have one user, A, along this axis, and another user, B, along this axis. So if user A liked movie one, we would give it a positive rating, which means we extend in the positive direction this way; if he didn't like it, we would give it a negative rating, okay. Similarly, if user B liked movie one, we would go up, positive, and if he didn't like it, we'd go negative. And then we define these different points, one for each movie, one and two. We could have movies three, four, five, and so on, and each of the dimensions in the graph is a different user, so it's a general idea, okay. So now these values here are some indication of the rating that the user gave the movie, okay. And we'll talk about exactly what it is in a second. But the idea here is, we want to compare these two lines and see how close they are to one another. Okay? Because in this case right here that we've drawn, they are really close. Right? So, user A liked movie one, so he rated it positive. User A also liked movie two, so he rated that positive. User B liked movie one, so he rated that positive. And user B also liked movie two, so he rated that, granted, more positive. But it is the same idea: the vectors for movie one and movie two are both positive, and they're close to one another.
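To make the picture concrete, here is a minimal sketch in Python of how each movie becomes a vector with one coordinate per user. The rating values, the `ratings` dictionary, and the `movie_vector` helper are all made up for illustration; only the signs follow the example drawn above, where both users liked both movies and user B rated movie two a bit higher.

```python
# Hypothetical ratings: positive means the user liked the movie,
# negative means the user did not. The numbers are illustrative only.
ratings = {
    "movie_1": {"user_a": 1.0, "user_b": 1.0},
    "movie_2": {"user_a": 1.5, "user_b": 2.0},
}

# Each user is one dimension (axis) of the graph, in a fixed order.
users = ["user_a", "user_b"]


def movie_vector(movie):
    """Return the movie's ratings as a list, one coordinate per user."""
    return [ratings[movie][user] for user in users]


print(movie_vector("movie_1"))
print(movie_vector("movie_2"))
```

Adding movies three, four, five just means adding more entries to `ratings`, and adding a user means adding one more coordinate to every vector.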
So when we look at the angle between the vectors, this angle right here is small, because they're pointing in the same direction. What that means is that the users tend to respond the same way to those movies. Similarly, if we had the vectors like this, it would be the same idea. All this is saying is that one user responded negatively to both movies, but the same idea still applies: if a user responds negatively to one movie, he's going to respond negatively to the other. And so on. It's just saying the movies are similar in taste. Now, here's another example. Okay, we're keeping movie one over here and we're moving movie two over here. So now user A had a positive response to movie one and a negative response to movie two. User B had a positive response to movie one, and also a positive response to movie two. Now this angle's getting a little larger here, as you can see. So now there isn't the same directionality here, because the users responded differently, okay. User A liked this movie and didn't like this movie; user B liked this movie and liked this movie. So for user B there seems to be a positive correlation in his taste, but for user A there seems to be a negative correlation. So, right here, there is really no correlation at all. Here we have the same thing, it's just that now, rather than user A's taste moving, it's user B's. User B liked this movie and didn't like this movie. User A liked both movies. Okay, so you see the idea again is that the users are each responding differently. So we can't find a correlation among movie tastes, okay. We can't say that if a user likes this movie he's not going to like that movie, or that if a user likes this movie he's going to like that movie, because the responses don't go in a similar direction. Now here's the other extreme.
Okay, in this case user A likes this movie and doesn't like this movie, and user B likes this movie and doesn't like this movie, so the vectors are pointing in opposite directions. Now this is a very dissimilar situation: when a user likes one movie, he tends not to like the other movie. This angle is larger, and it's getting closer to 180 degrees. Okay. So this is a very dissimilar situation, and this is a very similar situation. This is strong positive correlation, this is strong negative correlation, and this is really no correlation, or "none," we'll write. So we want the angle either to be close to zero degrees, indicating strong positive correlation, meaning that when a user likes one movie he will tend to like the other one. Or we want it close to 180 degrees, which means there's a strong negative correlation, which means that when a user likes one movie, he will not like the other, or if he doesn't like one movie then he'll tend to like the other. Now the way that we quantify this, okay, is by taking the cosine of this angle in here. We don't have to explain geometrically how you get the cosine, we'll just illustrate it intuitively here. The cosine is going to be close to plus 1 if the angle is close to 0 degrees. It's going to be close to zero if the angle is around 90 degrees, like in these situations right here, where there's zero correlation. And it's going to be close to minus 1 if the angle's getting close to 180 degrees, like in this situation right here. This would be close to minus 1, this is close to plus 1. So now, the way that we calculate the cosine similarity, okay, is by basically multiplying each user's ratings for the two movies together and adding those products up. So basically we would take A1 times A2, plus B1 times B2. And then we actually divide by the lengths.
So we divide by the length of each of these segments, okay. You don't really have to know that part as much, it's not as important, but we need to normalize the value to be between minus 1 and plus 1, like it is right here. So we want it within this range, and we divide by the lengths of the lines, basically, to get that to be the case. So we divide by the square root of A1 squared plus B1 squared, right, 'cause remember the movies form the lines, times the square root of A2 squared plus B2 squared. And if we had more users, you would just add more terms, right: A1 times A2, plus B1 times B2, plus C1 times C2, and so on. Then it would just be A1, B1, C1 here and A2, B2, C2 over here. So it's just a simple extension to more users. But now the intuition is as follows. Okay, suppose these are pointing in opposite directions, so this one is positive and this one is negative. Then this product is going to be negative. And same thing here: if this one's positive and this one's negative, then this product is going to be negative. Right, so now this is going to be a very negative sum, because we're adding two negative numbers, so it's going to get closer to this minus 1 down here. If these products are both positive, on the other hand, then the sum is going to be very positive, so we're going to go up here. If one of the products is positive and one of them is negative, then we're going to be getting closer to zero, because there is not as much correlation. Right, so if all four of these ratings, for instance, are positive, then we're good. Even if all four of them are negative, we'd be good, because negative times negative makes a positive, and negative times negative over here also makes a positive, and so on. We just don't want one term to be positive and one term to be negative, 'cause in that case there's no correlation. Right?
Like up here, we would have this one term being positive and the other term being negative. So now the correlation, as we said, could be near plus one, which means it's strong and positive; near zero, which means there's no correlation; or near minus one, which means it's strong and negative. So the key idea that you should really take away here is how to find this cosine value and what it means, right. If it's close to plus one, it means the movies are strongly positively correlated. If it's close to minus one, it means they're strongly negatively correlated. And if it's close to zero, it means there's really no correlation.
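As a rough sketch of the calculation just described, the following Python function multiplies each user's two ratings together, adds the products up, and divides by the two lengths. The function name `cosine_similarity`, the rating values, and the choice to raise an error for an all-zero vector are assumptions made for this illustration and are not taken from the slides.

```python
import math


def cosine_similarity(movie_1, movie_2):
    """Cosine of the angle between two movie rating vectors.

    Each vector holds one rating per user, in the same user order.
    """
    if len(movie_1) != len(movie_2):
        raise ValueError("both movies need a rating from every user")

    # Multiply each user's two ratings together and add the products up:
    # A1*A2 + B1*B2 + C1*C2 + ...
    dot = sum(r1 * r2 for r1, r2 in zip(movie_1, movie_2))

    # Length of each movie's line: sqrt(A1^2 + B1^2 + ...).
    length_1 = math.sqrt(sum(r * r for r in movie_1))
    length_2 = math.sqrt(sum(r * r for r in movie_2))

    # Assumption for this sketch: an all-zero vector has no direction,
    # so the cosine is treated as undefined rather than returned as 0.
    if length_1 == 0 or length_2 == 0:
        raise ValueError("cosine is undefined for an all-zero rating vector")

    return dot / (length_1 * length_2)


# Made-up ratings, each list in the order [user A, user B].
same_direction = cosine_similarity([1.0, 1.0], [1.5, 2.0])     # all four positive
both_negative = cosine_similarity([-1.0, -1.0], [-1.5, -2.0])  # all four negative
no_pattern = cosine_similarity([1.0, 1.0], [-1.0, 1.0])        # users disagree
opposite = cosine_similarity([1.0, 1.0], [-1.0, -1.0])         # opposite directions

print(same_direction, both_negative, no_pattern, opposite)
```

With these made-up numbers, the first two results come out close to plus one, the third is zero, and the last is close to minus one, which lines up with the three cases above. Adding a third user only means adding a third rating to each list.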