Welcome back, everyone. We're going to finish up our video series on summation and the use of the summation symbol, and we're going to remind you what mean and variance are. The main point of this lecture is to either remind you of these concepts or tell you about them for the first time, and to tie them back in to sigma notation, since this is one of the main uses of sigma notation. What I'm going to do here is scare you a little bit by showing you the punchline of the video at the beginning, but we'll walk through this very, very slowly and unpack it. So here's the whole point of the video: if you understand this screen, you've understood everything. If we have a set of numbers X which has n numbers in it, x1 to xn, and they're just real numbers, then the mean of that set, written mu sub X, can be expressed in summation notation this way. So this is the mean of X, sort of an average. And this ridiculous-looking set of symbols here is called the variance of X; the point of the video is to understand what that is. And finally, by the way, there's plain old sigma, which is the square root of sigma squared. This is the standard deviation, and we'll walk through all that. Okay, now that we've seen that fancy sigma notation, let's work a really simple, small example with numbers and then generalize. Suppose we have a set Z which has three things in it: 1, 5, and 12. If you remember the old notation, the cardinality of Z is 3, because there are three elements in Z. If we take the mean of Z, what that really means is we add up all of the numbers, 1 + 5 + 12, and we divide by how many numbers we have, in this case three. If we do that, and it's never a good idea to do arithmetic in public, so obviously I've done this in advance, it's 18 over 3, which is 6. That's the mean. And there's lots of notation for it. The most correct general notation might be the Greek symbol mu for mean, and we usually put a sub Z down here to say, hey, it's that set Z we're using.
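To make that arithmetic concrete, here's a minimal Python sketch of the same computation. The code isn't part of the lecture; it's just the "add them up, divide by how many" recipe written out.

```python
# Mean of the set Z = {1, 5, 12}: add everything up, divide by the count.
Z = [1, 5, 12]
mean_Z = sum(Z) / len(Z)   # len(Z) is the cardinality |Z| = 3
print(mean_Z)              # 18 / 3 = 6.0
```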
Sometimes you'll see it written mu of Z, with Z as the argument to a function. Often you'll just see mu, because we know what we're talking about already. Okay, that's a simple example with numbers. Let's do a slightly harder example with symbols instead. Suppose I give you a set Y which consists of four numbers, but I don't tell you what they are: y1, y2, y3, and y4. Then the mean of Y, mu sub Y, would be just one-fourth times y1 + y2 + y3 + y4. And aha, here comes the punchline. Let's express this in sigma notation: this is one-fourth times the sum from i = 1 to 4 of y sub i. That's really the mean. And remember, i is a dummy index. I could use j, I could use a smiley face, I could use whatever I want, but let's not get too radical; let's use i. Okay, so we generalized once; let's generalize a little bit further. In general, suppose we have a set X consisting of n arbitrary numbers. They exist, but I'm not going to tell you what they are: x1, x2, up to xn. The mean of X is pretty easy to guess now: mu sub X is equal to 1 over n times the summation from i = 1 to n of x sub i. That's the mean using sigma notation. By the way, it's worth thinking a little bit about the two different philosophical functions of i and n. i is not a real variable, right? i is just a counter. It says, hey, count from 1 up to n. That's why we call it a dummy variable: hey, you dummy, you're not doing anything except ticking a counter off from 1 to n. n really is a variable, right? n tells you when to stop. I'm not telling you what n is. If n is 10, you'd stop at 10. If it's 11, you'd stop at 11, and so on. In the previous example, we saw n was 4. Okay, before we jump into variance, I want to take a little side trip and tell you about something called mean centering. The reason I'm telling you about this is that it's a common trick used throughout a lot of data science techniques. You'll see it later in linear regression, for example.
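The general formula mu_X = (1/n) · sum from i = 1 to n of x_i translates almost symbol-for-symbol into a loop, which also makes the i-versus-n distinction visible: i is the loop counter, n is where the loop stops. A small sketch (the function name `mean` is my choice for illustration):

```python
def mean(x):
    """Mean of a list x = [x_1, ..., x_n]: (1/n) times the sum of the x_i."""
    n = len(x)            # n is a real variable: it tells you when to stop
    total = 0.0
    for i in range(n):    # i is the dummy index; it just counts
        total += x[i]
    return total / n

print(mean([1, 5, 12]))   # 6.0, matching the earlier example
```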
The punchline here is that if you have a set of numbers, you often want to shift the set of numbers so that its mean is centered at 0. There's lots of reasons why. Let me just walk you through the mechanics of what it means to mean-center data, no pun intended. Here's our friend Z. It has three elements in it: 1, 5, and 12. Previously we computed that the mean of Z is 6. Over here we see the three elements of Z in blue, 1, 5, and 12, and there's the mean, 6. Let's form a new set, call it Z prime, and what I'm going to do is subtract the mean off of every element in Z. So 1 - 6, 5 - 6, 12 - 6. Oh dear, three examples of arithmetic in public: this is -5, this is -1, and this is 6. So there's Z prime. Note, if I compute the mean of Z prime, I'm going to get 0: it's -5 + -1 + 6 over 3, and that works out to be 0, unsurprisingly. What we are actually doing here, with this picture of the number line, is we are saying, hey, see that red dot there at 6? I'm going to pretend that's 0. How am I going to pretend it's 0? I'm going to yank it back over to 0, and I'm going to be accountable for what I've done, so I'm going to shift everything over. In other words, if you like, here is 0. 6 is going to 0; that's going to become a red dot. And then everyone comes along for the ride. So 1 has to go all the way to -5, 5 has to go to -1, and 12 has to get shifted over here to 6. Everyone's getting shifted the same amount. And that's mean centering. The punchline is that whenever you mean-center data, you produce a new data set which has the same relationships among the points, but the mean is 0. There's lots of reasons why you want to do that, which we'll get into later. All right, the last topic of today's video is a statistical concept called variance. I should mention, one of the points of the mean is for when you have a large set of numbers. In this case, large is 30, but large could be 3 million.
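In code, mean centering is one subtraction per element; every point gets shifted by the same amount, the mean. A minimal sketch of the Z example:

```python
Z = [1, 5, 12]
mu = sum(Z) / len(Z)                  # 6.0
Z_prime = [z - mu for z in Z]         # subtract the mean from every element
print(Z_prime)                        # [-5.0, -1.0, 6.0]
print(sum(Z_prime) / len(Z_prime))    # 0.0: the centered set has mean zero
```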
Statisticians and data scientists often don't like large amounts of numbers and large amounts of information, so they want to summarize the set by a small set of numbers. Summarizing a set by its mean is about the simplest thing you can do, and it gives some information. What we're going to see here is an example where it obviously doesn't give complete information. Here is a set Z, which is 1, 5, 12. We're getting bored with this; it's our friend, and we get bored with our friends. Mu sub Z is 6, fine. Here's another set W: 5, 6, 7. If you calculate mu sub W, it turns out that's also 6, as you can check yourself. So obviously, and unsurprisingly, the mean is not a unique classifier of a set: we have two sets with the same mean. Okay, let's look at these sets on the number line and see that obviously the mean's not telling the whole story. Here is 0, and here's the mean at 6. Let's draw Z in blue: here's 1, here's 5, here's 12. And let's draw W in yellow. So yellow has a dot right here on top at 5, a dot right there at 6, and a dot right there at 7. Okay, so they have the same mean, but if you had to say in a word what the difference is between yellow and blue, you might say "spread out." Okay, that's two words, right? You might say that blue is more spread out than yellow. And there's a statistical, mathematical, data science concept called variance which basically tells you how spread out data is. Let me just write that down for you in general, and then we'll compute it for these examples. So if X is equal to x1 through xn, the variance of X is this; it's got this funny symbol, sigma squared sub x. It's the Greek symbol sigma, we square it, and we put sub x to denote that it's X we really care about. This is 1 over n times, and this is going to look really intimidating but it's not, the sum from i = 1 to n. We look at xi. We ask how far xi is from the mean: xi minus mu sub x. And we're going to square it.
We'll talk in a second about why we want to square it. This is the variance. What this is really doing, let's look at the term inside the square. For every single xi, xi minus mu sub x is really asking the question: how far are you from the mean? The reason we square it is that we don't really care whether you're to the right of the mean or to the left of the mean; we care how far away you are. So for example, if xi was 1 and the mean was 6, that would be a pretty big number. If xi was 5 and the mean was 6, then it'd not be a particularly big number. And then essentially what we're doing here is taking the average of those numbers, the (xi minus mu sub x) squared terms. We're taking the mean of those numbers, which is why we're dividing by n. That's the idea of the variance. Something else you'll often see: if I take sigma sub x, which is just the square root of sigma squared, this is called the standard deviation of x. Okay, great, so let's calculate that for these two examples. Before we do it, though, we should sort of home-cook it and know what the answer should look like, right? We know that Z and W have the same mean, and Z is more spread out than W. I'm trying to sell you on the idea, trying to get you to buy, that the variance is a way of quantifying the notion of being spread out. So whatever sigma Z and sigma W turn out to be, sigma Z had better be much bigger than sigma W; otherwise this is junk. Okay, so let's close with our example. Recall Z is 1, 5, 12, W is 5, 6, 7, and the mean of Z is 6, which turns out to also be the mean of W, because that's how we cooked it. Let's start with the easy one. Sigma squared of W is going to be, in this case n is 3, so 1 over 3 times the sum from i = 1 to 3, where we call these w1, w2, w3 and over here z1, z2, z3, of (wi minus the mean of W) squared. That equals one-third times, starting with w1, (5 - 6) squared + (6 - 6) squared + (7 - 6) squared. And if you work that out, it turns out to be one-third times (-1) squared + 0 squared + 1 squared.
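The variance and standard deviation formulas are short enough to sketch directly in Python. The names `variance` and `std_dev` are my choices for illustration; note this is the population variance, dividing by n as in the lecture, not the sample variance that divides by n - 1.

```python
def variance(x):
    """sigma^2 = (1/n) * sum over i of (x_i - mu)^2."""
    n = len(x)
    mu = sum(x) / n
    return sum((xi - mu) ** 2 for xi in x) / n

def std_dev(x):
    """Standard deviation: the square root of the variance."""
    return variance(x) ** 0.5
```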
That works out to two-thirds. Great, and so the standard deviation, sigma of W, is just the square root of two-thirds. All right, if we do the other calculation, sigma squared of Z, this turns out to be one-third times (1 - 6) squared + (5 - 6) squared + (12 - 6) squared. Now this is really not arithmetic I'm going to do in public, so dot dot dot, the last refuge of someone who doesn't want to do arithmetic, and this turns out to be equal to 62 thirds. I don't really care exactly what that is; the punchline is that I think it justifies saying it's much, much greater than two-thirds. And that confirms our intuition: Z and W have the same mean, but Z is much more spread out than W, as measured by the variance.
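We can also let the machine do the arithmetic in public and check both numbers. A small sketch, using the same population-variance formula (divide by n) as in the lecture:

```python
def variance(x):
    # Population variance: (1/n) * sum of squared distances from the mean.
    n = len(x)
    mu = sum(x) / n
    return sum((xi - mu) ** 2 for xi in x) / n

W = [5, 6, 7]
Z = [1, 5, 12]
print(variance(W))   # 0.666..., i.e. 2/3
print(variance(Z))   # 20.666..., i.e. 62/3: much bigger, so Z is far more spread out
```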