In this lesson, we'll go through the basic ideas behind correlation, which will be important for many of the concepts we use later on in the module and in the course. A quick Google search will give you hundreds of reasons why correlation does not imply causation. I'll say it again because it's so important: correlation does not imply causation. What this means is that two variables that are correlated do not necessarily have a causal relationship. It doesn't have to be the case that one variable causes the other. For instance, a ridiculous example of correlation without causation is per capita consumption of mozzarella cheese and the number of civil engineering doctorates awarded. It's been shown that between 2000 and 2010, these variables had a positive correlation. This doesn't mean, however, that one necessarily caused the other. Or so one would think.
When we talk about correlation, we'll go into more types later, but the main ones to focus on are positive and negative. A positive correlation is one with a positive regression slope. You can see the y and the x here. An increase in x goes along with an increase in y. It's pretty obvious from the data. A negative correlation, on the other hand, has a negative regression slope. Again, it's fairly easy to notice, although sometimes you'll need more advanced tools, which we'll talk about in Python, to generate these trend lines and actually figure out the type of correlation involved as well as the slope.
As mentioned, with positive correlation a higher x value is associated with a higher y value. An easy example is height and weight. The taller you are, the more you tend to weigh, and vice versa. When looking at negative correlation, we can say x and y move in opposite directions. You can think about temperature and altitude. Typically, the higher you go up, the more the temperature tends to drop. Again, this is a tendency rather than a strict rule, and correlation on its own doesn't tell us anything about causation.
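To make the idea of generating a trend line in Python a bit more concrete, here is a minimal sketch using NumPy's `polyfit`. The data is synthetic and made up purely for illustration, and the helper names `trend_line` and `correlation_sign` are hypothetical, not something from the course materials. It only reads off the sign of the fitted slope, which is the positive versus negative distinction described above.

```python
import numpy as np

def trend_line(x, y):
    """Fit a least-squares line y = slope * x + intercept and return both values."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)

def correlation_sign(x, y):
    """Label the direction of the relationship from the sign of the fitted slope."""
    slope, _ = trend_line(x, y)
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "none"

# Assumption: synthetic data, not real measurements.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y_up = 2.0 * x + rng.normal(0, 1.0, size=x.size)     # tends to rise with x
y_down = -1.5 * x + rng.normal(0, 1.0, size=x.size)  # tends to fall with x

print("rising data: ", correlation_sign(x, y_up))
print("falling data:", correlation_sign(x, y_down))
```

Note that the slope only tells you the direction of the trend. It says nothing about whether x causes y.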
Positive and negative correlation don't really tell the whole story. When we think about correlation, there are two parts: the direction, positive or negative, but also the strength, strong or weak. Let's look at some examples. This plot shows a strong positive correlation. The r value here is not the slope, but a measure of how strong the correlation is. What this is saying is that the points are really, really close to that trend line. A weak positive correlation, on the other hand, looks like the middle plot. There does seem to be a correlation, but it is weaker, and the points aren't as close to the trend line. Again, notice the r value goes down to 0.395.
It's a good time to point out the range of r values. The r value, which measures correlation, ranges from negative 1 to 1, with values near 1 being strong positive, as on the upper left, zero being no linear correlation, and values near negative 1 being strong negative, as on the bottom left. Looking at no correlation now, we expect an r value close to zero, which is what we observe. There doesn't seem to be any relationship between the x and y variables in this plot. A strong negative correlation looks like this on the bottom left. Again, x and y move in opposite directions and the r value is very close to negative 1, indicating that the points are very close to this trend line. A weak negative relationship has an r value that is smaller in magnitude, in this case negative 0.2. We can see the points are more spread out from that trend line.
Let's talk about some of the math behind correlation. Correlation is the covariance of x and y divided by the product of the standard deviations of x and y. If you're not too familiar with these mathematical concepts, don't worry, we'll go through some pictures to make the ideas a little bit more comprehensible. These are some of the definitions in the formula. Covariance is defined by taking, for each variable, the difference between the actual value and the mean, multiplying those two differences together, and then taking the expectation.
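The definition just described, covariance divided by the product of the standard deviations, can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name `pearson_r` is hypothetical, the data is synthetic, and expectations are replaced by sample averages. NumPy's own `np.corrcoef` is printed next to it as a cross-check.

```python
import numpy as np

def pearson_r(x, y):
    """Correlation as covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()          # deviation of each x from the mean of x
    dy = y - y.mean()          # deviation of each y from the mean of y
    cov = np.mean(dx * dy)     # average product of deviations (covariance)
    return float(cov / (x.std() * y.std()))

# Assumption: synthetic data with small and large noise around the same line.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y_tight = x + 0.1 * rng.normal(size=200)  # points hug the trend line
y_loose = x + 2.0 * rng.normal(size=200)  # points are spread out

r_tight = pearson_r(x, y_tight)
r_loose = pearson_r(x, y_loose)

print("strong positive r:", round(r_tight, 3))
print("weak positive r:  ", round(r_loose, 3))
print("numpy cross-check:", round(float(np.corrcoef(x, y_tight)[0, 1]), 3))
```

Both datasets are built around the same underlying line, so what changes r is how far the points scatter from the trend line, not how steep that line is.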
When you think about the correlation between two variables, you want to take a point in space, subtract out the center, and look at how that compares to the overall spread by dividing by the standard deviations.
A couple of important facts about correlation. The correlation of a variable with itself is one, and the symmetry property holds: the correlation of x and y is equal to the correlation of y and x. An important thing to know is that independence leads to uncorrelated variables. If two variables are independent, they have to be uncorrelated. The reverse, on the other hand, does not hold. Just because two variables are uncorrelated does not mean they're independent. You can see this in the graph. Looking at the correlation between the two variables gives an r value very close to zero. There is seemingly no correlation. However, they're clearly not independent. This is a plot of points for the function y equals x squared. So in this case, y is completely determined by x, although there's no linear correlation.
Another thing to note here is that correlation is very good at picking up linear trends, but weaker at picking up exponential trends or other trends involving powers. This is something to keep in mind when working with time series in the following lessons.
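These facts are easy to check numerically. The following is a small sketch, assuming x values spread evenly and symmetrically around zero, which is what makes the r value for y equals x squared come out at essentially zero. The helper name `r` is hypothetical and is just a thin wrapper around `np.corrcoef`.

```python
import numpy as np

def r(a, b):
    """Pearson correlation between two 1-D arrays, via np.corrcoef."""
    return float(np.corrcoef(a, b)[0, 1])

# Assumption: x is evenly spaced and symmetric around zero.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2                       # y is completely determined by x

r_self = r(x, x)                 # a variable with itself
r_xy = r(x, y)                   # dependent, yet uncorrelated
r_yx = r(y, x)                   # symmetry: same value as r_xy

# On one side of zero the same curve is increasing, so r is high but
# still below 1, because the trend is not a straight line.
x_pos = np.linspace(0.0, 1.0, 201)
r_pos = r(x_pos, x_pos ** 2)

print("corr(x, x):              ", r_self)
print("corr(x, x^2), symmetric: ", r_xy)
print("corr(x^2, x), symmetric: ", r_yx)
print("corr(x, x^2), x >= 0:    ", r_pos)
```

The near-zero result depends on the symmetric range of x. It is a statement about the lack of a linear trend, not about the lack of a relationship.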