In this video, we will talk about the variation of the sample mean and the distribution of the sample mean. Understanding this variation and the rules it follows is what lets us correctly evaluate an estimate and validate claims about a population based on samples. For example, suppose you have historical data for 100 days. You can compute the sample mean and the sample variance of the stock return. Based on these statistics, can we draw inferences about the population parameters? How close are the statistics to the population parameters? By observing 100 days of stock data, can we claim that this stock is in an upward trend, that is, that the mean return is positive? All of these questions rely on our understanding of the distribution of the sample mean.

Here, we take a sample of size 30 from a population with a normal distribution, with mean equal to 10 and standard deviation equal to 5. If you run this cell several times, you get a different result each time. For example, like this, and like this. This is because the samples are randomly drawn from a normal distribution, and different samples yield different means and standard deviations. This is called sampling variation.

However, the sample mean and the sample standard deviation do not change arbitrarily. They follow certain rules, because all the samples are taken from the same population. To see that, in this code we generate 1,000 samples from the same population. We compute the mean and variance of each sample and save them in a DataFrame called collection. meanlist is the name of a list that saves the sample means of the 1,000 samples, and varlist is the name of a list that saves their sample variances. We generate the 1,000 samples in a loop, and for each sample we compute the mean and variance and append them to meanlist and varlist. Finally, we build an empty DataFrame called collection and save meanlist and varlist in different columns of this DataFrame. We can draw a histogram of the sample means in collection. It looks symmetric, like a normal distribution.
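The steps just described can be sketched as follows. This is not the cell from the video, only a minimal reconstruction assuming NumPy's random Generator and pandas; the fixed seed is an added assumption so the printed numbers repeat. Instead of drawing the histograms, it prints the skewness of each column: a value near 0 fits the symmetric histogram of the sample means, and a clearly positive value fits the right-skewed histogram of the sample variances. (For a normal population, (n - 1) times the sample variance divided by sigma squared follows a chi-square distribution with n - 1 degrees of freedom, which is right-skewed.)

```python
import numpy as np
import pandas as pd

# Population from the video: normal with mean 10 and standard deviation 5
mu, sigma = 10, 5
sample_size = 30
num_samples = 1000

# Fixed seed (an assumption, not from the video) so every rerun prints the same numbers;
# remove the seed to get a different sample each time the cell runs
rng = np.random.default_rng(seed=42)

# A single sample of size 30 and its statistics
sample = rng.normal(loc=mu, scale=sigma, size=sample_size)
print('one sample:', sample.mean(), sample.std(ddof=1))

meanlist = []  # sample mean of each of the 1,000 samples
varlist = []   # sample variance of each of the 1,000 samples
for _ in range(num_samples):
    s = rng.normal(loc=mu, scale=sigma, size=sample_size)
    meanlist.append(s.mean())
    varlist.append(s.var(ddof=1))  # ddof=1: divide by n - 1 (sample variance)

# Start from an empty DataFrame and store each list as its own column
collection = pd.DataFrame()
collection['meanlist'] = meanlist
collection['varlist'] = varlist

# Averages of the statistics, then skewness: near 0 means symmetric,
# clearly positive means right-skewed
print(collection.mean())
print(collection.skew())

# With matplotlib installed, collection['meanlist'].hist(bins=30) draws the histogram
```

The average of meanlist should land close to 10 and the average of varlist close to 25, since both statistics are centered on the population values.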
The histogram of the sample variances, on the other hand, is not normal: as you can see, it is right-skewed. From the histogram of the sample means we can guess, and in fact it can be proven mathematically, that when the population is normal the sample mean has a normal distribution. If the population is normal with mean mu and variance sigma squared, then the sample mean is also normal, with mean mu and variance sigma squared divided by the sample size n. Why is the variance of the sample mean smaller than the variance of the population? Intuitively, the sample mean is the average of n individuals from the population, and in an average, high and low values partly cancel out, so the sample mean varies less than the individuals in the population do. Here is a demonstration using Python. The blue histogram is for the population, and the red one is for the sample mean.

What if the population is not normal? The central limit theorem in statistics says that if the sample size is large, the distribution of the sample mean is approximately normal, with mean mu and variance sigma squared divided by n, as long as the population has a finite variance. Hence, even if the population is not normal, the sample mean is approximately normal if the sample size is large enough. Here is an example of the distribution of the sample mean when the population is not normal. As you can see here, apop is the name of the DataFrame that saves the population. In this population we have only five values: one, zero, one, zero, one. We generate 100,000 samples with a small sample size of 10. You can see in this histogram that the distribution of the sample means does not look like a normal distribution. But if we generate 100,000 samples with a large sample size of 2,000, the distribution of the sample mean now looks like a normal distribution.

In this video, we mainly talked about the distribution of the sample mean and described the probability rules that govern its variation. We will apply these in the next two videos to explore two important statistical tools: confidence intervals and hypothesis testing.
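Both demonstrations can be checked numerically with a short sketch. It is not the video's code; it assumes NumPy and pandas, a fixed seed, 100,000 draws as a stand-in for the normal population, and 10,000 samples of size 30 for the sample means (all choices of this sketch). For the non-normal population, the column name 'value' and the helper sample_means_from_apop are hypothetical. The helper also swaps in a shortcut: a draw with replacement from {1, 0, 1, 0, 1} equals 1 with probability 0.6, so the sum of one sample of size n follows a Binomial(n, 0.6) distribution. This gives the same distribution of sample means as resampling apop directly, without building a 100,000 by 2,000 array.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # fixed seed: an assumption for reproducibility

# Part 1: normal population with mean mu and variance sigma**2
mu, sigma, n = 10, 5, 30
population = rng.normal(mu, sigma, size=100_000)  # stand-in for the population (blue)
# 10,000 samples of size n, one per row; the row means are the sample means (red)
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
print('population variance :', population.var())
print('sample-mean variance:', sample_means.var(ddof=1), 'theory:', sigma**2 / n)

# Part 2: non-normal population with five values 1, 0, 1, 0, 1
apop = pd.DataFrame({'value': [1, 0, 1, 0, 1]})  # column name 'value' is hypothetical
p = apop['value'].mean()             # population mean: 0.6
pop_var = apop['value'].var(ddof=0)  # population variance: 0.24

def sample_means_from_apop(sample_size, num_samples=100_000):
    # Hypothetical helper. Each draw with replacement from apop is 1 with
    # probability p, so the sum of one sample is Binomial(sample_size, p);
    # dividing by sample_size turns each sum into a sample mean.
    return rng.binomial(sample_size, p, size=num_samples) / sample_size

results = {}
for size in (10, 2000):
    means = pd.Series(sample_means_from_apop(size))
    results[size] = means
    # Few distinct values mean a lumpy histogram that cannot look normal
    print(size, 'distinct means:', means.nunique(),
          'std:', means.std(), 'theory:', np.sqrt(pop_var / size))
    # With matplotlib installed, means.hist(bins=50) draws the histogram
```

With sample size 10, a sample mean can only take the 11 values 0.0, 0.1, ..., 1.0, which is why that histogram looks nothing like a bell curve. With sample size 2,000 there are many possible values packed closely together, and the standard deviation of the sample means matches the square root of sigma squared divided by n, as the central limit theorem predicts.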