In this video, we will discuss the frequency and relative frequency of observed outcomes, and introduce the concept of a distribution. To recall, this is the dice game from the last video. Sum is a random variable, and the printout shows 10 observed outcomes. In total, we tried 50 times and got 50 realized outcomes of sum. Now, we want to calculate the frequency of each outcome in this collection. There is a very useful pandas method called value_counts. It outputs a pandas Series, which has only one column compared with a DataFrame. Its index is the list of distinct outcomes, and its value column is the frequency. We can even sort this Series by its index using sort_index. For example, in our output, the first row says that the frequency of outcome two is equal to one. We can plot the frequency using a bar chart.
Frequency will change as the number of trials changes. If we want to compare the frequencies from different numbers of trials, we have to convert frequency into relative frequency. Relative frequency is equal to frequency divided by the number of trials. With relative frequency, the shape of the bar chart does not change. Only the scale of the Y axis changes. This is a bar chart showing the frequency for the outcomes of 100 trials. Now we increase the number of trials. We start with 100 trials, then this is the one with 400 trials, 800 here, 1000 trials in this chart, and 2000 trials in this chart. The bar chart goes toward a limit. The relative frequency becomes more and more stable as you increase the number of trials. What could be the limit if we have an infinite number of trials?
The distribution of a random variable is a table consisting of two sets of values. One lists the different values of the outcome, and the other lists the probability of each value. Here is the distribution table. We can compute all the probabilities for X using Python here. X-distribution is the distribution table we built. From this distribution table, we do not know the shape of the outcomes immediately.
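The frequency and relative frequency steps described above can be sketched roughly as follows. This is only an illustration, not the code shown in the video: the simulated dice rolls, the seed, and the names `rng`, `results`, `freq`, and `relative_freq` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Assumed setup: roll two fair dice `trials` times and record the sum.
rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability
trials = 50
rolls = rng.integers(1, 7, size=(trials, 2))  # upper bound is exclusive: faces 1..6
results = pd.Series(rolls.sum(axis=1), name="sum")

# Frequency: how many times each outcome was observed, sorted by outcome.
freq = results.value_counts().sort_index()

# Relative frequency: frequency divided by the number of trials.
relative_freq = freq / trials

print(freq)
print(relative_freq)

# Optional bar chart (requires matplotlib):
# relative_freq.plot(kind="bar")
```

Changing `trials` to 100, 400, or 2000 and rerunning is one way to see the relative frequencies settle down as the number of trials grows.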
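A distribution table like the one described above could be built along these lines. This is a minimal sketch that assumes X is the sum of two fair six-sided dice. The name `x_distribution` and the column name `Prob` are hypothetical and may differ from the ones used in the video.

```python
from itertools import product

import pandas as pd

# Enumerate all 36 equally likely (die1, die2) pairs and count each sum.
counts = {}
for d1, d2 in product(range(1, 7), repeat=2):
    counts[d1 + d2] = counts.get(d1 + d2, 0) + 1

# One row per possible outcome, with its probability in the "Prob" column.
outcomes = sorted(counts)
x_distribution = pd.DataFrame(
    {"Prob": [counts[x] / 36 for x in outcomes]},
    index=pd.Index(outcomes, name="outcome"),
)

print(x_distribution)
```

Enumerating the pairs, rather than typing the probabilities by hand, makes it easy to check that the probabilities add up to one.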
Usually, the mean and the variance are two characteristics used to describe the distribution of a random variable. The mean of a random variable is also called its expectation. On the right side is the general formula for the mean and the variance of a discrete random variable. Here, x_i is a possible outcome and p_i is the probability of that outcome. The mean is the average of all outcomes weighted by their probabilities. The variance is computed similarly, and it describes the variation of the outcomes. On the left is the Python code to compute the mean and the variance given the distribution table.
What is the distribution of a continuous random variable? Next, we will compute probabilities for a continuous random variable. We will start with the simplest continuous random variable, which has a uniform distribution. This kind of variable takes every possible value in a certain range with equal chance. Here is the distribution graph for a uniform random variable, which takes values between zero and 100 with equal chance. The height of the red line is not a probability. It represents the value of the density function, which is used to compute probabilities for a continuous random variable in this way: the area under the density curve is the probability. Hence, the whole area between zero and 100 is equal to one, because this random variable can only take a value in this range. Computing the probability that X takes a value between 20 and 60 is equivalent to finding the area shaded in pink.
Here is a summary. For a discrete random variable, you can find a probability easily by checking the distribution table. It is more complicated in the continuous case, where the probability is the area under the probability density function curve, or PDF curve for short. A PDF can take any shape. You should be careful to note that the PDF is not a probability.
Now, let's come back to finance questions. Why do we need a continuous random variable? Because the distribution of stock return data is continuous. We know that the real distribution of stock returns cannot be directly observed, unlike in the dice game.
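The mean and variance calculation for a discrete random variable, described earlier in this video, can be sketched as below. This assumes the distribution table is the one for the sum of two fair dice. The names `x_distribution`, `Prob`, `values`, and `probs` are hypothetical and not necessarily those used in the video.

```python
import pandas as pd

# Assumed distribution table: sum of two fair dice, outcomes 2..12.
x_distribution = pd.DataFrame(
    {"Prob": [n / 36 for n in [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]]},
    index=range(2, 13),
)

values = x_distribution.index.to_numpy()     # possible outcomes x_i
probs = x_distribution["Prob"].to_numpy()    # probabilities p_i

# Mean: sum of p_i * x_i (outcomes weighted by their probabilities).
mean = (probs * values).sum()

# Variance: sum of p_i * (x_i - mean)^2 (weighted squared deviations).
variance = (probs * (values - mean) ** 2).sum()

print(mean, variance)
```

For two fair dice this gives a mean of 7 and a variance of 35/6, which is about 5.83.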
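The idea that a probability is an area under the density curve can be illustrated with the uniform example from this video. This is a small sketch under the assumption that X is uniform between zero and 100. The helper functions `uniform_pdf` and `uniform_probability` are made up for this illustration and are not from any library.

```python
def uniform_pdf(x, low=0.0, high=100.0):
    """Density of a uniform random variable on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def uniform_probability(a, b, low=0.0, high=100.0):
    """P(a <= X <= b): area under the flat density between a and b."""
    left = max(a, low)
    right = min(b, high)
    width = max(right - left, 0.0)        # no overlap means zero area
    return width * (1.0 / (high - low))   # rectangle: width times height


density = uniform_pdf(50)             # height of the flat line, not a probability
prob = uniform_probability(20, 60)    # the shaded area between 20 and 60
total = uniform_probability(0, 100)   # the whole area under the density

print(density, prob, total)
```

Here the height of the density is 0.01, while the probability of landing between 20 and 60 is the area 40 times 0.01, which is 0.4. The whole area from zero to 100 is one.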
But do we have any good distributions that can describe the return data reasonably well? In the next video, we will explore this problem and demonstrate how to apply the normal random variable, the most popular continuous random variable, to approximate the distribution of stock returns.