Of all the possible probability distributions, one stands out because it is encountered so frequently. It's called, very appropriately, the normal distribution. In this video, I'll explain its most important properties. The normal probability distribution is also called the Gaussian distribution. It is symmetric, bell-shaped, and characterized by its mean mu and standard deviation sigma. The highest point of the distribution is located at the mean, and its width is specified by the standard deviation. Mu and sigma are called the parameters of the distribution. The cumulative normal distribution has a sigmoidal shape: the mean lies at the point where the cumulative probability equals 0.5, and sigma determines the steepness of the curve. The shorthand for stating that the random variable X has a normal distribution with parameters mu and sigma is this. And this is the full equation describing the probability density of such a variable. It is a magnificent equation, but not because it may seem rather complex at first sight. And, as far as I am concerned, also not because it contains three important mathematical constants: pi, e and the square root of 2. It is special because the equation connects the statistical realm to the material world. The equation describes how particles distribute themselves through a process called diffusion. If you release a diffusing substance, for instance sugar, in a cup of tea, then the concentration of the sugar will be distributed according to this equation. And this applies not just to fluids but also, for instance, to particles in the atmosphere, traffic in the street, and information in society. At the same time, the Gaussian distribution is encountered frequently because, according to the central limit theorem, it is the distribution you get when the effects or outcomes of many independent random processes are added together. However, let's not get carried away.
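Because the spoken text points at equations shown on screen, here is the standard form of the shorthand, the density, and the cumulative distribution function for readers of the transcript. Here erf denotes the error function, and the last line shows why the mean sits at cumulative probability 0.5. One caveat: the video names the parameters mu and sigma, whereas many textbooks write the shorthand with the variance as the second parameter, N(mu, sigma squared), as below, so check which convention a given source uses.

```latex
\begin{aligned}
X &\sim N(\mu, \sigma^2) \\
f(x) &= \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right), \qquad -\infty < x < \infty \\
F(x) &= \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right], \qquad F(\mu) = \frac{1}{2}
\end{aligned}
```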
I'll try to explain the equation by taking it apart. This equation gives the probability density of a random variable X. It is an exponential function with a constant in front and an exponent that contains lowercase x, the value that the random variable may take. As you see in this part of the equation, the mean is subtracted from x and the result is divided by sigma. This is in fact the calculation of the z-score, so the values of the random variable are standardized before they enter the rest of the equation. Now let's focus on the constant in front of e. Without this constant, the area under the exponential curve changes with the value of sigma; it equals sigma times the square root of 2 pi. Multiplying by the constant, one divided by sigma times the square root of 2 pi, makes the area exactly one. The value of the constant is also the height of the curve at its top, that is, at x equal to mu, where the exponent is zero. A somewhat counterintuitive property of the normal probability density is that it approaches zero for very large positive or negative values of x but never actually reaches zero. As a consequence, the values a normally distributed random variable can take stretch from minus infinity to plus infinity. All these values are possible outcomes, albeit with a very small probability far from the mean, and yet the total probability, the area under the curve, is still one. To finalize, let's go back to the two parameters, mu and sigma, which completely determine the location and shape of the normal curve. Here you see the probability distribution of the time spent traveling from home to work on a weekday for men in Western Europe. On average the travel time is thirty minutes, with a standard deviation of six minutes. And this is the curve for women in the same countries, with a smaller mean but a larger standard deviation. What you see is that the peak gets lower as the curve gets wider. Another property of the curve is that the values and units on the y-axis change if you change the unit along the x-axis.
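The properties just described can be checked numerically. Below is a minimal Python sketch (standard library only) that builds the density from the z-score, then confirms that the constant in front makes the area one, that the peak height equals that constant, that the tails stay positive, and that switching from minutes to hours multiplies the density by 60. It reuses the men's travel-time values from the video (mean 30 minutes, standard deviation 6 minutes). The helper names `normal_pdf` and `area` are hypothetical, chosen for this sketch. The area is approximated with a simple midpoint rule over mu plus or minus 10 sigma, assuming the tails beyond that range are negligible.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density: a constant times exp(-z^2 / 2), with z the z-score of x."""
    z = (x - mu) / sigma                                # standardize x first
    const = 1.0 / (sigma * math.sqrt(2 * math.pi))      # makes the total area 1
    return const * math.exp(-0.5 * z * z)

def area(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

mu, sigma = 30.0, 6.0   # men's travel time in minutes (values from the video)
lo, hi = mu - 10 * sigma, mu + 10 * sigma

# Without the constant, the area depends on sigma: it equals sigma * sqrt(2 * pi).
raw = area(lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2), lo, hi)
# With the constant, the area is 1.
total = area(lambda x: normal_pdf(x, mu, sigma), lo, hi)

peak = normal_pdf(mu, mu, sigma)   # height at x = mu equals the constant
print(f"area without constant: {raw:.4f} (sigma*sqrt(2*pi) = {sigma * math.sqrt(2 * math.pi):.4f})")
print(f"area with constant:    {total:.6f}")
print(f"peak per minute:       {peak:.5f}")

# Eight standard deviations out, the density is tiny but still positive.
print("tail density positive:", normal_pdf(mu + 8 * sigma, mu, sigma) > 0)

# Same distribution in hours (mean 0.5 h, sd 0.1 h): every density value is 60 times larger.
peak_hours = normal_pdf(0.5, 0.5, 0.1)
print(f"peak per hour:         {peak_hours:.4f} (ratio {peak_hours / peak:.1f})")
```

One practical note: much further out (around 40 sigma) the computed density underflows to exactly 0.0 in floating point, even though the mathematical density is never zero.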
For example, if you expressed the time in hours instead of minutes, the probability density values would change from a probability per minute to a probability per hour, so they would increase 60-fold.
Let me summarize what I've explained in this video. The normal or Gaussian probability density function is a symmetric, bell-shaped curve, and its corresponding cumulative function has a sigmoidal shape. The location and shape are fully described by two parameters, the mean and the standard deviation. The mean determines the center of the curve; the standard deviation determines its width. The wider the curve, the lower its peak, by necessity, because the area under the curve always equals 1. This is the shorthand notation to state that a variable X is normally distributed with a mean of 63 and a standard deviation of 12. And this is the equation of the normal distribution, in which you can identify the value of the random variable x and the two parameters, the mean and the standard deviation. The equation not only describes a probability distribution; it also describes the outcome of many processes in the material world in which some form of diffusion plays a role.
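The link between diffusion and the central limit theorem can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions: every particle starts at 0 and takes independent steps drawn uniformly from -1 to 1, a hypothetical step rule chosen precisely because a single step is clearly not normal. The function name `random_walk_positions` and the particle and step counts are illustrative choices, not taken from the video. If the summed positions are approximately normal, about 68% of them should lie within one standard deviation of the mean, and the spread should grow with the square root of the number of steps, which is the signature of diffusion.

```python
import math
import random
import statistics

def random_walk_positions(n_particles, n_steps, seed=0):
    """Final positions of particles that each take n_steps independent steps
    drawn uniformly from [-1, 1] (a deliberately non-normal step rule)."""
    rng = random.Random(seed)
    return [sum(rng.uniform(-1.0, 1.0) for _ in range(n_steps))
            for _ in range(n_particles)]

results = {}
for n_steps in (25, 100):
    pos = random_walk_positions(n_particles=5000, n_steps=n_steps)
    mean = statistics.fmean(pos)
    sd = statistics.pstdev(pos)
    theory_sd = math.sqrt(n_steps / 3)   # variance of one uniform(-1, 1) step is 1/3
    within = sum(abs(x - mean) <= sd for x in pos) / len(pos)
    results[n_steps] = (mean, sd, within)
    print(f"{n_steps:3d} steps: mean={mean:+.2f} sd={sd:.2f} "
          f"(theory {theory_sd:.2f}), within 1 sd={within:.3f}")

# The cloud spreads like the square root of time: 4x the steps gives about 2x the width.
print(f"width ratio: {results[100][1] / results[25][1]:.2f}")
```

Swapping the uniform step for another zero-mean step rule (for example a coin flip of plus or minus 1) should give a similar bell shape, which is the point of the central limit theorem; only the variance per step changes.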