In Section 10.2, we introduce the differential entropy of continuous random variables. The differential entropy h(X) of a continuous random variable X with pdf f(x) is defined as

h(X) = -∫_S f(x) log f(x) dx = -E[log f(X)],

where S is the support of X. Note that for differential entropy we use a lowercase h instead of an uppercase H.

Some remarks are in order. Unlike discrete entropy, differential entropy is not a measure of the average amount of information contained in a continuous random variable, despite the very similar forms of the two quantities. In fact, a continuous random variable generally contains an infinite amount of information, as the following example illustrates. Let X be uniformly distributed on [0, 1), and write X = 0.X_1 X_2 X_3 ..., the dyadic expansion of X, where X_1, X_2, X_3, ... is a sequence of fair bits, that is, i.i.d. uniform bits. Then

H(X) = H(X_1, X_2, X_3, ...) = ∑_{i=1}^∞ H(X_i) = ∑_{i=1}^∞ 1 = ∞,

where the second equality holds because the bits are i.i.d. That is, the amount of information contained in X is infinite.

We now discuss the relation between differential entropy and discrete entropy. Consider a continuous random variable X with a continuous pdf f(x). Partition the x-axis into intervals of length Δ, and define the discrete random variable X̂_Δ that takes the value i if X falls in the interval [iΔ, (i+1)Δ).
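As a quick illustration of the dyadic expansion, here is a small Python sketch (the function name dyadic_bits is our own, not from the text) that peels off the fair bits X_1, X_2, ... of a number in [0, 1):

```python
# Hypothetical sketch: the dyadic expansion X = 0.X1 X2 X3 ... of a number
# in [0, 1).  Each doubling step shifts the binary point and peels off the
# next bit; for a uniform X these bits are i.i.d. fair bits.

def dyadic_bits(x, k):
    """Return the first k bits of the dyadic expansion of x in [0, 1)."""
    bits = []
    for _ in range(k):
        x *= 2               # shift the binary point one place to the right
        bit = int(x)         # the integer part is the next bit
        bits.append(bit)
        x -= bit             # keep only the fractional part
    return bits

print(dyadic_bits(0.625, 3))   # 0.625 = 0.101 in binary -> [1, 0, 1]
```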
In other words, X̂_Δ is a quantization of the continuous random variable X with resolution Δ. Since f(x) is continuous, p_i = Pr{X̂_Δ = i} ≈ f(x_i)Δ, where x_i is any value in the interval [iΔ, (i+1)Δ). Then for small Δ, the discrete entropy of X̂_Δ can be approximated as follows:

H(X̂_Δ) = -∑_i p_i log p_i
        ≈ -∑_i f(x_i)Δ log(f(x_i)Δ)
        = -∑_i f(x_i)Δ [log f(x_i) + log Δ]
        = -∑_i [f(x_i) log f(x_i)] Δ - [∑_i f(x_i)Δ] log Δ
        ≈ -∫ f(x) log f(x) dx - [∫ f(x) dx] log Δ
        = h(X) - log Δ,

where the last step uses ∫ f(x) dx = 1. Therefore, we have shown that for small Δ, the entropy of the quantization X̂_Δ is approximately equal to the differential entropy of X minus log Δ. Note that as Δ tends to zero, H(X̂_Δ) tends to infinity.

In the next two examples, we evaluate the differential entropy for some specific distributions. In Example 10.12, let X be uniformly distributed on the interval [0, a). Then

h(X) = -∫_0^a (1/a) log(1/a) dx = log a.
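The approximation H(X̂_Δ) ≈ h(X) - log Δ can be checked numerically. The sketch below (the constants a and Δ are our own choices) quantizes the uniform distribution on [0, a) and compares the discrete entropy of the quantization with log a - log Δ, working in natural logarithms:

```python
import math

# Numerical check (a sketch) of H(X_hat_Delta) ≈ h(X) - log(Delta)
# for X uniform on [0, a), where h(X) = log a.  Logs are natural (nats).

a, delta = 2.0, 0.01
n_bins = int(a / delta)                  # bins [i*delta, (i+1)*delta)
p = [delta / a] * n_bins                 # each bin has probability delta/a
H = -sum(pi * math.log(pi) for pi in p)  # discrete entropy of X_hat_Delta

print(H)                               # ≈ 5.2983
print(math.log(a) - math.log(delta))   # h(X) - log(delta) = log 2 - log 0.01
```

For the uniform distribution the two quantities agree exactly (up to floating point), since every bin has the same probability.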
Note that h(X) = log a is negative if a < 1, so differential entropy cannot possibly be a measure of information.

In the next example, we evaluate the differential entropy of a Gaussian distribution. Let X be Gaussian with mean 0 and variance σ². Then h(X) = ½ log(2πeσ²). To evaluate this, first let e be the base of the logarithm, so that log denotes the natural logarithm. With

f(x) = (1/√(2πσ²)) e^{-x²/(2σ²)},

we have log f(x) = -log √(2πσ²) - x²/(2σ²). Substituting this expression into h(X) = -∫ f(x) log f(x) dx gives

h(X) = (1/(2σ²)) ∫ x² f(x) dx + log √(2πσ²) ∫ f(x) dx.

The first integral is just the second moment E[X²]. In the second term, log √(2πσ²) = ½ log(2πσ²) and ∫ f(x) dx = 1. Now E[X²] = Var(X) + (E[X])² = σ² + 0² = σ², so the σ² from the second moment cancels the σ² in the factor 1/(2σ²), and we obtain

h(X) = ½ + ½ log(2πσ²).

To combine the two terms, write ½ as ½ log e, where log e = 1. Combining the two logarithms yields h(X) = ½ log(2πeσ²), where the unit is nats. By changing the base of the logarithm to any chosen positive value, we obtain h(X) = ½ log(2πeσ²) in the corresponding unit. We now discuss two basic properties of differential entropy.
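As a sanity check on the Gaussian formula, the following sketch (the function phi and the truncation range are our own choices) quantizes N(0, σ²) with a fine resolution Δ and recovers h(X) ≈ H(X̂_Δ) + ln Δ, which should be close to ½ ln(2πeσ²):

```python
import math

# Sketch: verify h(X) = 0.5*ln(2*pi*e*sigma^2) in nats for X ~ N(0, sigma^2)
# by quantizing the Gaussian with resolution delta and using the relation
# H(X_hat_Delta) ≈ h(X) - ln(delta).

def phi(x, sigma):
    """Cdf of the N(0, sigma^2) distribution."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

sigma, delta = 1.0, 0.001
lo = int(-10.0 / delta)                  # truncate to [-10*sigma, 10*sigma];
hi = int(10.0 / delta)                   # the tail probability is negligible
H = 0.0
for i in range(lo, hi):
    p = phi((i + 1) * delta, sigma) - phi(i * delta, sigma)
    if p > 0.0:                          # skip bins that underflow to zero
        H -= p * math.log(p)

h_est = H + math.log(delta)              # ≈ h(X)
h_exact = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)
print(h_est, h_exact)                    # both ≈ 1.4189 nats
```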
The first property is translation: h(X + c) = h(X). That is, adding a constant to a real random variable does not change its differential entropy. The second property is scaling: for a ≠ 0, h(aX) = h(X) + log|a|.

Some remarks on the scaling property: the differential entropy is increased by log|a| if |a| > 1, decreased by -log|a| if |a| < 1, and unchanged if |a| = 1. We will see later in this chapter that differential entropy is related to the spread of the pdf: roughly speaking, the more spread out the pdf is, the larger the differential entropy.

We first prove the translation property. Let Y = X + c. Then the density of Y is f_Y(y) = f_X(y - c), and the support of Y is S_Y = {x + c : x ∈ S_X}. Starting from h(X) = -∫_{S_X} f_X(x) log f_X(x) dx, the change of variable x = y - c turns f_X(x) into f_X(y - c), dx into dy, and S_X into S_Y. Since f_X(y - c) = f_Y(y), the integral (with its minus sign) becomes -∫_{S_Y} f_Y(y) log f_Y(y) dy = h(Y) = h(X + c).

Next, we prove the scaling property. Let Y = aX. Then the density of Y is f_Y(y) = (1/|a|) f_X(y/a), and the support of Y is S_Y = {ax : x ∈ S_X}. Starting from h(X), the change of variable x = y/a turns f_X(x) into f_X(y/a), dx into dy/|a|, and S_X into S_Y.
This gives

h(X) = -∫_{S_Y} f_X(y/a) log f_X(y/a) dy/|a|.

We move the factor 1/|a| to the front, and inside the logarithm we multiply f_X(y/a) by 1/|a|, compensating by adding log|a|:

h(X) = -∫_{S_Y} (1/|a|) f_X(y/a) [log((1/|a|) f_X(y/a)) + log|a|] dy.

Since (1/|a|) f_X(y/a) = f_Y(y), moving log|a| outside the integral gives

h(X) = -∫_{S_Y} f_Y(y) log f_Y(y) dy - log|a| ∫_{S_Y} f_Y(y) dy = h(Y) - log|a|,

where the first term is h(Y) and the remaining integral ∫_{S_Y} f_Y(y) dy equals 1. Since Y = aX, we have h(X) = h(aX) - log|a|, and hence h(aX) = h(X) + log|a|.
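Both properties can also be checked empirically with a crude histogram estimator built from samples, using h ≈ H(histogram) + ln Δ from the quantization argument above. The estimator h_hat and all the constants below are our own choices for illustration, not part of the text:

```python
import math
import random

# Sketch: empirical check of the translation and scaling properties using a
# crude histogram estimate  h_hat(samples) ≈ H(histogram) + ln(delta).

def h_hat(samples, delta):
    """Crude differential-entropy estimate in nats from i.i.d. samples."""
    counts = {}
    for s in samples:
        i = math.floor(s / delta)        # bin index of the quantization
        counts[i] = counts.get(i, 0) + 1
    n = len(samples)
    H = -sum((c / n) * math.log(c / n) for c in counts.values())
    return H + math.log(delta)

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200000)]   # X ~ N(0, 1)
a, c = 3.0, 5.0

h_x  = h_hat(x, 0.05)
h_xc = h_hat([xi + c for xi in x], 0.05)   # translation: h(X + c) = h(X)
h_ax = h_hat([a * xi for xi in x], 0.15)   # scaling: h(aX) = h(X) + ln|a|
                                           # (delta scaled by a for fairness)

print(h_x, h_xc, h_ax - h_x)
# h_x ≈ h_xc ≈ 1.42 (= 0.5*ln(2*pi*e)), and h_ax - h_x ≈ ln 3 ≈ 1.10
```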