0:00

Theorem 6.2 is the AEP for strong typicality,

which says that there exists eta greater than

0, such that eta tends to 0 as theta tends to

0 and the following hold. First, if x is strongly

delta-typical, then the probability of x is lower

bounded by 2 to the power minus n times entropy of X plus eta, and is

upper bounded by 2 to the power minus n times entropy of X minus eta.

Second, for n sufficiently large, the probability that a random sequence

X, is strongly delta-typical, is greater than 1 minus delta.

Third, for n

sufficiently large, the size that the strongly delta-typical set, is lower

bounded by 1 minus delta times 2 to the power n times entropy of X minus eta,

and upper bounded by 2 to the power n times entropy of X plus eta.

[BLANK_AUDIO]

Note that the form of the strong AEP, is very similar to the form of the weak AEP.

And the interpretation is also similar.

[BLANK_AUDIO]

We are going to discuss the proof for each part of the strong AEP.

Here we first give the proof idea for the first part.

If x is strongly typical, then the

empirical distribution is quote unquote about right.

[BLANK_AUDIO]

If the empirical distribution is about right, then everything

else, including the empirical entropy, would be about right.

That is, minus 1 over n times log of the probability of the sequence,

is approximately equal to the entropy of X.

And this is equivalent to p(x) approximately

equal to 2 to the power minus n

times entropy of X.

[BLANK_AUDIO]

Let us now prove the first part, that is property one of the strong AEP.

[BLANK_AUDIO]

For any sequence x which is delta-typical,

we have p of the sequence x equals p(x_1)

times p(x_2) all the way to p(x_n). And this

can be written as the product of all x in the support of x,

p(x) to the power N, x semicolon

sequence x.

Here we only need to take a product

over all x in the support, because the number

of occurrences of x in a sequence, is equal to 0, for all x not in the support.

[BLANK_AUDIO]

And therefore we see that, the probability of a

typical sequence is always strictly positive.

Then we take a logarithm of the probability of the sequence.

This is equal to the logarithm of a product

over all x in S_X. And so, it is equal

to summation x, log of p(x) to the power N x semicolon sequence x,

which is equal to N x semicolon sequence x, log of p(x).

Note that we adopt the convention that

summation x, means summation x in the support S_X.

4:25

and then multiply N x semicolon sequence x

minus n times p(x) to log p(x) to obtain minus

n summation x 1 over n, N x semicolon x

sequence, minus p(x) times minus log p(x).

Note that we have divide everything through by

n, and then we multiply everything by n.

And also this minus sign, and this minus sign, cancel with each other.

[BLANK_AUDIO]

Now summation x p(x) log p(x), is just minus the entropy of X,

so we can write, minus n times entropy of X.

And for the other summation, we simply copy it over.

[BLANK_AUDIO]

Since x is delta-typical, summation x to absolute difference between the relative

frequency of x and the probability of x is less than or equal to delta.

[BLANK_AUDIO]

Now consider this summation in equation 1 and take the absolute value.

[BLANK_AUDIO]

By the triangular inequality, this is upper bounded by summation x,

the absolute difference between the relative frequency

of x and the probability of x, multiplied by minus log p(x).

Now, minus log p(x) can be upper bounded by minus

log minimum over all x in the support p(x).

7:11

This is illustrated in equation 1. Now it then

follows from equation 1, that log of p(x) is lower

bounded by minus n times entropy of X plus eta,

where this lower bound is obtained, by taking the

upper bound eta on the summation in equation 1.

On the other

hand, log of p(x) is upper bounded by minus n times the entropy

of X minus eta. And this upper bound, is obtained, by

taking a lower bound minus eta on the summation in equation 1.

So, we have the probability of the sequence x is

lower bounded by 2 to the power minus n times entropy

of X plus eta,

and upper bounded by 2 to the power minus n times entropy of X minus eta,

9:07

Basically, by the Weak Law of Large Numbers with probability tending to one,

as n goes to infinity, the empirical distribution of the

random sequence X is close to true distribution p(x).

And so by definition X is strongly typical.

[BLANK_AUDIO]

The proof goes as follows. To prove Property

two, we write N x semicolon random sequence X

equals summation k equals 1 up to n, B_k(x),

where, B_k(x), is an indicator random variable,

which is equal to 1, if X_k is equal to x, and 0 if X_k is not equal to x.

So B_k(x) tells whether X_k is equal to x or not.

And the summation k equals 1 up to n, B_k(x)

simply counts number of occurrences of small x in the random sequence X.

[BLANK_AUDIO]

Then B_k(x), k equals 1 up to n are i.i.d random variables,

because X_1, X_2, up to X_n, are i.i.d random

variables. Specifically the probability that B_k(x)

equals 1, that is, X_k equals x,

is equal to p(x). And the probability

that B_k(x) equals 0 is equal to 1 minus p(x).

10:58

Note that, the expectation of B_k(x),

is equal to, 1 minus p(x) times 0, plus

p(x) times 1, which is equal to p(x).

In general, the expectation of an indicator random variable, of

an event, is equal to the probability of that event.

[BLANK_AUDIO]

Now, by the Weak Law of Large Numbers, for any

delta greater than 0 and for any x in the alphabet,

we have, the probability that the absolute

value of 1 over n summation k equals 1 up to n, B_k(x),

namely, the average number of occurrences of the value x in a random sequence x,

minus p(x), the probability of occurrence

of the value x, greater than delta divided by the size

of the alphabet, is less than delta divided by

the size of the alphabet, for n sufficiently large.

13:59

Because the number of terms in the summation

is exactly equal to the size of X,

this is equal to delta.

[BLANK_AUDIO]

Now the event summation x absolute

value, 1 over n, N x semicolon X minus p(x)

greater than delta, implies the event

absolute value, 1 over n, N x semicolon X minus p(x),

greater than delta divided by the size of the alphabet,

for some x.

15:48

And therefore we have obtained the upper bound, 1 minus delta.

This proves proposition two.

[BLANK_AUDIO]

Property 3 of Strong AEP, says that for n sufficiently

large, the size of the delta-typical set, is approximately

equal to 2 to the power n times the entropy of X.

The proof of this property follows from property

1 and property 2 of Strong AEP, in exactly

the same way as in theorem 5.3. So we leave this as an exercise.

[BLANK_AUDIO]

Theorem 6.3 is a stronger result, which says

that for sufficiently large n, there exists some function

phi of delta, which is strictly positive, such

that the probability of not getting a delta-typical sequence,

is less than 2 to the power minus n times phi of delta.