0:07

Which goes back quite far, with Gauss inventing the fast Fourier transform,

even before Fourier defined it.

Then we will look at small examples,

just to get a feeling for what is going on with these matrix-vector

multiplications that can be sped up by the fast Fourier transform, or FFT, algorithm.

And we present the most famous case which is the Cooley-Tukey FFT.

And we do one example in great detail, which is the so-called

Decimation-in-Time FFT for lengths that are powers of 2.

And we conclude.

0:46

So Fourier, of course, had the Fourier transform.

If you remember Fourier series, that's an invention by Joseph Fourier.

And clearly if you compute Fourier series, it helps to have a fast algorithm but

at that time it wasn't a discrete time problem.

It was a continuous time problem with Fourier series.

But it turned out another great mathematician, Gauss,

had a similar problem computing some trajectories of planets.

He had to compute trigonometric series and he was doing them by hand.

He figured out a quick way to compute trigonometric series when

the frequencies were actually harmonic.

And it turns out this was essentially the fast Fourier transform already,

even before Fourier defined Fourier series.

1:38

So that leads us back to 1805, when Gauss wrote down in his

personal notebook in Latin, a quick way to compute trigonometric series.

Fourier, as we know, invented Fourier series just a little bit later.

Then a lot of people started computing Fourier series,

spending a lot of time developing tricks.

And in 1958, an English statistician, Good,

came up with an algorithm which is still used but

which is not the most popular one.

The most popular one was invented in 1965 by Cooley and Tukey.

And it's a re-discovery of the fast Fourier transform algorithm for N

a power of a prime, typically a power of 2 or a power of 3.

3:14

We can take out any multiple of N in the exponent, because W to the power N is equal to 1, okay?

So we can take the powers always modulo N, very important point.

This will lead to a lot of simplification.

So a DFT matrix of size N by N has entries W to the power nk.

Remember, the powers go from 0 to N-1

here, from 0 to N-1 there, and here from 0,

1, 2, 3, up to N-1, etc.
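To make this concrete, here is a small numpy sketch (not from the lecture; the helper name `dft_matrix` is ours) that builds the DFT matrix with the powers reduced modulo N:

```python
import numpy as np

def dft_matrix(N):
    """N-by-N DFT matrix with entries W^(n*k), where W = e^(-j*2*pi/N).

    The exponents n*k are reduced modulo N, which changes nothing
    because W^N = 1; this is the simplification used in the lecture.
    """
    n = np.arange(N)
    W = np.exp(-2j * np.pi / N)
    return W ** (np.outer(n, n) % N)  # n and k both run from 0 to N-1
```

For N = 2 this gives the rows [1, 1] and [1, -1], and multiplying by a vector reproduces the DFT.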

3:56

Okay, so when N is equal to 2,

we have W equal to e to the -j 2 pi over 2.

And this is equal to -1, okay?

So the first line of the W2 matrix is 1 and 1, the second line is 1, -1.

Okay, no sweat.

4:18

So the second case, W3, has powers,

here, 1 and 2, 2 and 4.

We take modulo 3, so you can see the 4 becomes a 1 here.

So this matrix is already a little bit simpler than that one and

here we wrote out the actual values in terms of complex numbers.

So this is simply e to the -j 2 pi over 3,

then e to the -j 4 pi over 3, e to the -j 4 pi over 3, and here again, e to the -j 2 pi over 3.

W4, okay, so what is W4?

It's e to the -j2 pi over 4

which is e to the -j pi over 2.

Which of course, is equal to -j, okay?

And this is the matrix W4, written out as given in the general formula.

Here we take the powers modulo 4, so the W to the 4 becomes a 1.

W to the 6 becomes W to the 2, okay.

And W to the 9, modulo 4, is just W, okay?

And now we know that W is -j so we can write this out.

And this matrix is very, very simple.

5:42

First line is all 1's, first column is all 1's.

Then -j, -1, j; -1, 1, -1; j, -1, -j.

5:52

Very simple matrix.

Now, when you see this matrix you really notice that

if I multiply W4 times a vector x of length 4,

I don't need many multiplications, right?

I mean, for example, here it's just a sum.

Here I have to exchange real and imaginary entries.

Here I have to change the sign.

Here I just have to take this sum but with change of signs, etc.

So W4 times x is not going to

require any multiplications, okay, only additions and subtractions.

6:32

And interchanges of real and imaginary parts.
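As a sketch of that observation (the helper name `w4_times` is ours, not from the lecture): each row of W4 contains only 1, -1, j, -j, so the product needs only the sums, sign changes, and real/imaginary swaps just described. The `1j` factors in the code stand in for those swaps:

```python
import numpy as np

def w4_times(x):
    """W4 times x using only additions, subtractions, and swaps.

    Multiplying by -j maps a + jb to b - ja: a swap of real and
    imaginary parts with a sign change, not a true multiplication.
    """
    x0, x1, x2, x3 = x
    X0 = x0 + x1 + x2 + x3            # row [1,  1,  1,  1]
    X1 = x0 - 1j * x1 - x2 + 1j * x3  # row [1, -j, -1,  j]
    X2 = x0 - x1 + x2 - x3            # row [1, -1,  1, -1]
    X3 = x0 + 1j * x1 - x2 - 1j * x3  # row [1,  j, -1, -j]
    return np.array([X0, X1, X2, X3])
```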

So the reason we go through this small matrix is to see that there

is a lot of symmetry, a lot of structure that can be taken advantage of.

Okay, W5 is more complicated, okay?

It doesn't have these obvious simplifications; we only have the reductions modulo 5.

For example, this character here, W to the 16: modulo 5,

the closest multiple of 5 is 15,

so that would be just W, and so on.

Okay, now this character does not have an obvious simplification.

That requires quite a bit more thinking to see what the structure is.

But so be it for now.

7:35

Okay, so what is the idea behind the fast algorithm for

the discrete Fourier transform?

It is this latin phrase, divide et impera, or divide and conquer.

That usually is attributed to Julius Caesar.

It's the idea that you take a large problem.

You divide it into subproblems that you have a better handle on, okay?

So take a problem, split it into two subproblems.

8:09

And so, simple problems are, for example, W4, right?

If we could take a large problem and

break it down into four small problems like W4, right,

we know this will be very simple to do.

Now that we have split the bigger problem into simple

ones, we need a merge operation to get some intermediate solution.

You can do this here also.

And then you have to undo the split you did here.

And that will be the merger which gives you the solution, okay.

And so the key idea is how to do this split and merge.

8:45

And if you know how to do it once, that was one of the steps.

And you can do it a second time and then combine again and

then you have the result.

In this case we enter with vector x, small x.

That's a finite dimensional vector, let's say of size N.

And you end up with DFT coefficient, which is a finite dimensional

vector of size N which contains the result of the matrix vector multiplication.

9:27

Compute W times x: it's a matrix-vector product.

The matrix has size N by N, x of size N, of course.

Then the complexity is order N squared.

Okay, and so to compute the Fourier transform, you have to take inner products between the rows and the column vector.

There are N of these inner products, and each one takes order N computation.

That's an N squared, a quadratic complexity problem.

9:58

Okay, but when we study the problem when N is a power of 2 and

we cut the problem into two problems of size N/2, and

if the complexity is quadratic,

then the problems of size N/2

each have complexity N squared/4.

10:24

We might have some complexity that is involved in recovering

of the full solution that might be a complexity order N.

So our solution now needs 2 times N squared/4,

that's N squared/2, plus N to do the recombination: N squared/2 + N.

10:47

Okay, now if N is bigger than 4,

this is better than N squared, as you can immediately verify, okay?

It's not a big saving, okay?

It's a saving by essentially,

a factor of 2 here because the dominant term, of course, is the N squared.

Instead of having the original N squared complexity,

we have the N squared over 2 complexity.

11:24

Good, so let's do this.

So we start with x, it's a size N vector.

Okay, with entries x[0], x[1], etc., up to x[N-1].

Split it into two subproblems.

Okay, the upper and the lower halves, and

compute DFTs of size N/2 here and there.

11:54

Okay, and we just saw the original problem, this problem,

if we don't do anything special about it, it has complexity N squared,

whereas these problems here have complexity N squared/4.

This guy also, N squared/4.

And so this overall thing now, after split and merge, has complexity N squared/2.

12:23

If we knew how to divide the problem once and

save some computation, well let's just do it again.

So divide and conquer is reapplied and

since N is a power of 2 we can cut the 2 problems of size

N/2 into 4 problems of size N/4 and we can keep going.

13:28

Okay, so each time we have this order N complexity to do the merging,

and we get this magic result that an algorithm that uses divide and

conquer at every step, dividing the problems into half-size problems over and over again,

will have a complexity of N log base 2 of N.

And of course if N is large, this is much,

much better than N squared, because log base 2 of N is much, much smaller than N, okay?
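A small sketch of that claim (the function name `cost` is ours): halving the problem at every step with an order-N merge gives exactly N log2 N operations when the base case, a size-2 DFT, costs 2 operations:

```python
import math

def cost(N):
    """Operation count from the recurrence T(N) = 2*T(N/2) + N.

    T(2) = 2: a size-2 DFT is just one sum and one difference.
    With this base case the recurrence solves to exactly N*log2(N).
    """
    if N == 2:
        return 2
    return 2 * cost(N // 2) + N

# cost(1024) = 10240, versus 1024**2 = 1048576 for the direct method
assert cost(1024) == 1024 * int(math.log2(1024))
```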

14:07

So graphically, again, split the DFT into sizes,

which are of size N/2, N/4, N/8, for example.

In this case, you end up, so you'll have 2 problems of size N/2,

4 problems of size N/4, 8 problems of size N/8, so

this leads us to compute 8 DFTs of size N/8.

Merge the results into DFTs, first of size N/4;

so two size-N/8 results become this one of size N/4.

And these become size N/2.

And these become the result of size capital N.

14:50

Graphically, this is a very complex diagram but

you don't have to draw it yourself.

You can just admire it here on the slide, okay?

So you take the vector x of size N.

You split into 2 problems of size N/2.

You split into 4 problems of size N/4.

And finally 8 problems of size N/8.

Okay, so we have the original DFT of size N, which has been

divided into smaller DFTs of size N/8, okay, exactly 8 of them.

16:15

So, we start with the usual formula for the DFT.

So it is summation over capital N terms of the time

domain sequence times the root of unity raised to the nkth power.

We have seen this many times now.

To start the algorithm, a good guess is half of the answer.

Okay, so a good guess, please.

Remember the good guess; it will help in the future.

So the good guess is that we want to divide the problem into two halves.

And we could do this easier by taking the first half of the sequence and

the second half.

Or because it's called decimation in time,

we are going to take the even-indexed samples and the odd-indexed samples.

That's why this is called decimation in time,

like we have seen in certain operations in signal processing on sequences.

So we take the sequence and we derive two sequences, x[2n] and x[2n+1].

Now, the input was broken into even and odd index terms.

So on the output side, the DFT result, or

that capital X[k], we break into the first half,

X[k], and the second half, X[k + N/2].

Where k now runs only from 0 to capital N over 2-1.

So we did two things here: we took even and odd indexes,

so let's call these the even ones and the odd ones; that was on the input.

And on the output, we took the first half and the second half, all right?

18:02

And we now look at the even and odd inputs separately,

so this will be a bit of a formula menagerie here.

So let's write X[k] as the first half of the sum,

where n goes from 0 to N over 2 minus 1.

And the second half, which is the odd index terms,

okay, so the even guys, the odd guys.

And we do some rearrangement here.

18:39

It's very simple, we expand the exponent.

And the exponent, of course, is now 2nk + k, so we have W to the 2nk times W to the k.

Then we move this part here in front, okay?

Because it doesn't depend on the summation.

Because the summation is over n, so it only concerns this part.

So first summation has not changed.

It's the same from first line to second line to the last line.

We'll just see this in a second.

The only thing is that W to the power 2nk,

well, we can write W to the 2nk as e to

the -j 2 pi over N, times 2nk, okay?

And this is equal to e to the -j 2 pi over (N over 2), times nk.

And this, of course, is equal to W sub N over 2, the root of order N over 2, raised to the power nk.

All right, so this is the simplification we have done here.

And this will be very important because this is simply a DFT of size N over 2.

over the even-indexed samples here, with respect to the

root of unity of order N over 2.

Okay, so here we identify a DFT which is of size N over 2.
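This key step, W_N raised to 2nk equals W_{N/2} raised to nk, is easy to check numerically (a quick sketch, not from the lecture):

```python
import numpy as np

# W_N^(2nk) = W_(N/2)^(nk): the sum over the even-indexed samples
# is therefore a DFT of size N/2 with respect to the root of order N/2.
N = 8
W_N = np.exp(-2j * np.pi / N)
W_half = np.exp(-2j * np.pi / (N // 2))
for n in range(N // 2):
    for k in range(N // 2):
        assert np.isclose(W_N ** (2 * n * k), W_half ** (n * k))
```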

20:11

Now, the second sum behaves in exactly the same way.

Simply, we have this pre-multiplication by W to the k, but it's outside the sum.

But the summation inside is exactly, again, a DFT of size N over 2.

And this DFT is simply taken over the odd index samples, okay?

So, we write this more compactly as two transforms:

X'[k], plus W to the k times X''[k].

And these are the results of the two DFTs of size N over 2, over the even and the odd guys.

21:07

In words, we can compute X, the DFT of x,

with two half size DFTs, which we have called X', X''.

Multiply the second DFT by Wk,

k going from 0 to N over 2-1, and

adding the results, okay?

So the first half here

will be X[k] = X'[k] + W to the k times X''[k],

and this for indices over the first half.
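A quick numerical check of this first-half formula (a numpy sketch, not from the lecture), with X' and X'' computed as half-size FFTs of the even and odd samples:

```python
import numpy as np

# X[k] = X'[k] + W^k X''[k] for k = 0 .. N/2 - 1, where X' and X''
# are the DFTs of the even- and odd-indexed samples.
N = 8
x = np.random.default_rng(0).standard_normal(N)
X = np.fft.fft(x)
X_even = np.fft.fft(x[0::2])   # X', size N/2
X_odd = np.fft.fft(x[1::2])    # X'', size N/2
k = np.arange(N // 2)
W = np.exp(-2j * np.pi / N)
assert np.allclose(X[:N // 2], X_even + W**k * X_odd)
```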

21:49

The next step is, that we are going to consider separately the first half

of the output and the second half, okay?

So the first half,

we just keep the formula we have shown on the previous two slides, okay.

So X[k] is the sum of X' and X''.

There is a weight here, W to the k; k goes from 0 to N over 2 minus 1.

For the second half, k still goes from 0 to N over 2- 1.

We simply add the shift here by N over 2.

So we just write out the formula as we had, and

it simply creates this little extra element here in the exponent.

And we have to worry what this will do, okay.

Now, we remember that W N

over 2 is e to the -j2 pi

over N times N over 2.

This is equal to e to the -j pi,

which is equal to -1, okay?

And because of this, we can go to the following simplifications.

23:10

So in front, we have the root of order N over 2 here,

raised to a power that is an integer multiple of N over 2.

Okay, and we remember that such a power is equal to 1.

We saw this at the very beginning.

So this thing disappears and the first half here is very simple, okay?

23:33

What do we recognize here?

It's a DFT of size N over 2, nothing too surprising here.

And this character here we just saw before, just gives us -1 in front.

And so instead of the plus that we have above, here we have a minus sign in front of

W to the k. And last but not least, we do the same simplification here.

So this factor falls out: it's the root of order N/2 raised to an integer multiple of N/2, which is 1.

So this is the second DFT, okay?

So what we end up having is something very simple is that the second half

looks very much like the first half except that we have a change of sign here, okay.

24:19

We are going to put these in words.

We can compute X[k] and X[k+N/2]

where k goes from 0 to N/2- 1 by dividing

the input into even and odd indexed samples.

Compute two DFTs of size N/2, that's the division.

Multiplication of the output of the second DFT by WK,

that would cost us N/2 multiplications.

Combine the output with sum and difference,

that's what we have just seen on the last slide.
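Put together, that recipe (split even/odd, two half-size DFTs, weight the second by W^k, sum and difference) becomes a short recursive algorithm. A minimal sketch assuming the length is a power of 2 (the name `fft_dit` is ours):

```python
import numpy as np

def fft_dit(x):
    """Radix-2 decimation-in-time FFT, for len(x) a power of 2.

    X[k]       = X'[k] + W^k X''[k]
    X[k + N/2] = X'[k] - W^k X''[k],   k = 0 .. N/2 - 1,
    where X', X'' are the DFTs of the even and odd samples.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if N == 1:
        return x
    X_even = fft_dit(x[0::2])                 # X': DFT of x[2n]
    X_odd = fft_dit(x[1::2])                  # X'': DFT of x[2n+1]
    k = np.arange(N // 2)
    t = np.exp(-2j * np.pi * k / N) * X_odd   # W^k X''[k]
    return np.concatenate([X_even + t,        # first half: sums
                           X_even - t])       # second half: differences
```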

25:12

Then the outputs of the second DFT we multiply by W sub N to the k.

So that's W to the 0, 1, 2, 3.

Then we take sums here and differences there.

The sum gives us X[0]; the difference gives us X[0 + N/2], that's X[4].

That's one combination.

Then plus and minus give us X[1] and X[5], etc., until we get X[7], all right?

And so we have very clearly mapped out this division, so it takes the input.

So we start with x[0], x[1], up to x[7], and

we divide: x[2] will come here, etc., x[3] will go there,

x[4] goes there, x[5] goes there,

x[6] goes there, x[7] goes there.

Okay, so we have done the division of the input into two halves, the even and

the odd.

We have taken two DFTs of half size.

We have weighted the output of the second DFT by these complex numbers.

We have done sum and differences and

we get our final result here, capital X, okay?

What was the cost?

Well, we have 2 DFTs to compute of size 4 in this case.

Then we have N/2 multiplications.

And we have N additions, okay.

So what is the complexity now?

27:16

So a total of N squared over 2, plus N over 2, plus N,

which is indeed smaller than N squared for N large enough.

So in general, it's half the complexity of the initial problem, okay?

Why, because we have moved from N squared into 2 times

N/2 squared which is N squared/2, all right?

That's if we do only one step of division and merging.

27:45

Of course, as announced, we can repeat this, okay?

And we'll repeat this until we get to trivial, small DFTs of size 2,

so sum and difference.

This requires log base 2 of N, minus 1, steps.

Each step requires a merger, so

a merger, as we have seen, requires N/2 multiplications and N additions.

So the total now is N/2 times (log base 2 of N minus 1)

multiplications, and on the order of N log base 2 of N additions.

28:32

A DFT of size N requires order N log base 2 of N operations.

Why do I say it's such a key result?

Because essentially signal processing has lived and died by fast algorithms.

So the invention of the fast Fourier transform,

this algorithm that I showed you, by Cooley and

Tukey in 1965, created an explosion of interest in digital signal processing.

Because all of a sudden, computations which seemed very hard before were

actually feasible even on the very simple computers of the time, okay?

So the birth of digital signal processing is really linked to the invention or

the rediscovery of the fast Fourier transform algorithm.

29:26

What do we do?

We separate into even and odd samples, then we compute two DFTs,

then we compute the sums and differences.

Okay, we've written it out here; I'm not expecting you to understand this

immediately by looking at four matrices of size 4 by 4, so let me try to explain.

This matrix here does the division into even and odd.

Okay, how do we see this?

The first entry takes out x[0], the second entry here will take out x[2].

The third entry takes out x[1], and the last one x[3].

Okay, so now we have separated odd and even.

30:25

Then we multiply by the roots of unity the outputs of this character here.

So that's these two guys.

And we have to recombine them by sum and difference and

this was combined into this matrix here which does

both the multiplication here by the roots of unity and the sum and difference.

And this is exactly W4, this matrix we had seen before.

This uses 8 additions, no multiplications.

Let's count the additions here: 1, 2, 3, 4, 5, 6, 7, 8.

Okay, 8 additions no multiplications.

So a DFT of size 4 is very cheap, please remember.

31:05

Okay, if we could do it for N = 4, it's tempting to do it for N = 8.

Now, this is going to be big, and it's going to be too big for a single slide.

So we first write down what W8 is.

It's the usual matrix: first line and first column of 1s.

And then successive powers of the root of unity: 1, 2, 3, 4, up to 7.

Then even powers, etc., up to W to the 49.

So step 1 is to separate even and odd index samples.

This is exactly what we have seen before.

This picks out the even samples.

This picks out the odd samples.

This we call D8, or a decimation of size 8 matrix.

It requires only index changes, no arithmetic operations.

32:02

Okay, then we want two DFTs of size 4.

Okay, now we're not going to factorize these guys; we already did that at length

just a couple of slides ago, but you recognize a DFT of size 4 here.

So this is a 0 matrix and we write this compactly as a matrix which

is a block matrix with two W4 matrices and two zero matrices.

Okay, now, this requires two DFTs of size four.

So that's 16 additions, right.

32:39

Now we have to multiply the output of the second DFT of size four by Wk,

with k going from zero to three, okay?

So the first half is untouched.

The second half is weighted by 1,

W, W squared, W cubed, okay?

And in compact notation, we can again use block matrices so

this is an identity matrix of size 4.

This is a diagonal matrix with the weights given by 1, W, W squared, W cubed.

Okay, this is this character here.

And we have again zero matrices of size 4, okay?

33:21

This requires two multiplications, two complex multiplications, why?

It's this multiplication here, and this one.

W square is actually equal to minus j.

And this doesn't require complex operations.

It just requires interchanging real and imaginary parts.

Okay, so so far we have 16 additions, two multiplications.

Okay, now we have to recombine the final output.

If you remember, it's sum and differences of X[k] and X[k+N/2].

So we do this for the first output, it's the sum here.

Second output is a sum.

Third output is a sum.

Fourth is a sum.

The fifth output is a difference.

Again, a difference for the sixth, seventh, and eighth output.

34:17

And this we can write compactly by recognizing that these guys are identity

matrices.

And here there's simply a sign change.

Okay, so in block matrix form we have this very simple formula.

The identity of size four, identity of size four,

identity of size four minus identity of size four.

This requires eight additions.

So if my calculations were correct we had

16 additions with our two DFTs of size 4.

Then we have two complex multiplications, and now we have, again,

eight additions.
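The four factors just described (even/odd decimation, two DFTs of size 4, twiddle weights, sums and differences) can be checked numerically. A sketch with our own names P, B, T, S for the four block matrices:

```python
import numpy as np

def dft(N):
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N)

N = 8
Z = np.zeros((4, 4))
I4 = np.eye(4)
# P: decimation matrix D8, picks the even samples, then the odd ones
P = np.zeros((N, N))
P[np.arange(N), np.r_[0:N:2, 1:N:2]] = 1
# B: block diagonal with two DFTs of size 4
B = np.block([[dft(4), Z], [Z, dft(4)]])
# T: identity on the first half, twiddles 1, W, W^2, W^3 on the second
W = np.exp(-2j * np.pi / N)
T = np.block([[I4, Z], [Z, np.diag(W ** np.arange(4))]])
# S: sums and differences
S = np.block([[I4, I4], [I4, -I4]])
assert np.allclose(dft(N), S @ T @ B @ P)  # the factorization of W8
```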

35:43

So we can write this out in a flowgraph view.

I'm not going to go through this detail, but you can sort of see what is happening.

We separate the inputs now, not just evens and odds because we do this twice.

So we end up having x[0] and x[4] as neighbors, then x[2], x[6], x[1], x[5], x[3], x[7].

Then we apply the weights, the sums and

differences, and we keep going, and here is a last sum and difference.

Which indeed computes the result here.

Okay so this flow diagram is everything we have seen before put

into a flow diagram where everything moves from left to right.

It gets weighted when we have a complex number next to it.

And it gets summed or differenced when we have a merger of two arrows, okay?

And you can count here the number of operations;

you will get the same numbers: 24 additions and two multiplications.

These operations are always on complex numbers.

36:49

Okay, so you might say this was a lot of effort on a very small sized DFT.

So let's see where this has a real impact in practice.

So in image processing, if you have a digital camera,

you take large images, 1,000 by 1,000 pixels or something like this.

And out of these large images, small blocks are derived.

Okay, so we take a large image.

Okay, let's say 1024 by 1024 pixels to make it simple.

And then you divide it into small blocks here.

And the small blocks are of size 8 by 8, okay?

On that 8 by 8 block we take a transform called the discrete cosine transform.


And a DCT is essentially a real version of a DFT.

I don't want to get into the trigonometric mess of defining the DCT in detail.

We could do this in a supplementary lecture.

For now let's just assume it's a real version of the DFT.

And it has a fast algorithm that is very close to a DFT algorithm.

So it's a little bit more complicated, little details, phase shifts and so on.

But to a first view, it is like a real version of

a discrete Fourier transform, okay?

And if we take the transform directly, well, 8 by 8 is 64.

So there is a matrix of size 64 by 64 that multiplies

a vector where we have the pixels from the 8 by 8 block.

So 64 squared is 4,000 and something, 4,096.

And for every block you have to take this transform.

Okay, and each multiplication has to be computed with fixed point multiplication.

It's expensive, takes time or hardware.

Okay, now there is a first trick,

that this transform is a so-called separable transform.

So you can take your eight by eight block,

okay let me try to make an approximation of an eight by eight block.

39:03

Okay, sorry, it takes me longer than it should [LAUGH], but

that's an eight by eight block.

It turns out that with this DCT transform, you can take the transform over each line.

Okay, that's eight transforms, and then transforms over the eight columns.

That's another eight transforms.

Okay, so that's 16 DFTs, let's say.

Okay, now the algorithm we saw has two multiplications and

so we have a total of 32 multiplications.

32 rather than 4,000: that's more than two orders of magnitude, right?
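Separability itself is easy to sketch: transform the eight rows, then the eight columns, and you get the same result as the big 64-by-64 matrix acting on the flattened block. Here we use the DFT as a stand-in for the DCT, as suggested above:

```python
import numpy as np

rng = np.random.default_rng(1)
block = rng.standard_normal((8, 8))       # one 8x8 image block
rows = np.fft.fft(block, axis=1)          # 8 transforms over the rows
both = np.fft.fft(rows, axis=0)           # 8 more over the columns
# Same result as the 64x64 separable transform on the flattened block:
n = np.arange(8)
W8 = np.exp(-2j * np.pi * np.outer(n, n) / 8)
big = np.kron(W8, W8)                     # the 64x64 matrix
assert np.allclose(both.flatten(), big @ block.flatten())
```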

Okay, so if you have a fixed point processor in your camera instead

of working for a hundred seconds, it will work for half a second or so.

So that's a big deal.

If you have taken pictures with a digital camera you know that you like to take

the next picture.

And you don't want to wait for a hundred seconds.

So this is a real big deal.

And this is at the heart of an image compression algorithm called JPEG.

And that we will discuss later in this digital signal processing class.

Okay, it's time to conclude.

The short answer to this module is don't worry, be happy.

There is a Cooley-Tukey algorithm, very popular.

That for N which is a power of 2 gives a complexity

that is of order N log base 2 of N, a really big deal, okay?

Maybe one of the most important algorithms ever invented.

40:37

Now this is the one we explained here.

I want to be clear that for every possible length, there is an FFT algorithm.

It's not always going to be a Cooley-Tukey.

It's not always going to be the simple divide and

conquer into pieces which are halves of the initial problem, etc.

But there are algorithms for every possible length.

Even lengths which are prime,

even lengths which are the product of different primes, etc.

Okay, so when you compute an FFT you should never zero pad the vector so

as to get a length that is a power of two.

This is something that people do.

And it's a mistake because you add zeros when maybe there shouldn't be any zeros.

And you should just take the initial lengths and

look at the package that is out there.

Like for example the Fastest Fourier Transform in the West, FFTW.

Which has a whole series of routines and works very well.
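In practice this just means calling the library on the length you actually have; for instance numpy's FFT (like FFTW) handles lengths that are not powers of two directly, so no zero-padding is needed:

```python
import numpy as np

x = np.random.default_rng(2).standard_normal(12)   # length 12 = 2*2*3
X = np.fft.fft(x)          # no zero-padding to 16 needed
# Agrees with the direct O(N^2) matrix-vector DFT:
n = np.arange(12)
W = np.exp(-2j * np.pi * np.outer(n, n) / 12)
assert np.allclose(X, W @ x)
```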

There are also other packages; one is called SPIRAL.

That also computes very efficiently all sorts of different

fast Fourier transforms.

And if you use these packages, it will make a big difference.

It will make differences by orders of magnitude, okay?

And with this,

I think we have finished this entire module on the Fourier transform,

the Fourier series, and the DFT, the discrete Fourier transform,

which we concluded with this topic on computational complexity and

fast algorithms.

That has made a big difference in the implementation of signal processing

algorithms.