0:00

In this video, we're going to look at a proof that the perceptron learning

procedure will eventually get the weights into the cone of feasible solutions.

I don't want you to get the wrong idea about the course from this video.

In general, it's gonna be about engineering, not about proofs of things.

There'll be very few proofs in the course. But we get to understand quite a lot more

about perceptrons when we try and prove that they will eventually get the right

answer. So we're going to use our geometric

understanding of what's happening in weight space as the perceptron learns, to get

a proof that the perceptron will eventually find a weight vector that gets

the right answer for all of the training cases, if any such vector exists.

And our proof is gonna assume that there is a vector that gets the right answer for

all training cases. We'll call that a feasible vector.

An example of a feasible vector is shown by the green dot in the diagram.
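To make the procedure itself concrete, here is a minimal sketch in Python. It is my paraphrase, not code from the course, and it assumes the usual convention that each training case has already been multiplied by its +1/-1 target, so that a weight vector gets a case right exactly when their dot product is positive.

```python
import numpy as np

def perceptron(cases, max_epochs=1000):
    """Perceptron learning procedure (sketch).

    Each element of `cases` is an input vector already multiplied by
    its +/-1 target, so a weight vector w gets case x right exactly
    when w.dot(x) > 0.  A feasible vector gets every case right.
    """
    w = np.zeros(len(cases[0]))
    for _ in range(max_epochs):
        mistakes = 0
        for x in cases:
            if w.dot(x) <= 0:   # wrong (or on the boundary): update
                w = w + x       # move w along the input vector
                mistakes += 1
        if mistakes == 0:       # w is now in the cone of feasible solutions
            return w
    return w                    # may never get there if no feasible vector exists
```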

1:13

And what we want to show, and this is the idea for the proof, is that every time

the perceptron gets a training case wrong, it will update the current weight

vector in a way that makes it closer to every feasible weight vector.

So we can represent the squared distance of the current weight vector from a

feasible weight vector as the sum of a squared distance along the line of the

input vector that defines the training case, and another squared distance

orthogonal to that line.
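In symbols (the notation here is mine): write w for the current weight vector, w* for a feasible one, and x for the input vector of the training case. The decomposition is just Pythagoras applied to w* - w:

```latex
\|\mathbf{w}^{*}-\mathbf{w}\|^{2} \;=\; d_{\parallel}^{2} + d_{\perp}^{2},
\qquad
d_{\parallel} \;=\; \frac{(\mathbf{w}^{*}-\mathbf{w})\cdot\mathbf{x}}{\|\mathbf{x}\|},
```

where d_parallel is the component of w* - w along the line of x and d_perp is the component orthogonal to it. The update w <- w + x moves w only along x, so only the d_parallel term can change.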

The orthogonal squared distance won't change, and the squared distance along the

line of the input vector will get smaller. So our hopeful claim is that every time

the perceptron makes a mistake, our current weight vector is going to get

closer to all feasible weight vectors. Now this is almost right, but there's an

unfortunate problem. If you look at the feasible weight vector

in gold, it's just on the right side of the plane that defines one of the training

cases. And the current weight vector is just on

the wrong side, and the input vector is quite big.

So when we add the input vector to the current weight vector, we actually get

further away from that gold feasible weight vector.
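Here is a tiny numeric illustration of that failure; the numbers are invented, chosen only so the geometry matches the picture just described:

```python
import numpy as np

# Invented numbers: w_star is feasible but only just, and x is big.
w_star = np.array([0.1, 1.0])   # just on the right side: w_star.dot(x) > 0
w      = np.array([-0.1, 1.0])  # current weights, just on the wrong side
x      = np.array([10.0, 0.0])  # a large input vector

assert w_star.dot(x) > 0        # w_star gets this case right
assert w.dot(x) <= 0            # w gets it wrong, so the perceptron updates

w_new = w + x                   # the perceptron update

print(np.sum((w_star - w)**2))      # squared distance before:  0.04
print(np.sum((w_star - w_new)**2))  # squared distance after:  96.04
```

The update moved us much further from this particular feasible vector, which is exactly the problem.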

So our hopeful claim doesn't work, but we can fix it up so that it does.

2:53

So what we're gonna do is define a generously feasible weight

vector. That's a weight vector that not only gets

every training case right, but gets it right by at least a certain margin,

where the margin is as big as the length of the input vector for that training case.
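Written out (my formalisation of that spoken definition, with the same sign convention as before, where getting a case right means a positive dot product), a generously feasible weight vector w* satisfies, for every training case x:

```latex
\frac{\mathbf{w}^{*}\cdot\mathbf{x}}{\|\mathbf{x}\|} \;\ge\; \|\mathbf{x}\|
\quad\Longleftrightarrow\quad
\mathbf{w}^{*}\cdot\mathbf{x} \;\ge\; \|\mathbf{x}\|^{2},
```

that is, its distance from the constraint plane of the case (the plane through the origin perpendicular to x) is at least the length of x.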

3:13

So we take the cone of feasible solutions, and inside that we have another cone of

generously feasible solutions, which get everything right by at least the

size of the input vector. And now, our proof will work.

Now we can make the claim that every time

the perceptron makes a mistake,

the squared distance to all of the generously feasible weight vectors will

be decreased by at least the squared length of the input vector, which is the

update we make. So given that, we can get an informal

sketch of a proof of convergence. I'm not gonna try and make this formal.

I'm more interested in the engineering than the mathematics.

If you're a mathematician, I'm sure you can make it formal yourself.

So, every time the perceptron makes a mistake, the current weight vector moves,

and it decreases its squared distance from every generously feasible weight

vector by at least the squared length of the current input vector.
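The calculation behind that claim is short (my expansion, not worked through in the lecture): on a mistake we have w . x <= 0, the update is w <- w + x, and generous feasibility gives w* . x >= ||x||^2, so

```latex
\|\mathbf{w}^{*}-(\mathbf{w}+\mathbf{x})\|^{2}
  \;=\; \|\mathbf{w}^{*}-\mathbf{w}\|^{2}
        \;-\; 2\,(\mathbf{w}^{*}-\mathbf{w})\cdot\mathbf{x}
        \;+\; \|\mathbf{x}\|^{2}
  \;\le\; \|\mathbf{w}^{*}-\mathbf{w}\|^{2} \;-\; \|\mathbf{x}\|^{2},
```

because the cross term satisfies (w* - w) . x = w* . x - w . x >= ||x||^2.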

And so the squared distance to all the generously feasible weight vectors

decreases by at least that squared length. So, assuming that none of the input

vectors are infinitesimally small, that means that after a finite number of

mistakes, the weight vector must lie in the feasible region, if this region exists.
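That finite number can be made explicit (a standard bound; the lecture doesn't state it): if D is the distance from the starting weight vector w_0 to some generously feasible w*, and every input vector has length at least m > 0, then each mistake removes at least m^2 from a squared distance that starts at D^2 and can never go below zero, so

```latex
\#\{\text{mistakes}\} \;\le\; \frac{D^{2}}{m^{2}},
\qquad
D = \|\mathbf{w}^{*}-\mathbf{w}_{0}\|,\quad
m = \min_{\text{cases }\mathbf{x}} \|\mathbf{x}\| .
```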

Notice it doesn't have to lie in the generously feasible region, but it has to

get into the feasible region to stop making mistakes.

And that's it. That's our informal sketch of a proof that

the perceptron convergence procedure works.

But notice, it all depends on the assumption that there is a generously

feasible weight vector. And if there is no such vector, the whole

proof falls apart.