
Let's move on to the second sense in which the generic local search algorithm

is under-determined. The main while loop reads: if there is a

superior neighboring solution y, then reset the current solution to be y.

But there may of course be many legitimate choices for y.

For example, in the maximum cut problem, given a cut, there may be many vertices

whose switch to the opposite group yields a cut with strictly more crossing edges.

So when you have multiple superior neighboring solutions, which one should

you choose? Well, this is a tricky question, and the

existing theory does not give us a clear-cut answer.

Indeed, it seems like the right answer is highly domain-dependent.

Answering the question is probably going to require some serious experimentation

on your part, with the data that you're interested in.

One general point I can make: recall that when you're using local search to

generate from scratch an approximate solution to an optimization problem, you

want to inject randomness into the algorithm to coax it to explore the

solution space and return as many different locally optimal solutions as

possible over a bunch of independent runs.

You're going to remember the best of those locally optimal solutions,

and return that best one at the end of the day.

So this is a second opportunity, in addition to the starting point, where you

can inject randomness into local search. If you have many possible improving

values of y, choose one at random. Alternatively, you could try to be clever

about which y you should choose. For example, if there are multiple superior

neighboring solutions y, you could go to the best one.

For example, in the maximum cut problem, if there are lots of different vertices

whose switch to the other group would yield a superior cut, pick the vertex

which increases the number of crossing edges by the largest amount.

In the traveling salesman problem, amongst all neighboring tours with

smaller cost, pick the one with the minimum cost overall.

So this is a perfectly sensible rule about how to choose y.

It is myopic, and you could imagine doing much more sophisticated things to decide

which y to go to next. And indeed, if this is a problem you

really care about, you might want to work hard to figure out what are the right

heuristics for choosing y, that work well in your application.
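To make these selection rules concrete, here is a minimal Python sketch of local search for maximum cut supporting both a random-improvement and a best-improvement rule. The representation (an edge list plus a dict mapping each vertex to its group) and all the function names are my own illustrative choices, not anything fixed by the lecture.

```python
import random

def cut_value(graph, side):
    """Number of crossing edges, where side[v] is 0 or 1 for each vertex v."""
    return sum(1 for u, v in graph if side[u] != side[v])

def improving_switches(graph, side):
    """All vertices whose switch strictly increases the number of crossing edges."""
    base = cut_value(graph, side)
    moves = []
    for v in side:
        side[v] ^= 1                      # tentatively switch v
        gain = cut_value(graph, side) - base
        side[v] ^= 1                      # undo the tentative switch
        if gain > 0:
            moves.append((gain, v))
    return moves

def local_search_max_cut(graph, side, rule="random"):
    """Repeatedly switch an improving vertex until the cut is locally optimal."""
    while True:
        moves = improving_switches(graph, side)
        if not moves:
            return side                   # no improving neighbor: locally optimal
        if rule == "best":
            _, v = max(moves)             # best-improvement: largest gain
        else:
            _, v = random.choice(moves)   # random-improvement: any improving move
        side[v] ^= 1
```

On a 4-cycle graph `[(0, 1), (1, 2), (2, 3), (3, 0)]` started from the all-zeros cut, either rule reaches the globally maximum cut with all four edges crossing.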

A third question that you have to answer, in order to have a precisely defined

local search algorithm, is: what are the neighborhoods?

For many problems, there is significant flexibility in how to define the

neighborhoods. And theory again does not give clear-cut

answers about how they should be designed.

Once again, this seems like a highly domain-dependent issue.

If you want to use local search on a problem of interest, it's probably going

to be up to you to empirically explore which neighborhood choices seem to lead

to the best performance of the local search algorithm.

One issue that you'll likely have to grapple with is figuring out how big to

make your neighborhoods. To illustrate this point, let's return to

the maximum cut problem. So there we defined the neighbors of a

cut to be the other cuts you can reach by taking a single vertex and moving it

into the other group. This means each cut has a linear number,

O(n), of neighboring cuts, but it's easy to make the neighborhood bigger if we

want. For example, we could define the

neighbors of a cut to be those cuts you can reach by taking two vertices or

fewer and switching them to the opposite groups.

Now each cut is going to have a quadratic number of neighbors.

More generally, of course, we could allow a single local move to do up to k swaps of

vertices between the two sides, and then the neighborhood size would be O(n^k).
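As a quick illustration of how the neighborhood grows, here is a toy Python sketch (my own code, not from the lecture) that enumerates the cuts reachable by switching at most k vertices:

```python
from itertools import combinations

def neighbors(side, k):
    """All cuts reachable by switching a nonempty set of at most k vertices."""
    vertices = list(side)
    for size in range(1, k + 1):
        for subset in combinations(vertices, size):
            neighbor = dict(side)
            for v in subset:
                neighbor[v] ^= 1          # move v to the other group
            yield neighbor

side = {v: 0 for v in range(10)}          # a cut on n = 10 vertices
print(sum(1 for _ in neighbors(side, 1)))  # 10 single-vertex switches: O(n)
print(sum(1 for _ in neighbors(side, 2)))  # 10 + C(10, 2) = 55 neighbors: O(n^2)
```

Counting the generated neighbors directly confirms the linear-versus-quadratic growth the lecture describes.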

So, what are the pros and cons of enlarging your neighborhood size?

Well, generally speaking, the bigger you make your neighborhoods, the more time

you're going to have to invest searching for an improving solution in each step.

For example, in the maximum cut problem, if we implement things in a

straightforward way and only allow one vertex to be switched in an iteration, then we only

have to search through a linear number of options to figure out if there's an

improving solution or not. On the other hand, if we allow two vertices

to be switched in a given iteration, we have to search through a quadratic number

of possibilities to determine whether or not we're currently locally optimal.

So the bigger the neighborhood, generally speaking, the longer it's going to take to

check whether you're currently locally optimal or whether there's some

better solution that you're supposed to move on to.

The good news about enlarging your neighborhood size is that you're going to

have only fewer local optima. And in general, some of these local

optima that you're pruning are going to be bad solutions that you're happy to be

rid of. If you look back at the example I gave

you in the previous video, which demonstrated that the simple local search

for maximum cut can be off by 50%, you'll see that if we just enlarge the

neighborhoods to allow two vertices to be swapped in the same iteration, then all

of a sudden, on that four-vertex example, local search is guaranteed to

produce the globally maximum cut. The bad locally optimal cuts have been

pruned away. Now, even if you allow two vertices to be

swapped in a given iteration, there are going to be more elaborate examples

showing that local search might be off by 50%, but on many instances, allowing this

larger neighborhood will give you better performance from local search.

Summarizing, one high-level design decision you should be clear on in your

head before you apply the local search heuristic design paradigm to a

computational problem is how much you care about solution quality versus how

much you care about the computational resources required.

If you care about solution quality a lot and you're willing to either wait or

you're willing to throw a lot of hardware at the problem, that suggests using

bigger neighborhoods. They're slower to search but you're

likely to get a better solution at the end of the day.

If you really want something quick and dirty, meaning you want it to be fast and you don't

care that much about the solution quality, that suggests using simpler,

smaller neighborhoods, neighborhoods that are fast to search,

knowing that some of the local optima you get might not be very good.

Let me reiterate, these are just guidelines, these are not gospel.

In all computational problems, but especially with local search, the way you

proceed has to be guided by the particulars of your application.

So make sure you code up a bunch of different approaches, see what works,

and go with that. For our final two questions, let's

suppose you've resolved the initial three and you have a fully specified

local search algorithm. So you've made a decision about exactly

what your neighborhoods are, you figured out the sweet spot for you between

efficient searchability and the solution quality you're likely to get at the end

of the day. You've made a decision about exactly how

you're going to generate the initial solution.

And you've made a decision about how, when you have multiple neighboring

superior solutions, which one you're going to go to next.

Now, let's talk about what kind of performance guarantees you can expect

from a local search algorithm. So let's first just talk about running

time, and let's begin with the modest question: is it at least the case that

this local search algorithm is guaranteed to converge, eventually?

In many of the scenarios you are likely to come across, the answer is yes.

Here's what I mean. Suppose you're dealing with a

computational problem where the set of possible solutions is finite.

And moreover, your local search is governed by an objective function.

And the way you define a superior neighboring solution, is that it's one

with better objective function value. This is exactly what we were doing in the

maximum cut problem. There the space was finite, it was just

the set of exponentially many graph cuts, and our objective function was the

number of crossing edges. Similarly for the traveling salesman

problem, the space is finite. It's just the roughly n-factorial

possible tours. And again, how do you decide which tour to go to next? You look for

one that decreases the objective function value, the

total cost of the tour. Whenever you have those two properties,

finiteness and strict improvement in the objective function, local search is

guaranteed to terminate. You can't cycle, because every time you

iterate you get something with a strictly better objective function value, and you

can't go on forever, eventually you'll run out of the finitely many possible

things to try. There is, of course, no reason to be

impressed by the finite convergence of local search.

After all, brute-force search equally well terminates in finite time.

So this class is all about having efficient algorithms that run quickly.

So the real question you want to ask is: is local search guaranteed to converge in,

say, polynomial time? Here, the answer is generally negative.

When we studied the unweighted maximum cut problem, that was the exception that

proves the rule. There, we only needed a quadratic

number of iterations before finding a locally optimal solution. But, as we

mentioned in passing, even if you just pass to the weighted version

of the maximum cut problem, there already, local search might need,

in the worst case, an exponential number of iterations before halting with a locally

optimal solution. In practice, however, the situation is

rather different, with local search heuristics often, finding, locally

optimal solutions, quite quickly. We do not at present have a very

satisfactory theory that explains, or predicts, the running time of local

search algorithms. If you want to read more about it, you

might search for the keyword smooth analysis.

Another nice feature of local search algorithms is that even if you're in the

unlucky case, where your algorithm is going to take a long

time before it finds a locally optimal solution, you can always just

stop it early. So when you start the search, you can

just say: look, if after 24 hours you haven't found a locally optimal

solution, just tell me the best solution you've found thus far.
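This early-stopping idea can be sketched in a few lines of Python. The `improve` interface here is hypothetical: it is assumed to return a strictly better neighboring solution, or None at a local optimum.

```python
import time

def local_search_with_budget(improve, solution, budget_seconds):
    """Run improving steps until locally optimal or the time budget runs out,
    then return the best solution found so far."""
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        better = improve(solution)
        if better is None:        # locally optimal: done early
            return solution
        solution = better         # each step strictly improves, so no cycling
    return solution               # out of time: return the best-so-far anyway
```

For the 24-hour budget mentioned above, you would pass `budget_seconds=24 * 60 * 60`.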

So in addition to running time, we want to measure the performance of a local

search algorithm in terms of its solution quality.

So, is that solution going to be any good? Here, in general, the answer is no.

And again, the maximum cut problem was the exception that proves the rule.

That's a very unusual problem, in that you can prove at least some kind of

performance guarantee about the locally optimal solutions.

For most of the optimization problems in which you might be tempted to apply the

local search design paradigm, there will be locally optimal solutions quite far

from globally optimal ones. Moreover this is not just a theoretical

pathology. Local search algorithms in practice will

sometimes generate extremely lousy locally optimal solutions.

So this ties back into a point we made earlier, which is: if you're using local

search not just as a post-processing improvement step, but actually to generate

from scratch a hopefully near-optimal solution to an optimization problem, you

don't just want to run it once, because you don't know what you're going to get.

Rather you want to run it many times, making random decisions along the way,

either from a random starting point, or choosing random improving solutions to

move to, so that you explore as best you can the space of all solutions.

You want, over many executions of local search, to get your hands on as many

different locally optimal solutions as you can; then you can remember the best

one. Hopefully the best one at least will be

pretty close to a globally optimal solution.
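The repeated-runs strategy can be sketched generically in Python. The three function arguments are placeholders for whatever random-start, local-search, and objective routines fit your problem; the name `best_of_restarts` is my own.

```python
def best_of_restarts(random_start, local_search, objective, runs=100):
    """Run local search from many independent random starting points,
    and remember the best locally optimal solution found."""
    best, best_value = None, float("-inf")
    for _ in range(runs):
        candidate = local_search(random_start())   # one independent run
        value = objective(candidate)
        if value > best_value:                     # keep the best seen so far
            best, best_value = candidate, value
    return best, best_value
```

The same driver also covers the other source of randomness discussed above: a `local_search` that breaks ties among improving moves at random will return different locally optimal solutions across runs even from the same starting point.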