Practical methods for solving linear programs.

Have you ever heard that story about the student who walks in late to a class, sees what he thinks are homework problems written up on the chalkboard, and solves them, not knowing that they are in fact major open research questions? Well, maybe you thought that story was apocryphal, as I did for many years. But it's not; it turns out it's totally about George Dantzig. In 1939, he walked into his graduate statistics class late. The two major open questions in the field were written on the chalkboard, and he solved them within a few days.

That wound up being his PhD dissertation, before he got interested in linear programming. Now, the algorithms used to efficiently solve linear programs are pretty complicated, both the various versions of the simplex method and the so-called interior-point algorithms. So it may or may not be a good use of your time to learn more about how the guts of these algorithms work. But what is most surely a good use of your time is learning how to be an educated client of these algorithms, that is, to get some practice formulating problems as linear programs and solving them using either open-source or commercial linear programming solvers. Those solvers are a very powerful black-box tool to have in your algorithmist's toolbox.
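
For instance, here's a minimal sketch of that "educated client" workflow, using SciPy's open-source solver (one of many options; the particular LP below is made up purely for illustration):

```python
# Sketch: solving a small linear program with SciPy's open-source solver.
# The LP itself is a made-up example:
#   maximize x + 2y   subject to   x + y <= 4,  x <= 3,  x, y >= 0.
from scipy.optimize import linprog

# linprog minimizes, so negate the objective to maximize x + 2y.
result = linprog(c=[-1, -2],
                 A_ub=[[1, 1], [1, 0]],
                 b_ub=[4, 3],
                 bounds=[(0, None), (0, None)])
print(result.x)      # optimal point
print(-result.fun)   # optimal objective value
```

The hard part in practice is the modeling step, translating your problem into the objective vector and constraint matrix; the solver does the rest.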

In fact, there are black-box subroutines available to you that are strictly more powerful than linear programming. One generalization is convex programming: linear functions are a special case of convex functions, and if you want to minimize a convex function, or equivalently maximize a concave function, subject to convex constraints, that again is a problem you can solve efficiently, both in theory, in the sense of polynomial time, and in practice. Maybe the problem sizes you can reach aren't quite as big as with linear programming, but they're pretty close.
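
As a taste, here's a sketch of a tiny convex program handed to a general-purpose solver; the specific objective and constraint are invented for illustration:

```python
# Sketch: a tiny convex program solved with SciPy's general-purpose
# minimizer. The problem is made up for illustration:
#   minimize (x - 1)^2 + (y - 2)^2   subject to   x + y <= 2.
from scipy.optimize import minimize

objective = lambda v: (v[0] - 1) ** 2 + (v[1] - 2) ** 2
# The SLSQP method expects inequality constraints in the form g(v) >= 0.
constraint = {"type": "ineq", "fun": lambda v: 2 - (v[0] + v[1])}

result = minimize(objective, x0=[0.0, 0.0],
                  constraints=[constraint], method="SLSQP")
print(result.x)  # optimum: the point of the feasible region closest to (1, 2)
```

Because both the objective and the constraint are convex, the solver's local optimum is guaranteed to be the global one.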

A different generalization is integer linear programming. These are like linear programs, but where you add the extra constraint that certain decision variables have to take on integer values. So something like 1/2 as a value for a decision variable would be disallowed in an integer program. Now, these in general are no longer solvable efficiently in theory; it's easy to encode NP-complete problems, like, let's say, vertex cover, as special cases of integer programming. But there is quite a bit of advanced technology out there for solving integer programs, at least in special domains. So again, if you have an NP-complete problem and you really need to solve it, integer programming is a technique you're going to want to learn more about.
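
To make the vertex cover encoding concrete, here's a sketch using SciPy's mixed-integer solver (requires SciPy >= 1.9; the small graph is made up for illustration):

```python
# Sketch: encoding minimum vertex cover as an integer linear program.
# One 0/1 variable per vertex; minimize the number of chosen vertices,
# subject to x_u + x_v >= 1 for every edge (u, v).
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]  # a triangle plus a pendant edge
n = 4

c = np.ones(n)                    # objective: sum of the x_v's
A = np.zeros((len(edges), n))     # one covering constraint per edge
for i, (u, v) in enumerate(edges):
    A[i, u] = A[i, v] = 1

result = milp(c,
              constraints=LinearConstraint(A, lb=1),
              integrality=np.ones(n),   # every variable must be integral
              bounds=Bounds(0, 1))
print(int(result.fun))  # size of a minimum vertex cover (2 for this graph)
```

Dropping the `integrality` argument turns this back into an ordinary (efficiently solvable) linear program, the so-called LP relaxation; it's exactly the integrality constraint that makes the problem hard.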

What are some other topics we haven't had time to discuss? Well, first of all, for many of the topics we have discussed, we've only scratched the surface. There's beautiful and useful stuff beyond what we've discussed in these classes, on topics ranging from data structures, for example with hashing and search trees, to graph algorithms, where we already mentioned matchings and flows, to approximation algorithms, like better heuristics for the traveling salesman problem. A topic we've barely discussed at all is geometric algorithms, one exception being a divide-and-conquer algorithm for the closest pair problem that we discussed in Part 1. Roughly speaking, the study of geometric algorithms has two flavors: low-dimensional problems and high-dimensional problems. So when I say low-dimensional, I mean probably two or three dimensions, so problems in the plane or problems in three-dimensional space.

A conical computational problem here would be the convex hull problem.

Which means I give you a bunch of points and then you want to know which points

are on the convex hull of the point set. So, here's a way to think about the

convex hull problem in the plane. So, take a wooden board, a flat wooden

board, that represents the plane. Now, pound a bunch of nails into the

wooden board. Those are representing the points in the

plane. Now, to compute the convex hull, what you

can do is you can take a rubber band, stretch it out really far so that it

bounds. All of the nails you pounded into the

wooden board and now let the wo, rubber band go, and it's going to snap around

the boundry nails. That's a way to computer the convex hull

in 2 dimensions. And so the question is, how do you do that efficiently given just

a bunch of coordinates of points? The state-of-the-art here are divide and

conquer algorythm, very much in the spirit of the closest pair problem back

in Part 1. So why would you care about computing

convex hulls? Well, there's lots of reasons.
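
In code, the rubber-band picture can be made concrete. Here's a sketch of Andrew's monotone chain algorithm, one standard O(n log n) method (a different approach from the divide-and-conquer algorithms just mentioned):

```python
# Sketch: Andrew's monotone chain algorithm for 2D convex hulls.
# (An illustrative choice; divide-and-conquer methods also exist.)
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half_hull(seq):
        # Discard points that make a non-left turn; like the rubber band,
        # only the outermost "nails" survive.
        hull = []
        for p in seq:
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    return lower[:-1] + upper[:-1]  # endpoints are shared; drop duplicates

# The interior point (1, 1) is not part of the hull of this square.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
```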

But, you know, as one example, suppose you were doing some 3D graphics, you had moving objects in your scene, and you wanted to know when two objects collide, because then you have to respond appropriately in rendering the scene. Well, to know whether two objects collide, you don't have to remember all of the details of the objects; you just have to keep track of their convex hulls. They collide exactly when their convex hulls intersect, and so that's one reason you might want to do that computation. A very different flavor of geometric algorithms is the high-dimensional case. And here, I'm thinking of the number of dimensions as being in the thousands, say, or perhaps even quite a bit more than that. Now, why would you ever have points in

the thousands of dimensions? Well, imagine you were doing, say, information retrieval, and imagine you had a bunch of documents. Now, documents don't really have geometry intrinsically, but it can be useful to imbue them with geometry. How do you do that? Well, imagine you make a list of all of the words in the world that you care about, so maybe, say, 10,000 words that you think are interesting. And for a given document, you just count the number of occurrences of each of these 10,000 interesting words in the document. Maybe lots of the counts are zero, because those words don't occur at all. But maybe there's some word that occurs seven times in the document; then you'd give it a seven in that coordinate, and boom, you've now represented each document as a point in 10,000-dimensional space, one coordinate for each of the interesting words, the value of the coordinate being the frequency of that word in that document.
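
Here's a minimal sketch of that representation, together with a brute-force nearest-neighbor query; the tiny vocabulary and documents are made up, and a real system would use a far larger vocabulary and smarter data structures:

```python
# Sketch: documents as word-count vectors, plus a brute-force
# nearest-neighbor query. Vocabulary and corpus are illustrative.
import math

vocabulary = ["algorithm", "graph", "cook", "recipe"]

def to_vector(document):
    """One coordinate per vocabulary word: its count in the document."""
    words = document.lower().split()
    return [words.count(w) for w in vocabulary]

def nearest_neighbor(stored, query):
    """The stored document whose vector is closest in Euclidean distance."""
    q = to_vector(query)
    return min(stored, key=lambda d: math.dist(to_vector(d), q))

corpus = ["graph algorithm for graph search", "recipe to cook dinner"]
print(nearest_neighbor(corpus, "a shortest path graph algorithm"))
```

Brute force is linear in the number of stored documents; the interesting algorithmic work in high dimensions is answering such queries much faster than that.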

Now you can start asking questions like: given a new document, is it similar to any of the documents you've already been storing? And geometrically, that just corresponds to asking something called a nearest-neighbor query. Given a bunch of points in a possibly high-dimensional space, and given a new point, which of the previous points is closest, say in Euclidean distance, to the new point you were just given? That would be a canonical question you would ask in high-dimensional computational geometry. In these Design and Analysis of

Algorithms courses, we've been focusing on the most classical style of computation, where you're given an input up front, you do a computation, you produce some output, and then you take a bow and leave the stage. But if you think about computation in the real world, there are plenty of algorithms whose work is never finished, algorithms that in effect run forever.

Two applications that we've mentioned in passing in this class are caching and routing. For example, if you're writing an algorithm to manage a cache, say in an operating system, that algorithm's work is never really done. It's going to have to manage the cache indefinitely.
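
For instance, here's a minimal sketch of one classic cache-management policy, least-recently-used (LRU) eviction; the policy choice is illustrative, since the lecture doesn't single out any particular one:

```python
# Sketch: a least-recently-used (LRU) cache, one classic policy for the
# never-ending cache-management problem. Capacity and workload are made up.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order tracks recency of use

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # evicts "b", the least recently used
print(cache.get("b"))  # None: "b" was evicted
```

The point is the online character of the problem: each eviction decision must be made immediately, without knowing which keys will be requested next.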

Similarly, you can think about routing in a network, say Internet routing. Again, that algorithm's work is never done. There are always going to be link failures; there are always going to be new nodes; there's always going to be new data to route. So, it has to keep making routing decisions indefinitely, as long as the Internet exists. As you would hope, both the theory and the practice of understanding algorithms that run indefinitely, making decisions in real time, are based quite squarely on the lessons that we've learned in the classical computational paradigm here. But it is an interesting topic, worthy of study in its own right. A major concern of algorithms research in

the 21st century is massive data sets, meaning data sets that are way too big to fit in the main memory of a single machine. One hot topic has been so-called streaming algorithms, that is, algorithms which have to, in effect, take a firehose of data being generated at a torrential pace and boil it down, using small space, into just a few accurate statistics.
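
To give a flavor of the area, here's a sketch of the Misra-Gries frequent-elements algorithm, which summarizes a stream using only k counters, no matter how long the stream is (an illustrative choice; the lecture doesn't name a specific streaming algorithm):

```python
# Sketch: the Misra-Gries streaming algorithm. It maintains at most k
# counters and, in one pass, retains every item that occurs more than
# n/(k+1) times in a stream of length n. The stream below is made up.
def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Stream element matches no counter and all k are in use:
            # decrement every counter, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters  # candidate heavy hitters with approximate counts

stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a"]
print(misra_gries(stream, k=2))
```

The space used depends only on k, not on the length of the stream, which is exactly the small-space guarantee streaming algorithms are after.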

So you might think, for example, about an algorithm running locally at a network router, with data packets flying through the router at a ridiculous pace, or an algorithm which is responsible for summarizing the data generated by a bunch of telescopic observations. A final important topic, on which the current theory is actually quite immature, is understanding how to exploit parallelism in tackling massive data sets, for example using the distributed-systems approach exemplified by MapReduce and the Hadoop system.
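
To give a taste of that programming model, here's the canonical MapReduce word-count example, simulated in a single process; in a real MapReduce or Hadoop deployment, the map and reduce phases would run in parallel across many machines:

```python
# Sketch: the canonical MapReduce word-count, simulated on one machine.
# Only the programming model is illustrated; the framework would
# distribute map tasks, shuffling, and reduce tasks across a cluster.
from collections import defaultdict

def map_phase(document):
    # Emit a (word, 1) pair for every word occurrence.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    return key, sum(values)

documents = ["the quick fox", "the lazy dog", "the fox"]
pairs = [p for doc in documents for p in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

The programmer only writes the map and reduce functions; the parallelism, data movement, and fault tolerance are the framework's job, which is what makes the model attractive for massive data sets.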