For our next case study of how to use greedy algorithms, we're going to turn to
the application domain of scheduling. That is how do you schedule jobs on
shared resources in order to accomplish some objective.
So, the domain of scheduling there's lots of different applications of greedy
algorithms. We'll see two in this course.
we'll start for today just with the following simple scenario.
So, we'll assume, for today, that there's just one shared resource.
This resource could represent any number of things.
For concreteness, you can think of it as a computer processor.
And then, there's a lot of different things that got to get done.
So, for example, there's a lot of processes that have to be handled by this
processor. In the algprithmic question, we are going
to study, is, in what order should we sequence these jobs?
Which one should we do first, which one should we do second, and so on, all the
way up to which one should we do last. So, obviously to answer this question, we
need to pin down the mathematical model a little bit more precisely.
And lets start with just, you know, what is the characteristics of jobs, what
information do we have that might lead us to prefer one job over another.
But for this problem, we're going to assume that each job comes with two known
parameters. So, first of all, job j has what we're
going to call a weight w sub j. That's a non-negative real number.
And you should think of the weight of a job as quantifying its importance.
That is, jobs with a higher weight, in some sense, deserve to be processed
earlier than those with a lower weight. And secondly, each job j is going to come
with a non negative length l sub j. Depending on the application, you may or
may not have a good estimate of how long jobs are going to take, but for today to
keep things simple, let's assume we know what the length of every job is, and
that's l sub j, it's part of the input to are problem.
So. we have now defined the input to this
computational problem. We get n jobs each specified by a weight
and a length. And we know that the output is going to
be a sequence of these n jobs in some order.
So, what we have to understand now is what criterion do we want to optimize?
What are we trying to accomplish with this sequence?
To explain that, I need to tell you about completion times of jobs.
So, the completion time of a job is defined hopefully in exactly the way
you'd think. So, for the job which is scheduled first,
it's just the length of the job because that's how long it takes to process that
job. For whatever job gets scheduled second,
its completion time is the length of the first job and then, the length of that
job itself. So, in other words, it's just the total
time which elapses before that job gets completed,
okay? So, in general, the completion time of a
job is just the sum of the lengths of the jobs scheduled to before that job plus
the length of that job itself. To make sure this is clear, let's go
through a quick example. So, suppose there are three jobs with
lengths one, two, and three. I'm not going to tell you the job weights
because they're irrelevant for the purposes of computing the completion
time. And let's suppose we do the schedule
where we just schedule job one first, the job two, then job three.
So, pictorially, I'm going to represent that schedule just by stacking the jobs
on top of each other with the interpretation that time starts at the
bottom. So, time zero is where we schedule job
one. And then, time increases as we go from
the bottom to the top of the diagram. And the question then is, what are the
completion times of these three jobs? Okay.
So, the correct answer is answer C. So, for the first job, it gets scheduled
first so it's very happy and it just takes one unit of time to complete, so
its completion time is one. The second job, well, it has to wait for
the first job to complete so one unit of time elapses and then, it itself has to
complete so that's two more units so it gets to the completion time of three.
For the third job, it has to wait for the first two to complete, so that adds three
to the clock, and then plus it takes three units of time for a total of six.
So, that's the definition of job completion times. In some sense, we
obviously want completion times to be as small as possible.
But it's not so simple. In any given schedule, the jobs that are
give early on are going to have small completion times and the jobs towards the
end are going to have big completion times. So inevitably, we're going be have
to trading off the completion times between different jobs.
So, what is the optimal way to so that? Well, that depends on our objective
function, and in scheduling, there's many different objective functions you might
want to use. today, I'm just going to tell you about
one. It's not the only natural objective
function, but it's one of several most natural objective functions.
It's called minimizing the weighted sum of completion times.
You translate this English phrase into mathematics in the obvious way.
What you want to do is you want to minimize the sum over all n jobs of their
completion time, but then multiplied by their weight of [UNKNOWN] j.
Okay. So, the sum over j of w j times c j.
The w j is the weight and c j is the completion time as defined on the
previous slot. If you think about it for a second,
you'll realize this is equivalent to minimizing the weighted average of the
completion times with the weights given as in the input.
So, just to make sure this makes sense, let's go back to the example that we saw.
In that example, we had jobs with lengths one, two, and three, and we thought about
to schedule or we scheduled them in that order.
To evaluate the subjective function, I'd have to tell you their weights, so let's
suppose their weights are three, two, and one, respectively.
In this case, the weighted sum of completion times in the schedule,
in the previous slide, well first, we begin with a, the first
job, which has weight three. Its completion time, remember, was one.
Then, we have the second job with weight two, its completion time is three.
Then, we have the third job with weight one, its completion time was six.
So, we sum up the weighted completion times and we get a total of fifteen.
And I'll let you verify that, in fact, all of the three factorial or six
schedules in that example, this is, in fact, the schedule that minimizes the
weighted sum of completion times. And the algorithmic question we're going
to study next, is how do we do this in general?
Given arbitrary input in jobs, weights, and lengths, what is the sequence that
minimizes this sum over all n factorial sequences you might consider?