Well the fractions work out this way.
What this says is that we're going to take two hospitals
with probabilities proportionate to size, and then take the same number of employees
in each. That's what all of these equations refer to.
Across two stages we're going to get balance in the overall sampling fraction
for all the employees.
They all have the same chance of being selected when we do it in two steps.
In the first step, we oversample big hospitals and undersample small ones.
But in the second step, we oversample employees in small hospitals and
undersample employees in large hospitals, and
the product of the two always comes out to be our one in 60.
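To see that product come out to one in 60, here's a minimal sketch of the arithmetic. It assumes the first-stage fraction for a hospital is two draws times its size over 6,000, and that we take 50 employees per selected hospital, which is the number used in this example. The function name `overall_fraction` is just a label made up for illustration.

```python
from fractions import Fraction

POPULATION = 6000            # total employees across all twelve hospitals
HOSPITALS_DRAWN = 2          # hospitals taken in the first stage
EMPLOYEES_PER_HOSPITAL = 50  # employees taken in each selected hospital

def overall_fraction(size):
    """Product of the first-stage and second-stage sampling fractions."""
    first_stage = Fraction(HOSPITALS_DRAWN * size, POPULATION)  # bigger hospital, bigger chance
    second_stage = Fraction(EMPLOYEES_PER_HOSPITAL, size)       # bigger hospital, smaller chance
    return first_stage * second_stage

for size in (420, 180, 120):  # sizes of hospitals one, two and three
    print(size, overall_fraction(size))
# 420 1/60
# 180 1/60
# 120 1/60
```

The hospital size cancels out of the product, which is why every employee ends up with the same overall chance.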
Boy, I need an easy way to do this because that sounds complicated to me.
It's hard to keep track of; this is a bookkeeping problem.
We now have to keep track of probabilities of selection for
the hospitals and for the employees within hospitals.
But it can be managed as long as we're careful and
go through this slowly with our available information.
So let's go back to our list of hospitals, here they are now,
here's all twelve of them, and do a PPS sample.
And I've added a new column here, the cumulative size of the hospital.
So, notice for hospital one, the actual size is 420, the cumulative size is 420.
For hospital two, the actual size is 180,
the cumulative size is 600, which is 420 plus 180.
For the third hospital, its actual size is 120.
Its cumulative size is 720,
which is 600 plus 120. We keep adding in the sizes successively,
till we get down to the 12th hospital where its actual is 240, and
its cumulative size is 6,000, which is the total number in our population.
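The cumulative column is just a running total. As a small sketch, here it is for the first three hospitals only, since those are the sizes read out above; the variable names are made up for illustration.

```python
from itertools import accumulate

# Actual sizes of hospitals one, two and three, as read off the list.
sizes = [420, 180, 120]

# Running total: each entry adds that hospital's size to the sum so far.
cumulative = list(accumulate(sizes))
print(cumulative)  # [420, 600, 720]
```

Carried through all twelve hospitals, the last entry would be the population total of 6,000.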
So here's the process, what we're going to do is generate a random number now,
not from one to 12 but from one to 6,000.
Let's suppose that we select two random numbers from one to 6,000.
The first random number is 702.
It's a four-digit random number from 1 to 6,000, so 0702. And 1744 is the second one.
We're going to go down the list and find the first hospital
whose cumulative sum is greater than or equal to the first random number, and
then we're going to do the same thing for the second number.
Find the hospital whose cumulative sum first reaches or exceeds that number
as we're working our way down the list.
That chooses hospitals three and six.
So you can see now I've lined up 702 with hospital three.
That's because 702 falls in the range,
if you will, from 601, one more than hospital two's cumulative size,
up to 720, which is hospital three's. And the 1744 falls in the range,
let's see, for hospital six, from 1561, one more than hospital five's, to 1920.
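Here's a minimal sketch of that lookup. It lists only the cumulative sizes actually quoted here (hospitals one, two, three, five, six and twelve), so it is an incomplete table: it handles these two random numbers correctly, but a real run would need all twelve rows. The name `pick_hospital` is made up for illustration.

```python
from bisect import bisect_left

# (hospital number, cumulative size). Only the rows quoted in the lecture;
# hospitals 4 and 7 through 11 are left out because their values are not given here.
known_rows = [(1, 420), (2, 600), (3, 720), (5, 1560), (6, 1920), (12, 6000)]
totals = [cum for _, cum in known_rows]

def pick_hospital(random_number):
    """Return the first listed hospital whose cumulative size is >= random_number."""
    index = bisect_left(totals, random_number)  # first position with total >= number
    return known_rows[index][0]

print(pick_hospital(702))   # 3, because 601 <= 702 <= 720
print(pick_hospital(1744))  # 6, because 1561 <= 1744 <= 1920
```

Using a greater-than-or-equal comparison matters at the boundaries: a random number of exactly 720 still belongs to hospital three, and 721 moves on to the next hospital.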
In the first case, there are 120 numbers from 601 to 720.
601, 602, 603 and so on through 720.
That's 120 numbers, and we chose one of them.
But that means that I could have chosen any one of the numbers
from 601 to 720 in that hospital.
So the chance of that hospital being selected with one random draw
is 120 out of the 6,000.
That's what we wanted:
a chance proportionate to its size, using the cumulative numbers.
Similarly for hospital six, there are 360 numbers that go from 1561,
one more than hospital five's cumulative size, to 1920.
360 from 6000, that's its chances of being selected, and
we happen to get one of them, 1744.
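As a quick check on those counts, here's a small sketch that counts the random numbers belonging to each of the two hospitals and turns the count into a chance per draw. The dictionary name `number_ranges` is made up for illustration.

```python
from fractions import Fraction

POPULATION = 6000  # random numbers run from 1 to 6,000

# Random numbers that map to each selected hospital (end points inclusive).
number_ranges = {
    3: range(601, 721),    # 601 through 720
    6: range(1561, 1921),  # 1561 through 1920
}

for hospital, numbers in number_ranges.items():
    chance = Fraction(len(numbers), POPULATION)
    print(hospital, len(numbers), chance)
# 3 120 1/50
# 6 360 3/50
```

Hospital six has three times as many numbers as hospital three, so it has three times the chance on any one draw.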
So now we've made two draws, each with probability proportionate to size,
which is what we wanted in the first stage.
Now we can go to the second stage and take 50 at random from 120, and
50 at random from the 360 in hospitals three and six, respectively.
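The second stage can be sketched as a simple random sample without replacement inside each selected hospital. This assumes the employees in a hospital are numbered from 1 up to the hospital's size, which is a convenience for the sketch and not something stated in the lecture; the seed is arbitrary and only makes the run repeatable.

```python
import random

TAKE = 50                        # employees to select in each hospital
hospital_sizes = {3: 120, 6: 360}

rng = random.Random(2024)        # arbitrary seed, only for repeatability

second_stage = {}
for hospital, size in hospital_sizes.items():
    employee_numbers = range(1, size + 1)  # assumed numbering 1..size
    # random.sample draws without replacement, so no employee appears twice.
    second_stage[hospital] = sorted(rng.sample(employee_numbers, TAKE))

for hospital, chosen in second_stage.items():
    print(hospital, len(chosen), "of", hospital_sizes[hospital])
```

The within-hospital fractions are 50 in 120 and 50 in 360, so employees in the smaller hospital are sampled at the higher rate, just as described for the second step.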