0:02

If our symbol table is really going to be dynamic,

we need to be able to delete key value pairs from the table.

As we'll see, all symbol table implementations

lead to complications when we try to do this operation.

Binary search trees is our first example.

So we need to fill in this one table,

what's the cost of deletion in a binary search tree?

How we're going to really do that?

Well, let's take a look at a very lazy approach

which we setup for in our basic conventions for symbol tables.

What we can do to remove a node with a given key,

is just mark it with a tombstone.

Say well, we'll leave the key in the tree to guide searches,

but we won't count it as being in the symbol table.

And actually, you can make some progress with this kind of method,

leaving tombstones through out the tree.

And as long as there aren't too many deletions,

you can keep the search cost and deletion and insert cost to be logarithmic.

But it definitely becomes inconvenient to manage large numbers of

tombstones in highly dynamic situations with large numbers of keys and values.

Eventually, you're going to get an overload

of memory and you're going to have to rebuild the thing,

or clean out the tombstones in some way.

So we need to look for a better way.

This is a general method that people often use in all different types of implementations,

but in modern systems it's rather unsatisfactory.

Well, let's look at a simpler problem.

What about deleting the minimum?

Well actually, that's maybe not too difficult.

To delete the minimum in a binary search tree again,

we just go to the left until we get a null left link.

And then what we can do, is just return that node's right link,

then that old node,

nobody's pointing to it,

so it's available for garbage collection.

And then we use our usual trick of returning the link

that we went down to update the other links after the recursive calls.

And also, we have to update the count,

something happened down below,

and we use that code to update the counts in a consistent way.

So this code implements deleteMin, not too bad at all.

If x.left equals null return x.right.

Otherwise, x.left equals deleteMin x.left,

and then when you're done with that,

you fix the count.

So maybe a node got deleted down there,

but always the invariant is,

that the count of the node is one plus size the left and right,

and then return x and fix the links from the counts on the way up.

That's a fine implementation for deleteMin and it also works for deleteMax.

And that's the basis for a general method for deleting

nodes from BSTs known as Hibbard deletion.

So that's the second case.

The first case for Hibbard deletion is what we want to do to delete a node with key case.

We search for the node that contains the key.

In the easiest case is,

that node has no children.

So to delete the node that has no children,

just return null and then go back up to update the counts as usual,

that's the easy case.

The next most difficult case is like the deleteMin case.

We find a node T that contains our key.

So like deleting R in this tree,

it only has one child.

Just go ahead and return the link to that child,

and that updates the link and everything works fine.

And the node that was deleted is available for garbage collections,

nobody's pointing to it.

And again, update all the counts after the recursive calls.So zero children no problem,

one child no problem.

The problem is, what happens when there's two children?

So say, we want to delete node E in this tree.

We have only one link and we can get rid of the node,

but we have only one link pointing to it,

but we have two links pointing down from it.

So what are we going to do with those two links?

Well, the Hibbard deletion mechanism which is pretty old,

50 years ago it was proposed,

says go ahead and find the next smallest node in the right sub-tree of that tree.

In this case, that's H. What's that node?

Well, it's the minimum in T's right sub-tree.

And we know how to delete the minimum.

So we just find that minimum node.

And in this case it's H,

and we put that node in T spot and then delete the minimum.

So find the H, that's the minimum,

hang on to it, and then delete the minimum in T's sub-tree.

So we take the E, replace it with the H,

and delete the H, and then everything's fine.

That's still a BST.

Essentially, we're finding a node that has only one link leading that node,

and then replacing the node that we need to delete with that one.

That's Hibbard deletion.

It's a little bit asymmetric.

Why are we using the successor,

and not the predecessor?

No real reason.

It's not really satisfactory because of that,

and we'll come back to this, but it works.

This is the code for Hibbard deletion.

So we search for the key,

if it's got no right child we're fine,

we just return x.left and that the handles both cases zero and one.

If it does have a right child and we do this,

find the minimum on the right,

deleteMin on the right and then fix the links,

and then update our count that covers all cases.

Actually not that much code is complicated,

but not particularly more complicated than other code we've seen like rank,

and floor, and ceiling,

and that implements Hibbard deletion.

So now, we have a fully dynamic symbol table where we can insert and delete the number of

nodes that we have in the tree is always

proportional to the number of key value pairs in the symbol table.

And the problem is and this was quite a surprise when it was first discovered,

actually many years after Hibbard proposed the algorithm is this lack of

symmetry tends to lead to

difficulties and here we're just inserting the leading alternating,

in certain delete a random key,

so that maybe well models a situation or practical situation.

And as you watch it go for a while,

you can see that this thing

about going to the right and taking the successor all the time,

the trees becoming much less balanced that it was.

And this seems to be a problem,

we can't be supposedly having a dynamic situation that is going

to allow support of lots of different inserts and leads and in the end,

wind up with a less balanced treat.

What's worse.

so how are we going to fix it?

So in the end researchers showed that

after a sufficiently long sequence of random inserts and the deletes,

the height of the tree becomes square root of n, not log n's,

spurred event is hugely bigger than a log n,

it might make the difference between

acceptable and unacceptable performance in real applications.

And what's worse, if you try to fix it by say,

randomly choosing between the left and the right.

That doesn't work i, t still becomes square root of n. And that's

a very longstanding open problem to find

a natural simple efficient delete for binary search trees.

That's another one like merging in place,

that you'd think there ought to be an easy way to do it,

but in 50 years,

no one's really discovered one.

We're going to look at something pretty close in the next lecture.

But here's the situation that we're left with.

We have a binary search tree algorithm which is fine,

in that it gives us log n performance for search and

insert in a situation where we can think that these things are happening randomly.

But we're kind of stuck.

If we allow delete,

in fact everything degenerates to square root of n.

And we also have a problem with the worst case.

If the keys happened to have some order in them,

our trees are not going to be balanced at all,

and that's going to make the difference between

acceptable and not acceptable performance.

What we're going to look at next time called the

Red-Black Binary Search Tree will guarantee logarithmic performance for all operations.

That's extremely significant, and much better than a binary search trees.

But the delete operation for Binary Search Trees shows us

the kind of complexity that we can encounter

with working with these kinds of data structures.