0:04

Now, we're going to finish up by looking at some extra operations that are

supported for string keys when we use Tries.

So, we're going to look at a couple of character based operations that can, that

are quite useful and, and can be supported by the string symbol table API with tries.

One important one is a prefix match. So that's given a prefix, like say sh,

What we want is the method to return all the keys.

Let's start with that prefix. In this case, it'll be shells, shore and

she. Another one is so-called wild card match.

So that's when we don't specify one of the characters, or multiple characters.

So we put a dot to say we'll take any key that matches the other characters and we

don't care what that one is. So he. would match she and the.

And another one is so-called longest prefix.

So now we've got a long key and we want to find the best match that's in our symbol

table that matches that key. The key in the symbol table that has the

longest match with our key. So going, in this case, the ones that's

the longest prefix of shells sort is shells.

Later on we're going to see applications of some of these and the important idea

for now is that these are easy to articulate.

And not only that, they're easy to implement with tries.

Nts tr, and Turner research tree. So, we're going to take our standard API

and add these four methods. So, while there's keys that returns all

keys. And, and, as is our usual practice.

When we have a lot of things to return, we're going to return'em as a, as an

iterable. So the clients can just, iterate through

the what's returned. And that's usually what clients want to

do. And then we have keys with prefix.

Give a string s, and the iterable will return all the keys in the symbol table

that have s as a prefix. If the client wants to find the associated

values, they can do gets to get the values.

And then we have keys that match so that's where we allowed dot to be a wild card, in

S, and we want to return all the keys that, match, S, taking dot to be a wild

card. And then longest prefix of S is the key in

there that's the longest prefix of S. We're gonna see later on.

This particular one plays, an important, role in a, in a more complicated

algorithm. We could also go ahead and add all the

ordered symbol table methods that we considered when we looked at binary search

trees like floor, and, and rank. And, we're not going to take time to do

that right now. It's very straightforward.

3:30

Look at for these implementations is ordered iteration.

And this is trie-traversal, so we are going to do an in-order traversal of the

trie, that is, visit the left, now visit the middle, visit the right.

In that traversal, we can maintain the sequence of characters on the path from

the root to the current node just by adding an, adding a character when we go

down and removing a character when we go up.

And when we encounter values, Then we can put out this, the characters

that we have, and that's the way to pull out all the keys in the trie.

So, for example, in this case we go down and we have b, and then we hit the y's.

So we can output,. Say onto a queue we can output by.

And then we go back, we have s, e and a. And, we find a value, so we go ahead and

put out the a. Now we drop the a and do the l, and that's

the l, l, l, l,. and then we get to that s, we have a value we can put ourselves.

And then we can drop these, get back to the s, and get the sh.

She, finite value, and put, put the she in the queue.

Lls and put shells on the queue. Drop all those values down to the h, and

get us sho, shor, shore, Find the value and drop shore onto the

queue. And then finally, t, th, the and, find a

value and put the, down the queue. So it's easy to, very easy to keep track

of the sequence of characters on the path from the rout to every node and every node

check for value and programmer key. That's a ordered iteration and

implementations of these extra operations are just based on multiplying ordered

iteration. So this is a implementation of keys with,

which is essentially of the operation that I was tracing on the last slide.

So if client wants keys, we make a queue. Then we call a recursive routine that

collects all the keys in the try, starting at that node.

And, given a null string, which is, the, sequence of characters on the path to the

given node. When that recursive routine, returns, then

we just return the queue. And the recursive routine does encode what

I said in words, most of the time. This is for a try.

You can do the same thing for a Turner research tree with just a little more

code. So for every character we just move down

the trie for that character add the character to the prefix and also pass the

queue along. If we get a null value, we put what we

have on the queue when we get null when we return. So a very simple implementation of

ordered iteration of the recursive. So that's the implementation of the keys

method. So prefix matches an other things like

that going to work. Are going to work, just by modifying that,

and you're familiar with, prefix matches, When you type, nowadays, in your browser,

a search, function, you're getting, a prefix match of all the things that you

typed or other people have typed, that, matches those strings.

And you find that also nowadays when searching in your address books.

It's a quite, common function nowadays. So let's look at how that looks, in an

R-way trie. So what about finding all the keys in a

symbol table that start with a given, prefix.

Well, we just search for that prefix. And then, then just do a collect at the

keys in that sub tri. So that's, very straightforward.

So you get the node. That is the one you get to by starting

with that prefix and then you just call the recursive collect and boom you're

done. Extremely simple, implementation of keys

with a prefix in an R-way trie.. What about longest prefix?

And here's one that, is, is, very, actually very heavily used, in the

internet. If your query strings are IP addresses on

the internet and you have some given destination IP address, the router has a

lot of IP addresses in a routing table. An it wants to choose the one that will

get you as far as close to the destination as you can.

So it's going to choose the longest prefix.

So if you want to get to 128 by 112, you can think of these as hops on the way to

where you want to get, where eleven is the final destination.

And essentially this table says, it knows how to get this far and so that's what it

wants is the longest prefix. The longest prefix matches this one. Well,

it doesn't know how to get any further than 112 and this other one only to 128.

I, and this operation gets performed extremely often on the internet nowadays.

Just a quick note, It's not the same as the floor function

and it actually is a kind of a string operation.

These things actually, usually on the internet, they're not represented as

strings. They're represented as binary numbers.

But in machine or assembly language implementation tries are even easier.

You can just take a bunch of bits and use them as index into a table to move down

the try. And actually tries are pretty old because

so easy to implement data structures in that way in the past.

You'd be surprised at how much of our computational infrastructure is built by

Program or consists of programs that are written in machine or assembly language

that can make use, efficient, really efficient use of low level representations

like this. But it's also useful in higher-level

languages. So what does it look like in an R-way

trie.. Well, all we're going to do is just do a

search and then we're going to keep track of the longest key that we encountered.

So, we have a path. And on that path there is the most

recently seen value that's the longest key that we found If we end that at no link

that's fine. It's the, if there is a value on that node

that's kind of a no link that's the value we're going to return.

So that's a very straightforward implementation.

Just keeping track of the longest key encountered on the search for our key.

That's implementation of longest prefix of and the usual set up of, of making

recursive calls. And this code is quite straightforward.

And application of this one and so-called T9 texting.

And now you know, when we do a course like this, we try to keep up with modern

technology. And modern technology is moving almost as

fast as we can. There are lots of young people who really

don't know about texting with keypads anymore.

But there's a certain range of, you know, five or ten years where people got

extremely adept at doing so called multi-tap input, where the only keys on

the phone were the nine keys to dial numbers.

And then you had three letters associated with each number,

Like on old dial telephones. And to enter a letter you had to like to

enter an H you had to tap twice, cuz that's the second letter on the four key

you had to tap four twice or something like that.

So, The so-called T9 text input would use the

kinds of algorithms that we're talking about to make it so that maybe you didn't

have to do multi-tap. So rather than type two 4's, you would do

a tries type search to figure out which word that you typed.

And that now looked, It'll look good to us as a potential

application for a while. Glad we didn't spend too much time on it.

If you study this keyboard, maybe you'll see, see why.

Well you think about how to implement it. But once we get into it, we realize, Kevin

realized there's no S in this sample keypad, what happened to the S?

So what are we going to implement because even if there's no S there, so what

exactly are they doing and well maybe this is a bit of a fantasy that this is the

response that, maybe these people lived in a world with no S's.

Mmm. Well anyway, we've moved on from tries

from, from that type of, way of entering text.

But I still want to mention a few more ideas because the basis behind tries and

Turner research trees is still out there and is still really an important part of

our infrastructure. And there's some really great algorithms

that we just don't have time to cover. Now one of them's an old algorithm called

Patricia. And, This one, is a really interesting and

intricate algorithm. Particularly when implemented for binary

tries, where you just do a bit at a time. And people, again, implemented this kind

of algorithm in machine language. And got extremely efficient performance.

If we cast it in, in, our way tries that we've talked about it's really the best

way to think about it is, a way to remove one way branching.

It seems wasteful, to have, all these nodes that just have one branch.

And so one of the main ideas behind Patricia there's others that, don't

really, show up, in the level, high level representation we're using.

But one of the main ideas behind Patricia was rather than associate a character with

each node, Associated sequence of characters with

each node, so you just don't have any one way branching.

Implementing this is maybe one step beyond this course but maybe it's within what we

could do in this course. Where just not going to take the time to

do so. And you'll find implementations that in

practice avoid the one-way branching that are used in many, many applications,

performance critical applications for searching nowadays, I already mentioned IP

routing tables. There's probably, you know, no piece of

code that's executed more often than that one.

That's based on a trie type algorithm. And we have these other applications

listed as well. It's got some other names too.

Another thing, is, so called, suffix tree. So, that's, building a tree from a suffix

table. So, we talked about having, for suffix

sorting, we talk about applications. You can also build a search structure from

suffixes of a string. That admits, all kinds of, fast string

processing application. And again, usually, eliminate on way

branching in suffix trees, and also amazing, amazingly you can get'em

constructed in linear time. And there's all kinds of interesting

applications of suffix trees probably the most important now a days are in

computation biology databases. Again extensions of the kinds of

algorithms that we've talked about today. So I think the bottom line for considering

string search trees is that it's a real success story in algorithm design and

analysis. It's a number of clever algorithms that

really have made a difference in the kinds of operations that we can perform in the

amount of data that we can handle in modern applications.

We, we started with red-black BST's which is a pretty good solution for a general

symbol table. And also hash tables, which are also

widely used. But with tries, and Turner research tries

we have a performance guarantee where we'll only have to really access log-in

characters. And when you think about that, that's,

even when N is huge, that's going to be a pretty small number.

Say we're looking among billions, billions of things log in, even in you only have

two wave branching would be 30. And when we have 256 way branching, it's

way, way smaller. And that's just the number of, so it's the

number of characters accessed. And the, really the bottom line is, if

you, you can set things up nowadays even on the internet when there's huge, huge

amounts of data out there you can set things up so that you can only need to

look at maybe 100 bits to get at anything. We think about 100 bits, that's specifies

two to the 100th of possibilities. And two to the 100th is a huge number.

There are not two to the 100th pieces of information, even on the internet, even

will ever exist on the internet in the. Into this galaxy.

But with just 100 bits, which really isn't too much, we can search for anything

efficiently. And that's an amazing success story for

algorithm design and analysis. So that completes our look at tries and

Turner research tries.