Today we're going to talk about trees. This is the first second half of the course is the first of a series of lectures on applications. Just a little bit in the way of orientation before we begin for the first half of the class, the first thing we did was introduce analysis of algorithms. But then we have spent the rest of the time surveying the basic methods that we need in terms of mathematics in order to do scientific studies about the performance of algorithms. And then we finished off last time by introducing analytic combinatorics. So this is all about mathematics and about the techniques. Now, we're going to switch to the second half of the class. We're going to switch to applications. And we're going to talk about some basic classical commenatorial classes with lots of applications. And we're going to look at Techniques from analytic combinatorics that we use to study them. Including applications to the analysis of algorithms. And this will bring us to a really basic combinatorial class with all kinds of applications. Trees, permutations, strings and triads, words and mappings. And we'll be using labeled and unlabeled classes. They're alternating with ordinary generating functions and exponential generating functions. Really applying all the mathematical techniques that we learned in the first half of the class. So today, were going to begin with trees. And we'll start off by talking about basic definitions of trees and forests. To begin I'll review what we've talked about in terms of binary trees, by contrast with the general trees that we're going to talk about today. Recall that a binary tree is an external node or an internal node and two binaries tree were the order of the internal nodes is significant. So, at the top is the root and in this case that root has to binary trees or two internal nodes as its children. And down at the bottom, we have some notes called leaves that have both of their children are external. The external nodes are signified with little boxes and the internal nodes are signified with circles. And we talked about binary tree, we've talked about those before. So now we refer to the level or the depth of a node in the tree. Starting with the root at depth zero, the children of the root at depth one, and so on. And the deepest node in the tree, that's what we call the height of the tree. And we'll look at other parameters of trees later on. Now, remember one of the first problems that we looked at in the last couple of lectures is how many binary trees are there with N nodes and that is the axillary numbers. And we did a derivation with the symbolic method that I will just rush through because we have done it so many times, but just to get back on the notation. So we have a generating function, which is the sum over all objects is s to the size of that object, which collects together objects by their size. We have atoms that are internal nodes and external nodes. And then depending on how we set the size of those atoms, that's what we're going to count. In this case, we're counting internal nodes. And then we have a combinatorial construction that comes as a mathematical representation of our explicit definition of what we mean by a binary tree. And then, our basic transfer theorem translates that construction immediately into an equation on the generating function. Which then we can solve and get asymptotic estimates of the coefficients. So that's just a quick review of the symbolic method for binary trees just to set the context for different problems that we're going to do in this lecture. So now in general, the classical mathematical concept of a tree is In terms of forests. So a forest is a set of trees and a tree is a forest with a root added to the top. And there's actually, just from this quick observation, the generating function that enumerates forest is just 1 over Z or the generating function, that enumerates trees, is just Z times the generating function, that enumerates forest. By corresponding to adding the root. So, this is the recursive definition, but these are recursive structures. A forest is a sequence of destroyed trees. A tree is a node, called the root. And it should say, forest is empty or secret to destroying trees. Trees of node, called the root, connect into the roots of trees, in the forest. And so, now a route can have any number of children, including zero. In the leaf nodes of the nodes with zero children. And then the level is the same as before, we start with the root at depth 0, the children of the root at depth 1, and so forth. And the height is, again, the deepest node in the tree. In contrast to binary trees, we don't have external nodes in general trees. So that's, now a question comes up immediately, it's enumeration. How many different forests are there with N nodes? So these are all the forests with one, two, three, and four nodes. And immediately, right away, you can recognize that the catalan numbers are rising yet again. So there's five forests with three nodes. Just three individual nodes or a tree of size two and another node in either order. Or the root node connected to two children or the straight line which is a root connecting to a child connecting to a child. Notice the order of significance, a forest is a sequence of trees and that's consistent with computer applications where we have to represent the thing in the computer some way, we pick a sequence usually. So anyway, the catalan number are here and of course a tree is root connected to a forest, so if you attach roots to all of those, you get all possible trees and it's the same numbers with a shift over by one. So let's looks at the analytic combinatorics for deriving the catalan generating function For trees and forests. So, and we just follow through the basic steps that we've outlined many times before. And we're going to be doing this a lot. But a small mistake leads to the completely wrong answer. So it's good to be careful. So f is the class of all forests. And the size is going to be the number of nodes, where our atoms are now, there is just one kind of node. And it's got generating function Z. And g is the class of all trees on the same atom, and again, the same size function. So those are two combinatorial classes and now we can use our definitions of those classes to develop constructions directly from the definitions. What do we say? We say a forest is a sequence of trees and a tree is a root node in a forest and that immediately corresponds to those two constructions. And then the transfer theorems immediately give us the ogf's for those two commonotorial classes. F(z) = 1 / 1-G(z) and G(z)=zF(z) that's what I referred to on the previous slide. So now if we just solve that we get F(z) substitute in one minus Z F of Z for G of Z and the solve for F, you get F of Z minus Z, F of Z squared equals one. That's exactly identical to the catalan generating function. So it means that the number for a forest with N node is exactly equal to the number of trees with N nodes, which is the catalan numbers, which we know the asymptotics [INAUDIBLE] For. And the number of trees within nodes, the number of forests with N minus 1 nodes, so it's gotta factor for less. So that's a symbolic method that shows that the number of forests and the number of trees is exactly the same. Now from an analytic point of view, we're happy to get the result. But usually, in common when you find that you have two classes that enumerate exactly the same. What you want to find is a bijection showing that a correspondence between every member of one class and every member of the other. And in this case that bijection is important because it gives us a way to represent forests and trees in the computer conveniently. So here's how that bijection goes. Every forest with N nodes corresponds precisely to a binary tree with N nodes. Now, the forest node can have multiple children. A binary tree node can have exactly two children. The way the correspondence goes is that, for every node, we connect it to its left child and its right sibling. So the node at the left of the forest corresponds to the top node in the binary tree, and so forth. That's a one to one correspondence between forests and binary trees. Another way to look at it that's even maybe easier to see, called a rotation correspondence If you take this binary tree [COUGH] and connect it up as if it were the nodes of the forest, you can see that you have the forest kind of rotated. And that's a very useful and easy way to represent forests and general trees within a computer. Because a binary tree is easy to represent in chunks of memory where every node has whatever information's associated and its two children. Whereas in a forest you have to deal with a variable number of children per node which can be inconvenient or difficult in some computing environments. Just as an aside, at this point it's worth thinking about is the idea of an algorithm for drawing binary tress. And I want to show that because I'm going to be showing a lot of binary trees. And the exercise of writing a program for drawing binary tress is a worthwhile exercise for everyone to do. So the most natural approach that we use very often, particularly when talking about sorting and searching algorithms, is to, well first of all, the y coordinate is easy. That's just, well it's the depth, but since they go down, it's the height minus the depth. So we start with the root at the highest and then just subtract one. So we know the y-coordinate of every node. The x-coordinate of the node, the easiest way to set that up is to just do a recursive inorder traversal of the tree. And just assign x coordinates every time you reach a note. And that's corresponding to say the recursive programming says what's a coordinate of a root? It's the number of nodes on the left plus 1 and then the x coordinate of everything on the right side is bigger than that. Then you can see immediately that number one is easy to assign the coordinates, just do an in order a traversal of the tree and number two it's basis the nodes on the tree out nicely. Usually when we draw big trees, we leave big binary trees, we leave out the external nodes. Now, there's a problem with this, is that you get the distracting long edges for some kinds of trees. Particularly, for say binary trees will represent general trees, and You can use a similar algorithm like this for general trees, by the way. But anyway, that's a problem. So what we do sometimes is, take the x coordinate and at every level, we just evenly space the nodes. If there's four nodes at that level, we evenly space them and center them on the root node. That's a useful way to draw trees, because it gives a profile of what the trees, how thick the trees are, how many nodes there are at each level and that's an interesting thing to know about in some kinds of analysis This is what a random binary tree looks like with this idea. And actually, when you see random binary trees, they have many many different shapes. And if we were to do random trees or random forests as well, you see quite a collection of Different and interesting shapes. And so the challenge that we have, that we're going to go into, and the reason that I wanted to draw these big, binary trees, is to make clear what that challenge is. So we have to analytically characterize this in some way. I may be averaging over all of these or whatever it is that we're doing in terms of the analysis, it's going to have to explain this behavior. And of remarkably we were able to get very far in doing that and so that's what we'll start doing now, that just a brief introduction about trees and forests.