So we now focus on the Eulerian paths problem and try to learn more about de Bruijin graphs. So, we saw that to reconstruct the genome, we need to find an Eulerian path in this graph. But some of you noticed that I cheated. I constructed the de Bruijn graph from genome, but in real life, genome is unknown. The only thing that is known is reads or k-mers. Can we reconstruct the de Bruijn graph from genome composition rather than from genome? Well, let's try to do this. But let's first review what we have done. What we have shown is that given a genome, we can construct the de Brujn graph. What we want to do is, given genome composition, or k-mer composition of the genome, we want to construct the genome. What we will show now is that given k-mer composition, we can construct de Bruijn graph and then using Euler's insight to construct Eulerian paths in this graph. Let's start. So in the past, we started from a path representing the genome. But now the genome is unknown. What is known is only composition of the genome And we will represent this composition, every element, every 3-mer in the composition by an edge. We will now construct also nodes of this graph by representing them as 2-mers as we did before. Now, to construct the Bruijn graph from k-mer composition, let's do exactly the same we did before. Let's just glue identically labeled nodes. For example, AA and AA identically labeled, let's glue them together. Bring them a little bit closer, and finally glue. Let's continue with AT, the same thing we see continue, continue, continue, continue, continue, continue, continue, continue again. More, more and finally they're getting the same path that we were getting as if genome would be known. What we haven't done yet is gluing because they are, of course, other vertices to glue. And we proceed exactly the same way as we were proceeding in the case when genome was known and finally construct the de Brujin graph. So what we have just shown that even in the case genome is unknown, we can construct the same de Bruijin graph as in the case the genome is known, and this is great. Which means that the next step we need to address is how to find Eulerian paths in the graph. So, for construction of de Bruijin graphs, the algorithm is very simple. We simply need to represent every K-mer as an edge between its prefix and its suffix, and afterward glue all nodes with identical labels. It is a 2-line algorithm. If you implemented that you have already figured out that there is a different way to look at de Bruijn graph. If you are given the collection of k-mers, we can form nodes of the graph from all (k-1)-mers from this collection, and for each k-mer from the composition, we can connect its prefix nodes with its suffix node by an edge. This is our two equivalent definitions of the de Bruijin graphs. And now let me explain what was de Bruijn's contribution to this area and why these graphs are called de Bruijn graphs. De Bruijn graphs have great applications of biology, in biology. But de Bruijn of course did not know this, because he was interested in purely mathematical abstract problems. He tried to solve so-called universal string problem, which is finding a circular string containing each binary k-mer exactly once. Here all eight binary k-mers. Can we find the strings that contains each of them exactly once? We can. This is the string, and if you check for each binary 3-mer, you will find it appearing in the string along the circle if you travel around the circle clockwise. For example, for this 3-mer, it appears along the string. But how can we construct the string? Is it even possible for all possible values of k? Well, the only thing we need to do, we need to represent each of these binary k-mers as an edge. Right? Because it should be a k-mer composition of the string we are trying to find. Afterwards we need to construct the Bruijn graph from this isolated edges and then the Bruijn graph will look like this. And afterwards, simply look for Eurelian cycles in this graph. Cycles rather than paths because de Bruijn was interested in finding circular strings. Can you find an Eurelian cycle in this graph? And how can it help us to find universal string? Let's see. Let's start working here, and we immediately can get insight into the first three elements of the string. Let's continue this way, and when we travel through the graph, the string keeps enlarging and enlarging and enlarging, and finally, we constructed the universal string. The fact that the graph is Eulerian implies that every k-mer appears exactly once in this circular string. But what about this four universal string constructed from all 4-mers. What about ten? All 10-mers in this case, the graph will consist of thousands nodes. Does it have a Eulerian cycle? If yes, how can we find it? We'll answer this question in the next segment.