In this practical, we're going to construct a de Bruijn graph from a string. So, I'm going to create a function here, de_bruijn_ize, which will take a string from which I construct a graph and a k-mer K that we want to use. And so, if you recall from lecture what we want to do is look at each k-mer in this string, take the two k minus 1-mers in that k-mer, and create an edge between those two k minus 1-mers, and create nodes for them if they don't exist. So I'm going to create my list of edges and a set of nodes that will represent this graph, and I'm going to loop through every k-mer in the stream. So i is going to range from zero up to the last position where a k-mer could start and so for this k-mer now, I want to add to edges. I'm going to represent this edge as a tuple, so it will be the k minus 1-mer starting at i and going up to i + k -1. And then the other k-mer will be starting at i + 1 and going up to i + k. >> So if the left and the right k- 1 works. >> Mm-hm. And now we just have to add both of these to our set of nodes. So I add the first k minus 1-mer. And the second k minus 1-mer. >> Because you're using a Python set. If you call add more than one time with the same string, it'll only add it once. >> Right. And then we can just return our set of nodes and edges. So this is our de_bruijn_ize function. So let's run this on a simple DNA sequence. I'll use a k-mer length of 3. There you go. Now let's print out our set of nodes. >> Okay, we've got AC. We've got CG. We've got GC. You got GT. Looks like we got all of them. >> Yeah, and so since our kmer length is 3, every k minus 1-mer in this string should appear in our set of nodes, and it looks like they're all there. And now we can print out our set of edges, and so we have an edge from AC to CG, that's from the first kmer. CG to GC is the second k-mer, and so on for each k-mer in that. >> Yep. In fact those edges in that order are the Eulerian walk for this de Bruijn graph. >> Right. So, I'm going to paste in here a function that we'll use to visualize this graph. All this function does is you can see the first line, it calls our function de_bruijn_ize, and then it prints out the nodes and edges in a special string that we can pass the dot, so that we can visualize this with the dot. >> So now I'm going to load a gvmagic. >> So, this is a special IPython plugin that lets us draw graphs basically. >> And then I'll call dot with our function, visualize_de_bruijn, and I'll just paste in the stringing k-mer from above. And here's our graph. So you can see, we start with AC, and this goes to CG for the first k-mer. Second k-mer, we start with CG and then go to GC, and back to CG, and then to GT and then TC, and then CG and then we're done. >> Yep. So you can easily see the Eulerian walk when you print out this nice small graph. >> Yep