In this section, we will continue our list of exercises, or practical applications, and this time we'll look at questions that relate information across the three species represented in our example. So we are in our working directory, and we would like to answer the following questions. What plant systems contain a Smell gene, or any other given gene? The second question is, what genes are in common between toy apple and toy pear? And what genes are specific to each? So let's try to answer these two, or rather three, questions.

You might recall that information about the gene names is stored in the *.genes file for each species. So to determine in which species the Smell gene is present, all we have to do is apply a grep operation. We type grep, followed by the word we are looking for, Smell, which has to be capitalized, and we look for it in all directories, in the files that have the extension .genes, so grep Smell */*.genes. The output shows us that it is present in apple, and that it is also present in peach. Just for information, it has two variants in apple and three variants in peach. Now notice that grep also retrieved the line corresponding to the gene Shape, for which Smell one was listed as one variant. That is because the grep command gives us all the lines that contain the word Smell anywhere within the line, not only in the gene name column. So we identified that the gene Smell appears in apple and peach. That's for the first question.

Let's look at the other two questions. What genes are in common between toy apple and toy pear? There are several ways in which we can do this. The information that we're interested in is clearly in the genes file, and more precisely in the first column, which is the gene name.
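The search for the Smell gene described above can be sketched as follows. This is only a minimal illustration: the three small .genes files are invented stand-ins with an assumed two-column, tab-delimited layout, not the real course data. The last command, which anchors the pattern at the start of the line so that only the gene name column is matched, is one possible way to avoid picking up the Shape line and was not part of the demonstration.

```shell
# Stand-in data for illustration only; these are NOT the real course files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir apple pear peach
printf 'Smell\tSmell1\nSmell\tSmell2\nShape\tSmell1\n' > apple/apple.genes
printf 'Color\tColor1\n' > pear/pear.genes
printf 'Smell\tSmell1\n' > peach/peach.genes

# Every line that contains "Smell" anywhere, prefixed by the file it came from.
# The Shape line from apple is included because "Smell" occurs in its second column.
all_hits=$(grep Smell */*.genes)
printf '%s\n' "$all_hits"

# Assumption: gene names sit at the start of each line, so anchoring with ^
# restricts the match to the first column. -l prints only the matching file names.
smell_files=$(grep -l '^Smell' */*.genes)
printf '%s\n' "$smell_files"
```

With the stand-in files, the first search returns the extra Shape line, while the anchored search lists only the files whose gene name column starts with Smell.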
So first, let's make some simpler files that contain each gene listed only once, for each of those two species. So we'll type cut -f1, if you remember, because we have tab-delimited files, followed by apple/apple.genes. This operation retrieves the first column, so the names of the genes, each one listed once for every variant that it has. You might recall that to obtain simply the list of names, we can sort those uniquely with sort -u. And we can redirect the output into a file that's simply called applegenes in the current directory. Let's do the same with the pear genes. I simply modify the above command line, but you can type cut -f1 pear/pear.genes, pipe that to sort -u, and save the result in peargenes.

And now there are two ways in which we can compare them. A simple comparison would be to apply the command comm. Notice here that we specifically sorted the files, which is one of the requirements for applying comm. If we want to see what genes are in common, then we want to ignore the lines that are unique to the first file, applegenes, with -1, and also ignore the lines that are unique to the second file, peargenes, with -2, followed by the names of the files. And this gives us color, shape, size and taste as the four genes that are shared by the two species.

Another way to obtain this information is to notice that each gene is listed once in each file. So if we concatenate the two files, applegenes and peargenes, we now have each gene listed once for every species in which it occurs. We can simply sort the result to bring identical gene names together. So now color shows up twice in a row, one time for apple and one time for pear. Shape shows up for both species, and so on. Whereas genes such as apple10 and apple4 appear only one time, and pyr1 likewise appears just one time.
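As a rough sketch, the first approach, building the two sorted gene lists and comparing them with comm, could look like this. The file contents are invented stand-ins with an assumed tab-delimited layout, not the real course data, and setting LC_ALL=C is an added assumption so that sort and comm agree on the ordering of mixed-case names.

```shell
# Use one fixed collation so that sort and comm order lines the same way.
export LC_ALL=C

# Stand-in data for illustration only; these are NOT the real course files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir apple pear
printf 'Color\tv1\nColor\tv2\nShape\tv1\nSmell\tv1\napple4\tv1\n' > apple/apple.genes
printf 'Color\tv1\nShape\tv1\nShape\tv2\npyr1\tv1\n' > pear/pear.genes

# First column only (gene names), sorted with duplicates removed.
cut -f1 apple/apple.genes | sort -u > applegenes
cut -f1 pear/pear.genes | sort -u > peargenes

# -1 hides lines unique to applegenes, -2 hides lines unique to peargenes,
# so only the lines present in both files remain.
common=$(comm -1 -2 applegenes peargenes)
printf '%s\n' "$common"
```

On the stand-in data this prints Color and Shape, the two names that occur in both lists.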
So now that they are sorted, we can use uniq -c, just as you've used it before, to count how many times each line appears consecutively in this list. And you'll see that color appears two times, which means it was present in both files. Shape, again, was present in both apple and pear. Whereas smell only appeared in one. And finally, using grep to look for the pattern of a space followed by 2, so in quotes, space 2, and piping through more, will give us only those genes that have two occurrences in our list. And what you see here tells us that there are four of them.

Let's go to the next question, or sub-question, which is what genes are specific to each? So back to our comm command. You might recall that we can choose specific command line options that suppress or print certain combinations of lines. So for instance, if we want genes that are specific to the apple species, we would say comm, ignore all the lines that appear only in the second file, the pear file, with -2, and ignore all the lines that are common to both files, with -3. And then we give it the list of files, applegenes and peargenes. And this lists smell, apple10, apple4, 5, 8 and 9 as appearing specifically in apple, so those are apple-specific genes. Conversely, we can ignore all the entries that appear only in the first file, the apple file, and all the entries that appear in both files, with -1 -3, which will give us all the pear-specific genes. And those are pyr1, pyr2, 3, and 4.

So I've demonstrated some ways in which we can obtain basic information about genomic data by using the basic Unix commands that we've illustrated before. You have a more extensive list of exercises at the end of this chapter. I encourage you to apply them to each of the systems, apple, pear and peach, and then to look at the last set of questions, which apply across the three systems.
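The counting approach and the two species-specific comparisons described above might be sketched like this. The two gene lists are invented stand-ins, not the real course data, and LC_ALL=C is an added assumption to keep the sort order consistent. Note that the grep for a space followed by 2 relies on uniq -c padding its counts with leading spaces, which common implementations such as GNU and BSD uniq do.

```shell
# Use one fixed collation so that sort and comm order lines the same way.
export LC_ALL=C

# Stand-in gene lists for illustration only; already sorted and unique.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Color\nShape\nSmell\napple4\n' > applegenes
printf 'Color\nShape\npyr1\n' > peargenes

# Shared genes by counting: concatenate, sort, count adjacent repeats,
# then keep the lines whose count is 2.
shared=$(cat applegenes peargenes | sort | uniq -c | grep " 2")
printf '%s\n' "$shared"

# Apple-specific: hide pear-only lines (-2) and lines common to both (-3).
apple_only=$(comm -2 -3 applegenes peargenes)
printf '%s\n' "$apple_only"

# Pear-specific: hide apple-only lines (-1) and lines common to both (-3).
pear_only=$(comm -1 -3 applegenes peargenes)
printf '%s\n' "$pear_only"
```

The pattern " 2" would also match a count such as 2 followed by other digits only if a gene occurred 20 or more times, which cannot happen here because each list contains every gene at most once.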
Additional exercises will be provided in the on-line supplement. Thank you very much.