In the previous sections, we talked about how we organize content, how we create and remove it, how we access it, and how we redirect content in Unix. In the current section we're going to talk about how we can query content more closely.
So, let's go back to our example directory. We're in the plants directory, and we have our three directories with files for the peach, pear, and apple species. We also have two files, months and orchard. In this section we will demonstrate how to sort files and how to obtain basic information from the fields contained in them.
Let's start with sort. The months file, which I will use for the purposes of demonstration, contains simply a listing of the months of the year: January, February, March, April, May, and so on through December. We can sort this file using the command sort: sort months. This lists the items in the file as April, August, December, February, January, July, and so on. Perhaps you expected to see the names in the order January, February, March, and so on. However, sort by default sorts alphabetically, so April comes before August, before December, and so on. We can also sort in reverse alphabetical order with sort -r months. That list starts with September, because S is the last among the initial letters represented here, followed by October, November, and so on.
So how can we sort the months in the order of the calendar year? Let's add a little more information to the file. While we're not going to cover it here, you can use any kind of text editor to edit your file. I'm going to be using vi: vi months. Let's add one more column that gives the number of the month within the year, from 1 to 12. And let's add the season as well: winter for January and February, spring for March, April, and May, summer for June, July, and August, fall for September, October, and November, and winter again for December. So far, we have sorted the file by the content of the entire line.
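To make the two whole-line sorts concrete, here is a minimal sketch. It recreates the one-column months file in a scratch directory; the scratch directory and the LC_ALL=C setting are assumptions on my part, chosen only so that nothing real is touched and the ordering is reproducible.

```shell
# Work in a scratch directory so no real files are touched (assumption).
cd "$(mktemp -d)"
# Assumption: byte-wise collation, so the sort order is the same everywhere.
export LC_ALL=C

# Recreate the one-column months file: one month name per line.
printf '%s\n' January February March April May June July \
  August September October November December > months

# Default sort: alphabetical by the whole line (April, August, December, ...).
sort months

# Reverse alphabetical order (September, October, November, ...).
sort -r months
```

Neither command changes the months file itself; sort only writes the reordered lines to the screen.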
Instead of sorting by the entire line, which is what we did before, we can sort by column. So we can sort months by column two, which is the numerical column: sort -k2 months. If you look at column number two in the output, the lines are listed in the order 1, 10, 11, 12, 2, 3, 4, and so on. You might wonder why, since we would have expected to see the order 1, 2, 3, 4, through 12. That's because sort is again sorting in alphabetical order, and as a string, 1 comes before 10, 11, and 12, which in turn come before 2, and so on.
In order to sort by numerical value, we have to indicate that specifically on the command line. We do that with an n: we say -k2 for column two, and n for numerically, so sort -k2n months. And now, indeed, the records are sorted in the order in which we expected them: January 1 before February 2, before March 3, and so on all the way to December 12.
We can also sort them in reverse order: sort -k2nr months. Here -k2 is the column number, n says that the column is numeric and should be sorted numerically, and r says to sort in reverse order. As you can see, December, number 12, is the first month, followed by November, number 11, October, number 10, and so on, all the way to January, number 1.
Let's sort by column number three: sort -k3 months. Now the three fall months are listed first, because fall comes alphabetically before spring, summer, and winter.
We might want to sort by multiple columns. For instance, we might want to sort by the season, which is column three, and then within each season sort the months in their numerical order within the calendar. So we say -k3 for the season and then -k2n for the month number: sort -k3 -k2n months. First we have fall, and within the fall months we have September, October, November: 9, 10, and 11. That is followed by spring, with March, April, May being 3, 4, and 5, and summer, with June, July, August: 6, 7, 8.
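The column sorts discussed so far can be sketched as follows. The three-column months file is rebuilt here with single spaces between the columns; the scratch directory and the LC_ALL=C setting are assumptions of mine for reproducibility, not part of the lecture.

```shell
# Work in a scratch directory (assumption) with byte-wise collation (assumption).
cd "$(mktemp -d)"
export LC_ALL=C

# Three space-separated columns: month name, month number, season.
printf '%s %s %s\n' \
  January 1 winter  February 2 winter  March 3 spring \
  April 4 spring    May 5 spring       June 6 summer \
  July 7 summer     August 8 summer    September 9 fall \
  October 10 fall   November 11 fall   December 12 winter > months

# Column two compared as text: 1, 10, 11, 12, 2, 3, ...
sort -k2 months

# Column two compared as numbers: 1, 2, 3, ..., 12
sort -k2n months

# Column two compared as numbers, in reverse: 12, 11, ..., 1
sort -k2nr months

# Column three (season) as text: the fall lines come first.
sort -k3 months

# Season first, then month number within each season.
sort -k3 -k2n months
```

A key such as -k3 runs from field three to the end of the line. On files with more columns, the bounded forms -k3,3 and -k2,2n restrict each key to a single field.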
Then we have winter, which shows January, February, and December: 1, 2, 12. We can also sort those in reverse numerical order. We keep the seasons in alphabetical order and then sort within each season in the reverse numerical order of the month: sort -k3 -k2nr months. Let's see what we get. We have fall again, and within fall we start with November and then October and September: 11, 10, 9, in reverse order. For spring, the order is 5, 4, 3. Summer is 8, 7, 6: August, July, and June. And winter starts with December, number 12, then February, 2, and January, 1. So that's how we can sort by multiple keys, that is, by multiple fields.
One other very useful command, which allows us to look at individual fields within the file, is cut. The cut command takes a file and extracts a particular column or a particular range of columns. I'm going to make one small modification to the file months: I'm going to change all the spaces into tabs, and you'll see why. The cut command by default delimits the fields at the tab character. So we can say cut -f1 months to cut column number one from the file months, and that gives us just the list of months. We can cut columns one and two by separating them with a comma, cut -f1,2 months, and we see January 1, February 2, and so on. We can similarly cut a range of columns, which we mark with a dash: cut -f1-3 months cuts columns one through three, that is, columns one, two, and three. That shows us the entire file: January 1 winter, February 2 winter, and so on all the way to December 12 winter.
However, we might have files that use different delimiters, so we might want to cut by a different type of delimiter. I'm going to go into my months file and change all the tabs back into simple spaces. Now if I wish to cut just the first column from the file months, cut will be looking for the tab delimiter. It doesn't find it, so the entire line becomes column number one.
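Here is a minimal sketch of the reverse multi-key sort and of cut on a tab-delimited file. In the lecture the file is edited in place with vi; in this sketch the space-separated variant is written to a separate file, months_spaces, which is a hypothetical name used only so both versions can be compared side by side. The scratch directory and LC_ALL=C are likewise assumptions.

```shell
# Work in a scratch directory (assumption) with byte-wise collation (assumption).
cd "$(mktemp -d)"
export LC_ALL=C

# Tab-delimited months file: month name, month number, season.
printf '%s\t%s\t%s\n' \
  January 1 winter  February 2 winter  March 3 spring \
  April 4 spring    May 5 spring       June 6 summer \
  July 7 summer     August 8 summer    September 9 fall \
  October 10 fall   November 11 fall   December 12 winter > months

# Seasons alphabetically, months in reverse numerical order within each season.
# sort treats both tabs and spaces as field separators by default.
sort -k3 -k2nr months

# cut splits on tabs by default.
cut -f1 months      # column one: the month names
cut -f1,2 months    # columns one and two
cut -f1-3 months    # columns one through three: the whole file

# Hypothetical copy with the tabs turned back into spaces.
tr '\t' ' ' < months > months_spaces

# No tab is found, so the entire line is treated as column one.
cut -f1 months_spaces
```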
To specify a different delimiter, we use cut -d, then the delimiter character between quotes, and then the number of the column: cut -d ' ' -f1 months. Now cutting the first column shows us only the month. We can similarly cut the first two columns, so that we see the month and its number, its order within the calendar year, or we can show all three columns with 1-3.
Let's look at another command. For that particular application, I'm going to cut the third column of months and put it in a file called seasons. The file seasons looks like this: it starts with two winter lines, corresponding to the winter months January and February, then three spring, three summer, three fall, and another winter. As you can see, it has 12 lines, one per month. I might want to know the number of unique types of seasons. I'm going to sort the file, and I have the option to say sort -u seasons. If there are multiple occurrences of a line, sort -u lists only one, and all of these distinct lines are listed alphabetically. And indeed we see fall, spring, summer, winter.
There's another command that has a similar meaning, and that is called uniq. Uniq, however, looks at the file and analyzes all the lines one after the other. Whenever there is a number of consecutive lines of the same type, that is, identical lines following one another, uniq replaces them with just one line. Let's apply that to our seasons file: uniq seasons. You might recall that we had two lines that said winter at the beginning, which have now been replaced with just one line, winter. Then the three spring lines were replaced with just one spring, and likewise for summer and fall, and then we had the winter line corresponding to December at the end. So you can see that winter now appears twice, because it occurred in two distinct places within the file. Now let's assume that we would like to use uniq, but we would like to end up with exactly one line for each of the four seasons.
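The steps above, cutting with an explicit delimiter, building the seasons file, and comparing sort -u with uniq, can be sketched like this. As before, the scratch directory and LC_ALL=C are assumptions added for reproducibility.

```shell
# Work in a scratch directory (assumption) with byte-wise collation (assumption).
cd "$(mktemp -d)"
export LC_ALL=C

# Space-separated months file: month name, month number, season.
printf '%s %s %s\n' \
  January 1 winter  February 2 winter  March 3 spring \
  April 4 spring    May 5 spring       June 6 summer \
  July 7 summer     August 8 summer    September 9 fall \
  October 10 fall   November 11 fall   December 12 winter > months

# Tell cut that the delimiter is a single space.
cut -d ' ' -f1 months      # month names only
cut -d ' ' -f1,2 months    # month name and number
cut -d ' ' -f1-3 months    # all three columns

# Save the third column (the season) into its own file.
cut -d ' ' -f3 months > seasons

# Sort and keep one copy of each distinct line: fall, spring, summer, winter.
sort -u seasons

# Collapse only runs of adjacent identical lines:
# winter, spring, summer, fall, winter.
uniq seasons
```

The difference in the two outputs comes from ordering: sort -u brings identical lines together before removing duplicates, while uniq only looks at neighboring lines.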
So we need to drop the duplicate of winter. We can do that with a pipe. Remember, we talked about pipes in the previous section: we can redirect the output of uniq into sort -u, as in uniq seasons | sort -u, and that gives us what we expected: fall, spring, summer, winter.
There is, however, another benefit of the uniq command, which is to tell us how many times a line appears consecutively. We can do that with uniq -c. So let's apply uniq -c to our file seasons. In the first column you have the number of consecutive occurrences, and in the second column you have the line. We can see that at the beginning of the file we had two occurrences of the winter line, followed by three lines of spring, three lines of summer, three lines of fall, and at the end one line of winter. So that allows us to obtain counts, the number of occurrences.
One last command that I would like to mention for querying files is grep. We might, for instance, want to know which of the three plants have samples from the root system of the plant. For that purpose, we can grep for root in all the samples files. The output shows us the file name and then the line containing the pattern that we looked for. We see that only apple and pear contain samples extracted from the root.
In another application, we might be looking for a pattern that contains spaces, and we delimit that with quotes. Say we want to look for " 12 winter" in our file months: grep " 12 winter" months. Indeed it appears there, and grep gives us the line December 12 winter. Now let's try grep " 7 winter" months. It couldn't find the pattern in the file months, which is expected. One last option: we can also retrieve the number of the line within the file on which the pattern was found. That can be done with the option -n. Grep -n " 12 winter" months gives us the content of the line preceded by the line number, number 12.
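Here is a minimal sketch of the uniq pipeline, uniq -c, and the grep searches on the months file. The samples files are not recreated, since their contents are not shown in this section. The scratch directory, LC_ALL=C, and the "no match" message are assumptions of mine.

```shell
# Work in a scratch directory (assumption) with byte-wise collation (assumption).
cd "$(mktemp -d)"
export LC_ALL=C

# Space-separated months file and the seasons file cut from its third column.
printf '%s %s %s\n' \
  January 1 winter  February 2 winter  March 3 spring \
  April 4 spring    May 5 spring       June 6 summer \
  July 7 summer     August 8 summer    September 9 fall \
  October 10 fall   November 11 fall   December 12 winter > months
cut -d ' ' -f3 months > seasons

# uniq leaves winter twice; piping into sort -u removes the duplicate.
uniq seasons | sort -u

# Count consecutive occurrences: 2 winter, 3 spring, 3 summer, 3 fall, 1 winter.
uniq -c seasons

# Quote a pattern that contains spaces.
grep ' 12 winter' months

# No line matches; grep prints nothing and returns a non-zero exit status.
grep ' 7 winter' months || echo 'no match'

# -n prefixes each matching line with its line number.
grep -n ' 12 winter' months
```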
So we have looked at a number of commands, particularly sort, uniq, cut, and grep, which allow us to query the content of a file. In the following section we will look at how we can compare content between two files.