In this video, we're going to get deeper into the Pandas library and find out how to filter and sort a Pandas DataFrame and, finally, how to compute statistics on one. By way of introduction, a Pandas DataFrame is a more sophisticated data structure than the ones we've seen so far. Previously, we've seen Lists and NumPy arrays. When we moved from Lists to NumPy arrays, we gained a lot of functionality, and in the same way, when we move from NumPy arrays to Pandas DataFrames, we gain a lot more.

So, let's see if we can sort our DataFrame. What we've got here is some code that reads a dataset in from a CSV file, so I'm going to run that now. Then, if I just type data into my Python notebook and run that cell, you'll see it comes up with a rather nice view of the data as a table. Let's examine that a bit. It's very much like a spreadsheet: we've got a set of columns, one for each variable in the dataset, and a set of rows, where each row represents one data point. So it's a familiar, tabular data structure. It's similar to a NumPy array, but with two additions: the columns can hold different types of data, and each column has a heading. So it's a bit more sophisticated, and I'm going to show you that we can do more with it as well.

The first thing we're going to do is sort all of these data points by average income. How do we do it? I take the data and call data.sort_values, and I need to tell it which column to sort by. I want to sort by average income, which is the avg_income column, and the tab completion works quite nicely there, so don't forget to use it! Then I'm going to do a little trick, because I want the actual data to be affected.
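The CSV file from the video isn't included here, so the sketch below stands in for it with a small in-memory CSV. The column names (country, avg_income) and every value are made up for illustration; only pd.read_csv and the DataFrame attributes are real pandas API. It shows the spreadsheet-like layout described above: one column per variable, one row per data point, and columns that can hold different types.

```python
import io

import pandas as pd

# Hypothetical stand-in for the video's CSV file: the column names
# and the values are invented for illustration only.
csv_text = """country,avg_income
Country A,800
Country B,27000
Country C,16000
Country D,4000
Country E,21000
"""

# read_csv accepts a file path or any file-like object; StringIO lets
# this example run without a file on disk.
data = pd.read_csv(io.StringIO(csv_text))

# In a notebook, evaluating `data` renders the table; in a script, print it.
print(data)

# Each column is one variable and each row is one data point.
print(data.shape)          # (number of rows, number of columns)
print(list(data.columns))  # the column headings
print(data.dtypes)         # columns can hold different types
```

In a real notebook you would pass the path to your own CSV file to pd.read_csv instead of the StringIO object.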
Otherwise, sort_values just returns a sorted copy and leaves my DataFrame unchanged, but I want to sort the data itself. So I pass the argument ‘inplace’ and set it to True. That means the DataFrame itself is reordered, rather than us getting back a sorted copy. Then let's print it out again. So we run that, and you can see, if we look at the average income column, that it now goes up from 572 all the way through to much higher numbers. That's the sorting done. Of course, you can sort by any of the columns, and with inplace=True it will sort the data itself.

The next thing I want to show you is a filter operation. Let's say I want to select all of the countries in the dataset that have an average income greater than 15,000, perhaps because I want to do some processing just on the higher-income group to see if there's anything interesting going on there. How am I going to do that? I create a new variable, richest, and set it to data indexed by a filter. The filter is that the average income is greater than 15,000, so the line reads richest = data[data['avg_income'] > 15000]. The comparison inside the square brackets is the filter, and I'm applying it to the DataFrame by putting it in the square brackets after data. It's a slightly unusual syntax, but if you've used something like R before, you might be familiar with this style of filter. At the end, we'll print out richest in that nice table display. Let's run that. You can see that I've now only got the higher-income rows, and of course they're still sorted, because I did that sort earlier.

Now let's say I want to pull a single row out of that dataset. Who is the richest? Well, first, the lowest income in the rich set is going to be richest.iloc[0]. This is how we pull out a row by its position. So let's just do that.
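To make the difference between a returned copy and an in-place sort concrete, here is a minimal sketch on a small made-up DataFrame (the country labels and incomes are hypothetical; in the video the data comes from the CSV file). It also shows that sorting keeps the original row labels, which is one reason the video uses position-based iloc afterwards.

```python
import pandas as pd

# Hypothetical data with invented values.
data = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D"],
    "avg_income": [16000, 800, 27000, 4000],
})

# Without inplace, sort_values returns a new, sorted DataFrame
# and leaves `data` untouched.
sorted_copy = data.sort_values("avg_income")
print(data["avg_income"].tolist())         # still the original order
print(sorted_copy["avg_income"].tolist())  # ascending order

# With inplace=True, `data` itself is reordered and the call returns None.
result = data.sort_values("avg_income", inplace=True)
print(result)                       # None
print(data["avg_income"].tolist())  # now ascending

# The original row labels travel with their rows after sorting.
print(data.index.tolist())
```

Reassigning the result, data = data.sort_values("avg_income"), has the same visible effect as inplace=True and is a common alternative style.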
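The filter syntax is easier to read once you see that the comparison produces a Boolean Series, one True or False per row, and that indexing the DataFrame with it keeps only the True rows. The sketch below assumes the same avg_income column name as the video; the rows themselves are invented and already sorted.

```python
import pandas as pd

# Hypothetical data, already sorted by avg_income as in the video.
data = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D", "Country E"],
    "avg_income": [800, 4000, 16000, 21000, 27000],
})

# The comparison gives a Boolean Series aligned with the rows.
mask = data["avg_income"] > 15000
print(mask.tolist())

# Indexing with that Series keeps only the rows where it is True.
richest = data[mask]
print(richest)

# iloc selects by position, so iloc[0] is the first row of the filtered,
# sorted result: the lowest income within the richest group.
print(richest.iloc[0])

# The filtered rows keep their original labels (2, 3, 4), so label-based
# richest.loc[0] would raise a KeyError here, while iloc[0] works.
print(richest.index.tolist())
```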
You can see that if I pull up iloc[0], I get Italy, which, in that set of rich countries, is the one with the lowest average income. If I want to find the highest, I can just use minus one there: -1 as an index always gives me the last item. So I don't have to worry about how long the DataFrame is; I can just write richest.iloc[-1] and it will give me the last row. You can see I've got Luxembourg here, with its gigantic average income of 26,000. So that's where I'm going to be moving later. What else can we do with these indexes? Well, we can specify a range. Let's say I want the first few: by writing richest.iloc[0:5], I've pulled out the first five rows of that set, and I can do various other things with these indexes. So that's how we can do sorting and filtering.

The final thing I want to show you is how to do statistics on our DataFrame using NumPy functionality. Let's dive in. Say I want to find the mean of the richest. I can do that with NumPy, so first I'll import it: import numpy as np. Then I'm going to write np.mean(richest). But I don't want the mean of the whole richest DataFrame (not that rich people are mean...); I want the mean of its avg_income column, so I select that column: np.mean(richest['avg_income']). In other words, calculate the mean of the average income in that rich dataset, and store it as rich_mean. We can print that out: the mean for the set of countries with an average income above 15,000 is 19,000. I can compare that to the mean of the whole dataset, which is np.mean(data['avg_income']); I'll call that all_mean. Then I print out all_mean and rich_mean, and there we have our result: the mean of the whole dataset is 6,000, and the mean for the countries with an average income over 15,000 is 19,000.
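Here is a small sketch of the position-based indexing used above: -1 for the last row and a 0:5 slice for the first five. The data is made up, and head is a real DataFrame method included as a shorthand for the same slice.

```python
import pandas as pd

# Hypothetical sorted data; labels and values are invented.
data = pd.DataFrame({
    "country": [f"Country {c}" for c in "ABCDEFGH"],
    "avg_income": [800, 4000, 16000, 18000, 19500, 21000, 23000, 27000],
})
richest = data[data["avg_income"] > 15000]

# -1 counts back from the end, so after sorting this is the highest
# income, whatever the length of the DataFrame.
print(richest.iloc[-1])

# A 0:5 slice gives positions 0, 1, 2, 3 and 4; the end is excluded.
first_five = richest.iloc[0:5]
print(first_five)

# head(5) returns the same first five rows.
print(richest.head(5).equals(first_five))
```

If the DataFrame has fewer than five rows, iloc[0:5] simply returns the rows that exist rather than raising an error.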
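The statistics step can be sketched as follows, with made-up incomes so the means are easy to check by hand. It selects the avg_income column before calling np.mean, which makes it explicit which values are being averaged; the Series.mean method is shown as the pandas equivalent.

```python
import numpy as np
import pandas as pd

# Hypothetical data with invented values.
data = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D"],
    "avg_income": [1000, 3000, 16000, 20000],
})
richest = data[data["avg_income"] > 15000]

# Select the single column first, then take the mean of that Series.
rich_mean = np.mean(richest["avg_income"])
all_mean = np.mean(data["avg_income"])
print(all_mean, rich_mean)

# pandas provides the same statistic as a method on the Series.
print(data["avg_income"].mean())
```

With these values, all_mean is (1000 + 3000 + 16000 + 20000) / 4 = 10000.0 and rich_mean is (16000 + 20000) / 2 = 18000.0.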
So there we have it. I've dug a little into the DataFrame object from Pandas and shown you how to apply sorting operations, filtering operations and statistical calculations from the NumPy library. You can see we have now moved from variables through to Lists, on to NumPy arrays and finally on to DataFrames. We now have a really powerful data structure to work with when we're doing our data processing.
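To put that progression side by side, the sketch below runs the same task (the mean income of the entries above 15,000) on a List, a NumPy array and a DataFrame. The incomes and the country labels are invented for illustration.

```python
import numpy as np
import pandas as pd

# The same made-up incomes held in three structures.
incomes_list = [1000, 3000, 16000, 20000]
incomes_array = np.array(incomes_list)
df = pd.DataFrame({"country": ["A", "B", "C", "D"], "avg_income": incomes_list})

# List: the filtering and the averaging are written by hand.
rich_list = [x for x in incomes_list if x > 15000]
list_mean = sum(rich_list) / len(rich_list)

# NumPy array: vectorised comparison and a built-in mean, but no labels.
array_mean = incomes_array[incomes_array > 15000].mean()

# DataFrame: the same operations by column name, with other columns kept alongside.
rich_df = df[df["avg_income"] > 15000]
df_mean = rich_df["avg_income"].mean()

print(list_mean, array_mean, df_mean)
print(rich_df["country"].tolist())  # the labels come along for free
```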