In this lesson, we will learn how to join two datasets. We may want to join datasets to bring more context to our analysis. For example for the iter is data if we want to add extra context or analysis using information on whether, local events such as games or maybe social media we can bring those variables in our analysis. However while doing so we must again think about the same thing we've been discussing. At what level are the data, and thus if we want to merge the weather data with our ities data both of them must be brought to the same level of unit precisely both need to be brought to the unit of time. The weather data we collected was observed at the date level, thus ities data also have to be aggregated at the data level before we merge the two. For this lesson we have already saved ities data aggregated at the date level as df_ities_date and we have also saved the weather data as df_weather as we see we have 1021 days data in I ties we only kept three columns for illustrative purposes here. Let's quickly have a view of the data set. So, there is this date column there is price cost in quantity all aggregated up to the date. These are the average values for a given date. Let's quickly discuss the idea of joining datasets. For clear exposition, let's assume the ities data is always be in the left and the weather data will always be right, we merge two data frames both of which have a common column known as the primary key. Remember the primary key has to have only unique values in the data sets to be merged. That is the reason why we brought up the ities data to the date level because in the original data, there were many rows representing the same day. Another thing about primary key is there should be no missing values in the primary key. There is no guarantee that both the datasets to be merged have all the matching days or primary keys. This gives rise to four possibilities for the merged dataset. Left join, it has all rows that means all dates from the ities is data. And if those rows are missing from the weather data, they will be filled with na, let's see it in the code here. So, I am doing a left join on the two datasets df_ities_date and df_weather and as discussed earlier ities data is on the left and weather is on the right, and we are going to join them by the primary key that is date. So the left join command is going to create a merged data set such that all the rows in df_ities data are going to be there in the merge data, while if the matching dates are not found in the df_weather data, they will be filled with na values. Now let's use this left_joint command provided by the prior package to create this merge data set. Now this left df_left joint has all the columns from the ities data price cost and quantity and columns from the weather data precipitation, snow, temperature max and temperature mean. Now let's quickly see how many values are missing from each of the columns. As we see there are no missing values for any of the I ties data columns, but there are missing values in the weather data, which is true because we realize that they are must be 12 dates which are there in the ities data but are not available in the weather data. Having discussed the left join, let's understand the right join if we create the right join. It it will have all the rows that is dates from the weather data, which is on the right side. And if those rows are missing from the ities data, they will be filled with any values. Let's see the code for doing the right join. Here I am using the right join function from the deep prior package that we just loaded, and the ities is still on the left and weather is on the right and we are again joining them by this primary key called date. Let's see the nature of this data set. Well, it has 2138 rows which match with the number of rows in the weather data set make sense. But there are a lot of missing values here and that's what the idea of right join is, that the data frame on the right will have all of its rows in the merge data set. And if those dates are missing from the data set on the left side, they will be filled with na. Let's quickly see do we have any values and which columns do they belong to? So as we see there are no missing values in the weather data set columns, but there are these missing values from the ities data, which means that there are these 1129 rows in the weather data that are not available in the ities data. And that's why na values have been filled in these ities rows. Now let's see what is a full join, a full join will all rows that are either in the ities data or the weather data. That means either in the left data frame or in the right data frame and if there are rows that are missing from one data set, but which are available in the other data set they will be filled with na's. Let's see this in the code. So here I am using this full_join, a deep layer package function again and same data frame ities on the left weather on the right and primary key date, and if we see that this data set has the highest number of observations because it is capturing all the rows from either of the data sets. Let's quickly investigates the missing values here. And as you would have expected the missing values in the weather data set are 12, because we are earlier saw that there are 12 rows in the ities data that do not have their corresponding rows in the weather data. And there are these 1829 rows which are in the weather data, but not available in the ities data. Finally, let's understand the inner join. Full join captures the union of the two datasets, and inner join captures, the intersection of the two data sets. The idea is very simple. It has only those rows that are available in both the I ties and weather data, same command except the full is replaced by inner here. Everything else remains same let's just see this data frame and you would see it has the least number of observations. Why, because it is going to create an intersection which is usually smaller than either of the group's. If we see the missing values, we will find that there are no missing values because if there is a data in the weather data frame and is not available in the ities data frame, it will just be dropped and vice a versa. So in this lesson we discussed how we can merge our data sets in different ways, and remember bringing data sets together can add more context to our analysis.