We're starting here where we left off in the last video, with a blank plot of seniority against the number of bills passed by members of the 115th Congress. Now, we're going to go ahead and add points to our scatterplot. Remember that ggplot calls these kinds of marks that go into the figure geoms, and these are one of the ingredients of a ggplot figure. You're going to find out that there are many geom commands, like points, bars, boxplots, lines, and many more. You can find an inventory of all the different geoms at the ggplot reference page at ggplot2.tidyverse.org/reference. You really want to acquaint yourself with these reference materials, because you're going to have to come back to them over and over again as you learn all the ins and outs of ggplot beyond what we can show you in the walkthroughs in these videos.
For a scatterplot, we're going to use geom_point. We add a geom to a ggplot by stringing the two functions together with a plus sign. We don't need to put anything in these parentheses right now, because the geom_point function inherits the data and aesthetic mapping from the original ggplot command. That is, it knows to use the data from the ggplot command that preceded it, along with the aesthetic mappings set in that ggplot call. I will point out that there are geoms you can add to a ggplot string where you put different aesthetic mappings and data in there. But the default is that the geom inherits the data from the ggplot command, and so that's what we're going to do to start out with.
So let's go ahead and run these lines of code here, the two parts of the ggplot command. And you see that now we've generated a scatterplot. The points here are members of Congress, so they're individual rows of the data table, one row per member of the 115th Congress. The position of each point on the plot comes from how senior the member is on the x axis, that is, how long they've been in Congress, and on the y axis, how many bills he or she passed in the 115th Congress.
Now, one problem with this chart is overplotting, which means that there are cases in the data where more than one member has exactly the same x and y coordinate values: exactly the same seniority and the same number of bills passed. For instance, if I had two members of Congress and they both had a seniority value of 6 and they both passed 3 bills, then those two points would be right on top of each other in the figure. To help represent this and not have these points plotted on top of each other, we're going to change the command slightly. Rather than use the geom_point function, we'll use the geom_jitter function instead. When you use geom_jitter, you're adding a little bit of random noise to the x and y values of the data, so the points won't be exactly on top of each other.
Obviously, you have to be pretty careful about this, because your plot no longer displays the underlying data exactly. But for visualization purposes, this is a useful trick, and you're going to encounter this kind of thing from time to time as you do data visualization. You want to maximize the reader's ability to interpret the data while minimizing any kind of deceptive or manipulative practice. It's important that you make smart decisions about this, so that your reader understands the point you're trying to make but you're not deceiving them or misrepresenting the underlying information.
Now we have a basic scatterplot made. We can do a couple of things to improve the look of the figure right away, like giving it a title and changing the axis labels. We can do this by adding a labs function to the string of ggplot commands, connecting it to the ggplot and geom_jitter functions with another plus sign. Now you can see the basic elements of the ggplot figure, but there's still more to do.
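The steps so far can be sketched roughly as follows. This is a minimal illustration, not the code from the video: the data frame `members`, its made-up values, and the column names `seniority` and `all_pass` are assumptions standing in for the real data on the 115th Congress, and the title and axis labels are placeholders.

```r
library(ggplot2)

# Hypothetical stand-in data: one row per member, with made-up values.
# The names `members`, `seniority`, and `all_pass` are assumptions.
members <- data.frame(
  seniority = c(1, 3, 6, 6, 10, 14),
  all_pass  = c(0, 1, 3, 3, 5, 2)
)

# geom_point() gets no arguments here: it inherits the data and the
# aesthetic mapping from the ggplot() call it is added to.
p_point <- ggplot(members, aes(x = seniority, y = all_pass)) +
  geom_point()

# geom_jitter() adds a small amount of random noise to each position, so the
# two members at (6, 3) are no longer drawn exactly on top of each other.
# labs() sets the title and axis labels and is chained on with another `+`.
p_jitter <- ggplot(members, aes(x = seniority, y = all_pass)) +
  geom_jitter() +
  labs(
    title = "Seniority and Bills Passed in the 115th Congress",
    x = "Seniority",
    y = "Bills passed"
  )

# Display the jittered plot.
print(p_jitter)
```

Because the noise is random, the jittered points land in slightly different places each time the plot is drawn.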
For instance, we haven't included any color in the figure yet, and we can take advantage of this by coloring the individual points differently for the two parties in Congress, Democrats and Republicans. So let's go ahead and add that to the data and visualize it. First, we go back and modify the data wrangling to also grab the dem column from our original data. And now we can modify the ggplot command to have color be mapped onto that dem variable. The dem variable is just a series of 1s and 0s, which indicate whether each member is a Democrat or a Republican. So when we map the color to that, the colors will be different for values of 1 and values of 0 in the data.
So we'll go ahead and run this, and it looks like we sort of got what we wanted, but it's not quite right. We see that we have this continuous scale for colors in the legend, where the dark blue is 0 and the lighter blue is 1. But that doesn't really make sense, because we know that members of the United States Congress, with very few exceptions, are either Democrats or Republicans. This is a categorical variable, not a continuous one. So to get closer to what we want, let's do a little data wrangling again, and we'll convert the dem variable from a number to a categorical variable. We'll go ahead and use recode and then add that recoded variable back to our data. This kind of data wrangling should be review for you by now, but if you do need to clarify how these commands work, you can go back and check the tidyverse reference materials.
So now, I will go ahead and plot this with the recoded data, and it looks better, but we still don't quite have control of the color yet. A color has been automatically assigned for us. We really want these points to be blue and red, because those are the colors that are associated with the parties in Congress in the United States today. We can manually control the colors in the figure by adding the function scale_color_manual, and we set these values to blue and red. So we go ahead and plot this again, and now we get the typical colors for Democrats and Republicans. Now, note that I'm not showing you the details of every single one of these functions right now. I'm just trying to orient you again to the process of making a ggplot figure and adding all these functions to the string, so that you can manipulate the different elements of the figure.
So one final thing here: let's say that we don't want to have Democrats and Republicans on the same plot, but we want to have two separate plots, one for Democrats and one for Republicans. We can do this by what is called faceting the figures, or making separate figures based on some kind of criteria that we select. So as you might expect, we add another ggplot command here, and the function is facet_wrap. The facet_wrap command works by putting inside the parentheses a tilde mark followed by a variable in the data, which will split the figure into parts on the basis of that variable. R will then split the plot for these discrete values, and we'll do it here for Democrats and Republicans by putting a tilde and then that variable for the parties. And now we have two scatterplots, as we would expect.
This demonstration of building up a figure shows you the basic logic for how to build a ggplot figure. All figures use the same basic process, so it doesn't matter if you're doing a scatterplot, a line chart, a bar plot, or even, further down the road, more exotic things like maps. So this is a good foundation for your future experimentation. We start with getting the data how we want it, and then we systematically build up the graphical elements of the visualization one by one. In the next several videos, I'll do more demonstrations of how to make some of the classic types of visualizations using this exact same methodology.
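The color and faceting steps described above might look something like the sketch below. It is an illustration under assumptions rather than the code from the video: the data frame `members` and its values are made up, the 0/1 party indicator is assumed to be a column named `dem` with 1 meaning Democrat, and the recoded column name `party` is hypothetical. Recent dplyr versions mark `recode()` as superseded, but it still works and matches the function named in the walkthrough.

```r
library(ggplot2)
library(dplyr)

# Hypothetical stand-in data with made-up values. The column names and the
# coding of `dem` (1 = Democrat, 0 = Republican) are assumptions.
members <- data.frame(
  seniority = c(1, 3, 6, 6, 10, 14),
  all_pass  = c(0, 1, 3, 3, 5, 2),
  dem       = c(1, 0, 1, 0, 0, 1)
)

# Convert the numeric indicator into a categorical (character) variable and
# add it back to the data as a new column, here called `party`.
members <- members %>%
  mutate(party = recode(dem, `1` = "Democrat", `0` = "Republican"))

# Map color to the categorical variable, then set the colors by hand.
# Naming the values ties each color to a party regardless of level order.
p_color <- ggplot(members, aes(x = seniority, y = all_pass, color = party)) +
  geom_jitter() +
  labs(
    title = "Seniority and Bills Passed in the 115th Congress",
    x = "Seniority",
    y = "Bills passed"
  ) +
  scale_color_manual(values = c(Democrat = "blue", Republican = "red"))

# Split the figure into one panel per party: a tilde, then the variable.
p_facet <- p_color + facet_wrap(~ party)

# Display the faceted plot.
print(p_facet)
```

Mapping color to the original numeric `dem` column instead of `party` would produce the continuous color scale mentioned in the walkthrough, which is why the recoding step comes first.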