Welcome back. This video is about quantitative analysis. Quantitative methods emphasize statistical analysis of user data collected through surveys, questionnaires, or computer-generated records, that is, logs. There are two ways to perform quantitative analysis: a hypothesis-driven approach and a data-driven approach. The difference between these two approaches is when data collection and hypothesis forming happen. In a hypothesis-driven approach, you first have your hypothesis in mind and then plan how to collect the data. In a data-driven approach, you already have the data available and try to uncover interesting trends in the existing data. In this video, I'm going to use one example to go through all the steps of quantitative analysis using the hypothesis-driven approach. After this video, I want you to have a basic idea of how to perform a quantitative analysis.
First, assume that you have a hypothesis in mind and you want to prove or disprove it. Your hypothesis is that female users spend more time on Facebook than male users. Once you have the hypothesis, the second step is to decide the independent variables and dependent variables. In our case, the independent variable is gender and the dependent variable is the time people spend on Facebook.
After you form the hypothesis and decide the variables, you know what data you want to collect. Basically, you want data about people's gender and about the time they spend on Facebook. There are two ways you can collect these data. You can create a survey and directly ask people their gender and the time they spend on Facebook. We already learned how to design a survey in a previous module. Or, if you are a data scientist at Facebook and have access to the user log data on the server end, you can easily collect data on gender and time on Facebook for millions of users. Here is what the data might look like.
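As a rough sketch of what such a table could look like in code, the Python snippet below stores a few records with a participant ID, a gender, and the average minutes per day, and then groups the dependent variable by the independent variable. The six records and the helper name `minutes_by_gender` are assumptions made up for illustration; they are not the data shown in the video.

```python
from statistics import mean

# Hypothetical records: (participant_id, gender, avg_minutes_per_day).
# These values are invented for illustration only.
records = [
    (1, "female", 60),
    (2, "male", 45),
    (3, "female", 30),
    (4, "male", 75),
    (5, "female", 90),
    (6, "male", 30),
]


def minutes_by_gender(rows):
    """Group the dependent variable (minutes) by the independent variable (gender)."""
    groups = {}
    for _participant_id, gender, minutes in rows:
        groups.setdefault(gender, []).append(minutes)
    return groups


groups = minutes_by_gender(records)

# Average minutes per day for each gender group.
group_means = {gender: mean(values) for gender, values in groups.items()}
print(group_means)
```

Comparing the group means is only a first look at the data. Whether the difference is meaningful is a question for the statistical analysis.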
Whether you collect the data from a survey or from logs, you will get a data table similar to this one. There are three columns: participant ID, gender, and the average minutes people spend on Facebook per day. Note that these are not real data. I just use them as an example to illustrate how to run the analysis.
Once you have the data, the next step is to analyze it. Today, I'm going to focus on linear regression. Linear regression is a technique for understanding the relationship between one or more factors of interest, that is, independent variables, and an outcome, or dependent variable. In our example, we want to know the relationship between gender and time, so we want to know how gender influences the time people spend on Facebook.
A little bit of history about linear regression. Linear regression, particularly multiple regression analysis, arose in the biological and behavioral sciences around 1900. However, multiple linear regression was often computationally intractable in the pre-computer age. This led to the development of computationally simpler models such as ANOVA, which is especially applicable to planned experiments. Nowadays, with the fast development of statistical packages and tools, multiple linear regression has become standard in most social science areas, including the field of user research.
There are many advantages to the linear regression model. First, the form of the relationship is not constrained. Although it is called linear regression, we can actually use it to understand other types of relationships, such as curvilinear relationships. Second, the nature of the research factors expressed as independent variables is not constrained. Third, the nature of the dependent variable is also not constrained. You can use any statistical package, such as Stata, JMP, or R, to perform the linear regression. Here is the output of the linear regression model for our data.
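To make the regression step a bit more concrete, here is a minimal Python sketch of ordinary least squares with a single predictor, written with the standard library only rather than a statistical package. It assumes gender is dummy coded (female = 1, male = 0), and it uses invented data, so the numbers it prints are not the ones in the table from the video. The function name `fit_simple_ols` is also just an illustrative choice.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Ordinary least squares for one predictor: y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: share of the variance in y accounted for by x.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical data, invented for illustration only.
genders = ["female", "male", "female", "male", "female", "male"]
minutes = [60, 45, 30, 75, 90, 30]

# Dummy coding (an assumption): female = 1, male = 0, so male is the reference group.
is_female = [1 if g == "female" else 0 for g in genders]

intercept, slope, r_squared = fit_simple_ols(is_female, minutes)
print(f"intercept={intercept:.2f}, slope={slope:.2f}, R^2={r_squared:.4f}")
```

With this coding, the intercept is the average time for male users and the slope is the difference between the female and male averages. This sketch does not compute a p-value; a statistical package would report one alongside the coefficient.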
This table might look overwhelming to you. So what do all these numbers mean? To find out, we need to do the final step, which is interpretation. To interpret the results of a linear regression, you need to understand three things: the p-value, the coefficient, and R-square. The p-value tells us whether a relationship exists or not. A rule of thumb is that if the p-value is smaller than 0.05, you can say that the relationship is statistically significant. In our example, the p-value can tell us whether female users actually spend more time than male users. The coefficient can tell us how strong the relationship is. In our example, it can tell us how big the difference is between female users' usage time and male users' usage time. Third, R-square can tell us what proportion of the variance is accounted for by the variables. In our example, it can tell us what proportion of the variance in Facebook usage time is accounted for by gender.
Now we can interpret this table. First, let's look at the p-value. The p-value is 0.567, which is much larger than 0.05. This means that the difference between male and female Facebook usage time is not statistically significant. The coefficient here is 16, which means that the estimated difference between male and female users' usage time is 16 minutes per day. However, because the p-value is pretty high, we cannot be confident that this difference is real, so the coefficient becomes less meaningful. The R-square is 0.0197, which is about 0.02. This means that only two percent of the variance in Facebook usage time is accounted for by gender. Overall, these data do not provide strong support for our hypothesis that female users spend more time on Facebook than male users.
Again, this might seem overwhelming for those of you who do not have any experience in statistical analysis. If you want to know more about linear regression, you can take a statistics class. A takeaway: here are the steps to perform a quantitative analysis. Form a hypothesis, decide the variables, collect the data, run the statistical analysis, and finally, interpret the data and report the results.
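The interpretation rules described above can be written down as a small Python sketch. The function name `interpret` and its wording are assumptions for illustration. The three input numbers are the ones reported for the example table in this video, and the 0.05 threshold is the rule of thumb mentioned earlier.

```python
def interpret(p_value, coefficient, r_squared, alpha=0.05):
    """Turn the three regression statistics into plain-language statements."""
    significant = p_value < alpha
    lines = []
    if significant:
        lines.append(f"p = {p_value}: statistically significant at alpha = {alpha}.")
    else:
        lines.append(f"p = {p_value}: not statistically significant at alpha = {alpha}.")
    lines.append(f"Coefficient: estimated difference of {coefficient} minutes per day.")
    lines.append(f"R-squared: gender accounts for {r_squared:.1%} of the variance.")
    return significant, lines


# Values reported in the video for the example table.
significant, summary = interpret(p_value=0.567, coefficient=16, r_squared=0.0197)
for line in summary:
    print(line)
```

Reading the three statements together gives the same conclusion as before: the difference is not statistically significant, so the 16-minute estimate should be treated with caution.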
Thank you for watching this video. I hope to see you in the next one.