Welcome back to the Coursera course, How to apply the multiphase optimization strategy (MOST) in your intervention development research. This is the beginning of Module 4, Some conceptual and technical aspects of the factorial experiment. This video is Lesson 1, titled The regression model and coding. I'm Linda Collins of the School of Global Public Health, New York University. I will be your narrator for this module, and I'm also one of the course developers. The other developer is Kate Guastaferro of the College of Health and Human Development, Penn State. In this lesson, you will learn how to describe the regression model for a factorial experiment and explain the difference between dummy coding and effect coding.

But before we begin, I'd like to pause for a moment and remind us why we are emphasizing the factorial experiment. You've seen this figure many times at this point, so you know that there are a variety of different experimental designs that may be used in the optimization phase of MOST. It's not possible to cover them all in one course. In this introductory course on MOST, we are emphasizing the factorial experiment because many of the other experimental designs that are helpful in the optimization phase of MOST are rooted in the factorial experiment. In other words, understanding the factorial design is necessary for understanding the fractional factorial design, the SMART, and the micro-randomized trial.

Let's begin. When we conduct a factorial experiment, it is with the objective of estimating two different types of effects. One type of effect is the main effect of each factor. The main effect of factor A is defined as the effect of factor A averaged across all levels of all other factors. The other type of effect is the interaction. The definition of the interaction between factor A and factor B, assuming each factor has two levels, is one-half of the difference between the effect of factor A at level 1 of factor B and the effect of factor A at level 2 of factor B, averaged across all levels of all other factors. Now, this probably seems very abstract. Here's an easier and more conceptual way to think about it. If the effect of factor A is the same at each level of factor B, then there is no interaction. If the effect of factor A varies depending on what level of factor B you're looking at, then factor A and factor B interact.

Data from a factorial experiment are typically analyzed via analysis of variance, usually referred to by the acronym ANOVA. If the model being fit in an ANOVA includes all of the interaction effects, as it typically does, it may contain many interactions. There will be interactions corresponding to any two factors, any three factors, and so on, up to the interaction that involves all the factors. An interaction involving two factors is called a two-way interaction, an interaction involving three factors is called a three-way interaction, and so on.

Recall the hypothetical example we've been using in this course. The first three candidate components are motivational interviewing, peer mentoring, and text message support. For illustrative purposes, we will pretend for now that these are the only candidate components. Suppose you have conducted a 2^3 factorial experiment to examine the performance of these three components. This table illustrates the experimental design. For example, in experimental condition 1, all three factors are set to no. In experimental condition 4, MI is set to no, and PEER and TEXT are each set to yes.
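To make the layout of the eight conditions concrete, here is a minimal sketch in Python (an illustration, not part of the course materials) that simply lists every combination of the no/yes levels for the three hypothetical factors MI, PEER, and TEXT, numbered in the order described above.

```python
# A minimal sketch (illustration only, not course code) listing the eight
# experimental conditions of the 2^3 design for the hypothetical factors
# MI, PEER, and TEXT.
from itertools import product

factors = ["MI", "PEER", "TEXT"]

# Conditions 1-8: every combination of the "no"/"yes" levels of the three factors.
conditions = list(product(["no", "yes"], repeat=3))

for i, levels in enumerate(conditions, start=1):
    settings = ", ".join(f"{name}={level}" for name, level in zip(factors, levels))
    print(f"Condition {i}: {settings}")
```

For example, this prints condition 1 with all three factors set to no and condition 4 with MI set to no and PEER and TEXT set to yes, matching the design table.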
There is an observed mean on the outcome corresponding to each of the eight experimental conditions. Let's consider how to analyze the data collected in this experiment. In this course, our perspective is that we would conduct the analysis by performing ANOVA via regression. To do this, we need to specify a regression model to predict Y. In a full ANOVA model for this 2^3 experiment there are three main effects (one for MI, one for PEER, and one for TEXT), three two-way interactions (MI by PEER, MI by TEXT, and PEER by TEXT), and one three-way interaction (MI by PEER by TEXT). All of these effects can be estimated via multiple regression. This is why a prior course that covered multiple regression is a prerequisite for taking this course.

Whenever you perform a regression, you are trying to predict an outcome variable Y. This prediction is called Y-hat. Y-hat is always a linear combination of the predictor variables and regression weights. There is a Y-hat corresponding to each experimental condition. For example, Y-hat_3 is the predicted Y when MI equals no, PEER equals yes, and TEXT equals no. On this slide, we have the regression model corresponding to the analysis of variance for the 2^3 experiment. We are using Betas here to represent regression weights in a general sense, not to indicate standardized regression weights. Remember, we said two slides previously that in this example there are three main effects, three two-way interactions, and one three-way interaction. You can see all these effects in this regression model, plus an intercept. Let's read the regression model. Y-hat equals Beta_0 (the intercept), plus Beta_1 X_MI, plus Beta_2 X_PEER, plus Beta_3 X_TEXT, plus Beta_4 X_MI by PEER, plus Beta_5 X_MI by TEXT, plus Beta_6 X_PEER by TEXT, plus Beta_7 X_MI by PEER by TEXT. The Betas correspond to estimates of the intercept, main effects, and interactions. We will talk about that some more very soon. But for right now, let's ask ourselves: where do the X's come from?

In statistics courses you've learned that sometimes it is necessary to code variables to use them as predictors in a regression. If your background is social or behavioral science, it's likely you learned how to do this using what is usually called dummy coding. With dummy coding, you create X variables by assigning codes of zero or one. The table shows how you would dummy code the main effects for the 2^3 factorial experiment. You can see that for the first factor, MI, there is a zero every place MI is set to no and a one every place MI is set to yes. The same approach is used to code the main effects of PEER and TEXT.

There's a different approach to coding that is widely used. In fact, it is the standard in nearly every field except the social and behavioral sciences. This approach is called effect coding. You can see on this slide that with effect coding there's a minus one every place MI is set to no and a plus one every place MI is set to yes. In other words, for the main effects, the only difference between effect coding and dummy coding is that where there is a zero in dummy coding there's a minus one in effect coding. This difference between dummy coding and effect coding holds only for main effects. The differences between them get a bit more complicated with respect to interactions. Let's try to make this a bit more concrete. For the participants in experimental condition 3, MI is set to no, PEER is set to yes, and TEXT is set to no.
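Here is another small sketch (an illustration, not course code) that contrasts the two coding schemes for the main-effect columns across the same eight conditions; the 0/1 and -1/+1 mappings are the only assumptions it makes.

```python
# A small sketch (illustration only) contrasting dummy coding and effect coding
# of the main-effect columns for the eight conditions of the 2^3 design.
from itertools import product

conditions = list(product(["no", "yes"], repeat=3))  # (MI, PEER, TEXT)

dummy = {"no": 0, "yes": 1}    # dummy coding: 0 where "no", 1 where "yes"
effect = {"no": -1, "yes": 1}  # effect coding: -1 where "no", +1 where "yes"

for i, (mi, peer, text) in enumerate(conditions, start=1):
    d = [dummy[mi], dummy[peer], dummy[text]]
    e = [effect[mi], effect[peer], effect[text]]
    print(f"Condition {i}: dummy {d}  effect {e}")
```

For condition 3, for example, it prints dummy codes of 0, 1, 0 and effect codes of -1, 1, -1, which is exactly what we turn to next.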
In the dataset, for an individual in experimental condition 3, their entries for MI, PEER, and TEXT would be minus one, one, and minus one, respectively. But of course, you would usually want to include the interactions in the regression model. Here's the complete table, including the codes for both the main effects and the interactions. The codes for the interactions are obtained by multiplying the codes for the main effects. For example, consider the codes for the MI by PEER interaction. You obtain these codes by taking the codes for the main effect of MI and multiplying by the codes for the main effect of PEER. You can see, for example, that for condition 1, minus one times minus one gives the plus one code for the interaction. Looking down to condition 3, minus one times one gives the minus one code for the interaction.

Dummy coding and effect coding will produce exactly the same omnibus, or overall, F, but the main effect and interaction effect estimates and the hypothesis tests associated with those estimates are usually different. Most of the time you will not have to do this coding yourself; it will be done for you by statistical software. I've not included the codes for the intercept in these tables. If you have to do this coding yourself, remember to include the code for the intercept, which would simply be a one for all the participants. This is why the intercept is sometimes called the constant.

To sum up, to analyze the data from a factorial experiment using a regression model, it's necessary to use numeric codes to represent each main effect and interaction. This can be done using dummy codes, that is, zeros and ones, or effect codes, which for a 2^k experiment would be minus ones and ones. All else being equal, effect coding and dummy coding, as we said before, produce identical omnibus, or overall, F's. However, these two approaches define the main effects and interactions differently. This means hypothesis tests for individual main and interaction effects will be different. In this course, we use only effect coding. In your data analysis, be sure you know which coding is being used. For more about this, see the Kugler, Dziak, and Trail chapter in the Collins and Kugler book.

Let's review the regression model again. I hope now you have a better sense of how the regression weights are obtained. Each regression weight, except for the intercept, corresponds to a main effect or interaction. Recall the definition of optimization of an intervention we have been using. Optimization of a multicomponent intervention is the process of identifying an intervention that provides the best expected outcome obtainable within key constraints imposed by the need for affordability, scalability, and/or efficiency. Note that expression, "expected outcome." Maybe you've wondered exactly what we mean by this. This definition actually refers to Y-hats. The Y-hats are estimates of the expected outcome associated with each experimental condition.

Let's return to the idea of coding effects-- specifically, to interpreting coded effects. Consider Beta_1 in this regression equation. Through tedious but not difficult algebra, which I'm not going to go through here, you can show for a 2^k experiment that when effect coding is used, Beta_1 can be interpreted as 1/2 of the main effect of MI according to the definition of the main effect we've been using. The analogous interpretation applies to Beta_2 and Beta_3.
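To tie the codes to the regression weights, here is a hedged sketch that builds the full effect-coded design matrix for the 2^3 experiment, with the interaction columns formed as products of the main-effect columns, and solves for the Betas. It assumes NumPy is available, and the eight cell means in y are hypothetical numbers used only for illustration; the final line checks numerically that Beta_1 equals one-half of the main effect of MI.

```python
# A sketch (assumptions: NumPy; the cell means in y are hypothetical numbers
# for illustration only) of the full effect-coded regression for a 2^3 experiment.
from itertools import product

import numpy as np

# Effect codes for MI, PEER, and TEXT across the eight conditions (-1 = no, +1 = yes).
main = np.array(list(product([-1.0, 1.0], repeat=3)))
mi, peer, text = main[:, 0], main[:, 1], main[:, 2]

# The interaction columns are products of the main-effect columns;
# the intercept column is a one for every condition (the "constant").
X = np.column_stack([
    np.ones(8),                          # intercept
    mi, peer, text,                      # main effects
    mi * peer, mi * text, peer * text,   # two-way interactions
    mi * peer * text,                    # three-way interaction
])

# Hypothetical observed means on the outcome for conditions 1 through 8.
y = np.array([10.0, 12.0, 11.0, 14.0, 13.0, 15.0, 14.0, 18.0])

# Solve for the regression weights Beta_0 through Beta_7.
betas, *_ = np.linalg.lstsq(X, y, rcond=None)

# With effect coding, Beta_1 is one-half of the main effect of MI, that is,
# half of (mean at MI = yes minus mean at MI = no, averaged over the other factors).
main_effect_mi = y[mi == 1].mean() - y[mi == -1].mean()
print(betas[1], main_effect_mi / 2)  # the two values match (1.625 for these hypothetical means)
```

Because the effect-coded columns of a balanced 2^k design are orthogonal, each weight can be read off independently of the others, which is one reason effect coding is so convenient here.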
Now, let's consider Beta_4. Again, through pretty straightforward algebra, you can show that for a 2^k experiment, when effect coding is used, Beta_4 corresponds to 1/2 of the interaction according to the definition of interaction we've been using. But there is an important caveat here. When you get into higher-order interactions, that is, interactions involving three or more factors, how you view this depends on what definition of the interaction you use. We're not going to get into this here because it would take a long time to explain it. It is explained in your readings, specifically sections 3.14 and 4.7 in the Collins textbook. In this course, we will always define the interaction in terms of the regression weight.

In this lesson, you learned how to describe the regression model for a factorial experiment and how to explain the difference between dummy coding and effect coding. In the next lesson, you will learn how to interpret main effects and interaction effects. Here are the references we cited in this lesson. See you next time.