[MUSIC] In the previous lecture, we distinguished between two families of effect sizes: the d family on the one hand, and the r family on the other. In this lecture we'll take a look at Cohen's d measures of effect size: standardized mean differences. Let's start with a practical example. Do movie ratings differ between websites? And if so, by how much? Does it actually matter where you go on the internet to see how popular a movie is? Now, this is my favorite movie of all time, Fight Club. I would definitely rate it a ten myself, but I can understand that taste differs between people, so there might be some variation, and the average score might be slightly lower than ten. Let's take a look at two different websites to see how people have rated this movie. If we go to the Internet Movie Database, we see that the score is a remarkably high 8.9. But if you visit Rotten Tomatoes, we see that the score people gave Fight Club is actually 7.3 out of 10. This looks like a big difference, but we need to quantify it to be sure: is this really meaningful or not? So let's calculate the standardized mean difference to find out. We have two mean values: on the Internet Movie Database, the evaluation for Fight Club is an 8.9, and on Rotten Tomatoes the average evaluation is a 7.3. Conceptually, the standardized mean difference is the difference in means divided by the standard deviation. So here we have two means, and we can subtract one from the other to get the mean difference, but we also need to know the standard deviation. Now, luckily, I happen to know that if you look at all movie evaluations, the standard deviation is more or less 1.4 on these scales. On a 10-point scale, the standard deviation you can expect when people evaluate movies is 1.4. So we can take the mean difference and divide it by the standard deviation to calculate the effect size.
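The calculation described above can be sketched in a few lines of Python. The means (8.9 and 7.3) and the standard deviation (1.4) come from the example in the lecture; the function name is just for illustration.

```python
# A minimal sketch of the standardized mean difference from the lecture:
# the difference between two means, divided by the standard deviation.

def cohens_d(mean_1, mean_2, sd):
    """Cohen's d: difference in means divided by the standard deviation."""
    return (mean_1 - mean_2) / sd

# IMDb mean vs. Rotten Tomatoes mean, with the assumed SD of 1.4.
d = cohens_d(8.9, 7.3, 1.4)
print(round(d, 2))  # → 1.14
```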
In this case, Cohen's d is 1.14, which, as we'll see later, is a very big effect. So it really matters which website you go to if you want to know whether people like a movie. Cohen's d ranges from 0, no effect, to infinity. When there's no difference between two groups, the mean difference is 0, and you can divide it by any standard deviation you want; the effect size will remain zero. If the difference is really, really huge, then the effect size just goes up and up. Now let's visualize different effect sizes. If you don't have any idea of how to interpret an effect size, there are useful benchmarks you can use. For Cohen's d, the benchmarks are 0.2 for a small effect, 0.5 for a medium effect, and 0.8 for a large effect. In this graph, we see the visualization for an effect size of 0.2. Clearly there's a difference: people in one group are slightly more likely to have a higher score, or to evaluate a movie a little better, than people in the other group, but you can also see that there's quite a lot of overlap. An individual from the group that on average evaluates the movie more highly can easily like the movie less than someone from the other group. If we look at a slightly bigger effect size, a Cohen's d of 0.5, we can see the difference is bigger, but there's still quite some overlap. A Cohen's d of 0.8 is considered a large, meaningful effect. This is really a big effect size, and not many effects in psychology, for example, are as big as this. And even in this situation there's still quite a lot of overlap. Sometimes you might be tempted to think that if there is a significant result, a significant effect, then everybody in one group should score higher than everybody in the other group. But these distributions clearly show that this is not the case. Even when we have a big effect, the groups overlap quite a bit.
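One way to put a number on the overlap described above (my own illustration, not a calculation from the lecture): for two normal distributions with equal standard deviations, the probability that a randomly drawn person from the higher-scoring group scores above a randomly drawn person from the other group is Phi(d / sqrt(2)), where Phi is the standard normal CDF.

```python
import math

# Probability of "superiority" for the benchmark values of d, assuming two
# normal groups with equal SDs. Even a large effect leaves a lot of overlap.

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for d in (0.2, 0.5, 0.8):
    p = normal_cdf(d / math.sqrt(2.0))
    print(f"d = {d}: P(higher-group member > other-group member) = {p:.2f}")
```

Even at d = 0.8, this probability is only about 0.71, which matches the lecture's point that large effects still come with substantial overlap between groups.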
Now, using these benchmarks, 'large', 'medium', or 'small', to interpret the size of your effect should only be used as a last resort. They're useful if you don't have anything else to go on, but it's typically better to compare the effect size that you found to other related effects in the literature. You might be able to make a statement such as: compared to this type of intervention, this intervention has a smaller or bigger effect size. And this is better than just using these benchmarks. Sometimes this is not possible, and these benchmarks are all you have to go on. So in the minimal case you can fall back on these benchmarks, but hopefully you can interpret the effect size in a slightly more meaningful way by relating it to effects in the literature. Conceptually, Cohen's d is the difference divided by the standard deviation, but there are a lot of small variations of Cohen's d, depending on the design that you use, for example a within-subjects design or a between-subjects design. There's the Cohen's d for the population, if you knew the true value, and there's the Cohen's d for the sample, if you calculate the effect size based on a sample. All these small variations might become a little complex, but they're important to keep in mind. They differ a little, and if you want to take a look at this, you can read an article I wrote about it. I don't want to advertise my own work, but people apparently find this specific article useful. In this article, I give you the formulas for these slightly different versions of effect sizes. It's important to keep in mind which effect size you should use, especially when you perform power calculations. This is a screenshot of G*Power, which you can use to perform a power analysis. In this case, we're performing a power analysis for a dependent t-test. There are two groups, but they are dependent. In this case, G*Power asks you to fill in Cohen's dz.
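Cohen's dz, the variant G*Power asks for in a dependent t-test, is the mean of the difference scores divided by the standard deviation of those difference scores. A minimal sketch, with hypothetical paired ratings I made up for illustration:

```python
import statistics

# Cohen's dz for a dependent (paired) design: mean of the difference
# scores divided by the SD of the difference scores.

def cohens_dz(scores_1, scores_2):
    diffs = [a - b for a, b in zip(scores_1, scores_2)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

# Hypothetical ratings of the same six movies on two websites.
site_a = [8.5, 9.0, 7.8, 8.2, 9.1, 8.7]
site_b = [7.1, 7.9, 6.5, 7.0, 8.0, 7.4]
print(round(cohens_dz(site_a, site_b), 2))
```

Because dz is computed from the difference scores rather than the raw group scores, it is generally not interchangeable with the between-subjects d, which is exactly why entering the wrong variant into a power analysis gives a wrong sample size.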
So this is one specific way to calculate the standardized mean difference. If you use a different way to calculate the standardized effect size, you might get an incorrect sample size estimate from the power analysis, so keep this in mind. There is also a difference between biased and unbiased effect size estimates. It turns out that if you just calculate Cohen's d, it slightly overestimates the true effect size. Especially when you have small sample sizes, this can be a bit of a problem. Since you're most likely to use computer software to calculate the effect size anyway, it makes sense to always calculate the unbiased version of the d family effect sizes. In this case, it's called Hedges' g. You can see that g is just Cohen's d multiplied by a small correction, and this correction depends only on the sample size. The larger the sample, the smaller the correction, until there's essentially no correction at all. But if you have a very small sample, the difference can be large enough to be meaningful. You can also calculate effect sizes from the published literature. It's not always the case that researchers have calculated the effect size and reported it in the results section. If they haven't, that's fine: in many cases there's enough information in the results section to calculate the effect size yourself. For example, Cohen's d can be calculated directly from the t value of a t-test and the sample size. Cohen's d for a within design and a between design differ. You might have the same mean difference, but observations in a within design are correlated. This implies that the effect size calculated from the observations can be higher in a within design than in a between design, depending on the strength of the correlation. The conversion factor between Cohen's d in a within design and in a between design is the square root of 2(1 - r).
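Two of the formulas mentioned above can be sketched as follows. This is my own illustration under the assumption of an independent-samples t-test with equal variances; the t value and group sizes are hypothetical. The Hedges' g correction shown is the common approximation 1 - 3/(4*df - 1), not the exact gamma-function version.

```python
import math

# Recovering Cohen's d from a reported t value, and applying the
# small-sample bias correction that turns d into Hedges' g.

def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t value and group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def hedges_g(d, n1, n2):
    """Approximate unbiased estimate: g = d * (1 - 3 / (4*df - 1))."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))

d = d_from_t(t=2.5, n1=15, n2=15)  # hypothetical reported values
print(round(d, 3), round(hedges_g(d, 15, 15), 3))
```

Note that the correction shrinks d toward zero, and that it matters most for small samples: with n1 = n2 = 15 the correction is a few percent, while with hundreds of participants it is negligible.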
This implies that whenever the correlation between dependent measures is larger than 0.5, the factor, the square root of 2(1 - r), is smaller than 1, which means the d in a within design will be higher. This also means that the power in such a design is higher. In many psychological tasks, correlations between dependent measures are very high. With such strong correlations, within designs will very often be more powerful than between designs. We already mentioned that it's important to report and interpret effect sizes. You can interpret effect sizes in relation to small, medium, and large benchmarks, but it's always better to try to interpret these effect sizes in relation to effects in the literature. Whenever you use these effect sizes in a power analysis, or when you report them in your results, make sure that you are aware of the small differences within the d family, depending on the design that you've used. [MUSIC]
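The conversion just described can be sketched numerically (my own illustration): dividing a between-subjects d by the square root of 2(1 - r) gives the corresponding within-subjects dz.

```python
import math

# Converting a between-subjects Cohen's d into the within-subjects dz
# using the correlation r between the paired measures.

def dz_from_between_d(d_between, r):
    return d_between / math.sqrt(2 * (1 - r))

d = 0.5  # a medium between-subjects effect
for r in (0.0, 0.5, 0.8):
    print(f"r = {r}: dz = {dz_from_between_d(d, r):.2f}")
```

At r = 0.5 the two effect sizes coincide; above 0.5, dz is larger than d, which is the lecture's point about within designs being more powerful when dependent measures are strongly correlated.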