[MUSIC] In the previous lecture, we distinguished between two families of effect sizes: the Cohen's d family on the one hand and the r family on the other. In this lecture, we'll take a look at Cohen's d measures of effect size - standardized mean differences. Let's start with a practical example. Do movie ratings differ between websites? And if so, by how much? Does it actually matter where you go on the internet to see how popular a movie is? Now, this is my favorite movie of all time, Fight Club. I would definitely rate it a ten, myself, but I can understand that taste differs between people, so there might be some variation. And there might be a slightly lower average score than ten. Let's take a look at two different websites to see how people have rated this movie. If we go to the Internet Movie Database, we see that the score is a remarkably low 8.9 - okay. But if we visit Rotten Tomatoes, we see that the score that people gave Fight Club is actually 7.3 out of 10. This looks like a big difference, but we need to quantify it to be sure. Is this really meaningful or not? So let's calculate the standardized mean difference to find out. Now, we have the two mean values: on the one hand, the Internet Movie Database, where the evaluation for Fight Club is an 8.9, and on Rotten Tomatoes, where the average evaluation is a 7.3. Now conceptually, the standardized mean difference is the difference in means divided by the standard deviation. So here we have two means, and we can subtract one from the other to get the mean difference, but we also need to know the standard deviation. Now luckily, I happen to know that if you look at all movie evaluations, the standard deviation is, more or less, 1.4 on this scale. If you have a 10-point scale, then the standard deviation you can expect when people evaluate movies is 1.4. 
So we can take the mean difference and divide it by the standard deviation to calculate the effect size. In this case, Cohen's d is 1.14, which - as we'll see later - is a very big effect. So it really matters which website you go to if you want to know whether people like a movie. Cohen's d ranges from 0, no effect, to infinity. When there's no difference between two groups, the mean difference is 0, and you can divide it by any standard deviation you want; the effect size will remain zero. If the difference is really, really huge, then the effect size just goes up and up. Now let's visualize different effect sizes. If you don't have any idea of how to interpret an effect size, then there are useful benchmarks that you can use. For this effect size, you can use the benchmarks of 0.2 for a small effect, 0.5 for a medium effect, and 0.8 for a large effect. In this graph, we see the visualization for an effect size of 0.2. It's clear there's a difference: people in one group are slightly more likely to have a higher score, or evaluate a movie a little bit better, than people in the other group. But you can also see that there's quite a lot of overlap. An individual from the group that on average has a higher evaluation for a movie is still quite likely to like the movie less than someone from the other group. If we look at a slightly bigger effect size, a Cohen's d of 0.5, we can see the difference is bigger. There's still quite some overlap. And a Cohen's d of 0.8 is considered a large, meaningful effect. This is really a big effect size, and not many effects in psychology, for example, are as big as this. And even in this situation, there's still quite a lot of overlap. Sometimes you might be tempted to think that if there is a significant result, a significant effect, then everybody in one group should score higher than everybody in the other group. But these distributions clearly show that this is not the case. 
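The calculation from the example can be sketched in a few lines of Python. The two ratings and the standard deviation of 1.4 are the values given in the lecture; the function name is just for illustration.

```python
# A minimal sketch of the Cohen's d calculation from the lecture:
# the mean difference between the two websites' ratings, divided by
# the (assumed known) standard deviation of movie ratings.

def cohens_d(mean1: float, mean2: float, sd: float) -> float:
    """Standardized mean difference: (mean1 - mean2) / sd."""
    return (mean1 - mean2) / sd

imdb_rating = 8.9             # Fight Club on the Internet Movie Database
rotten_tomatoes_rating = 7.3  # Fight Club on Rotten Tomatoes
sd_movie_ratings = 1.4        # typical SD of movie ratings on a 10-point scale

d = cohens_d(imdb_rating, rotten_tomatoes_rating, sd_movie_ratings)
print(round(d, 2))  # 1.14
```

Note that with identical means the numerator is zero, so d stays 0 whatever standard deviation you divide by - exactly the "no effect" boundary described above.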
Even when we have a big effect, groups overlap quite a bit. Now, using these benchmarks - 'large', 'medium', or 'small' - to interpret the size of your effect should only be done as a last resort. They're useful if you don't have anything else to go on, but it's typically better to compare the effect size that you've found to other related effects in the literature. You might be able to make a statement such as, "Well, compared to this type of intervention, this intervention has a smaller or bigger effect size." And this is better than just using these benchmarks. Sometimes this is not possible, and these benchmarks are all you have to go on. So at a minimum you can interpret your effect size against these benchmarks, but hopefully you can interpret it in a slightly more meaningful way by relating it to effects in the literature. So, conceptually, Cohen's d is the difference divided by the standard deviation, but there are a lot of small variations of Cohen's d depending on the design that you use - for example, a within-subjects design or a between-subjects design. There's the Cohen's d for the population if you know the true value, and there's the Cohen's d for the sample if you calculate the effect size based on a sample. All these small variations might become a little bit complex, but they're important to keep in mind. They differ a little bit, and if you want to take a look at this, you can look at an article I wrote about this. I don't want to advertise my own work, but people apparently seem to find this specific article useful. In this article, I give you all the different formulas for these slightly different versions of effect sizes. It's important to keep in mind which effect size you should use, especially when you perform power calculations. So, this is a screenshot of G*Power, which you can use to perform a power analysis. In this case, we're performing a power analysis for a dependent t-test. 
There are two groups, but they are dependent. In this case, G*Power asks you to fill in Cohen's dz. So this is one specific way to calculate the standardized mean difference. If you use a different way to calculate the standardized effect size, you might get an incorrect sample size estimate from the power analysis. So keep this in mind. We also have a difference between unbiased and biased effect size estimates. It turns out that if you just calculate Cohen's d, it slightly overestimates the true effect size. Especially when you have small sample sizes, this can be a little bit of a problem. Since you're most likely to use computer software to calculate the effect size anyway, it makes sense to always calculate the unbiased version of the d family of effect sizes. In this case, it's called Hedges' g. You can see that g is just Cohen's d multiplied by a small correction, and this correction depends only on the sample size. The larger the sample, the smaller the correction - with a very large sample there's essentially no correction at all. But if you have a very small sample, the difference can be big enough to be meaningful. You can also calculate effect sizes from the published literature. It's not always the case that researchers calculate the effect size and report it in the results section. If they haven't, that's fine: in many cases there's enough information in the results section to calculate the effect size yourself. For example, Cohen's d can be calculated directly from the t-value of a t-test and the sample size. Cohen's d for a within-design and a between-design differ. You might have the same mean difference, but observations in a within-design are correlated. This implies that if the correlation is high, the effect size calculated from the observations can be higher in a within-design than in a between-design, depending on the strength of the correlation. 
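The two computations just described - recovering d from a reported t-value, and applying the small-sample correction to get Hedges' g - can be sketched as follows. This assumes a between-subjects design with two independent groups, and uses the common approximation to the exact correction; the t-value of 2.5 and group sizes of 15 are purely hypothetical numbers for illustration.

```python
import math

def d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d recovered from a reported independent-samples t-value
    and the two group sizes: d = t * sqrt(1/n1 + 1/n2)."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def hedges_g(d: float, n1: int, n2: int) -> float:
    """Small-sample bias correction (Hedges' g), using the common
    approximation 1 - 3 / (4 * df - 1), where df = n1 + n2 - 2.
    The correction depends only on the sample size."""
    df = n1 + n2 - 2
    return d * (1 - 3 / (4 * df - 1))

# Hypothetical published result: t = 2.5 with 15 participants per group.
d = d_from_t(t=2.5, n1=15, n2=15)
g = hedges_g(d, 15, 15)
# g is slightly smaller than d: the correction shrinks the estimate,
# and it vanishes as the sample size grows.
```

With large groups the correction factor is essentially 1, which is why the bias mainly matters in small samples.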
The conversion factor between Cohen's d in a within-design and a between-design is the square root of 2 * (1 - r): the within-design dz equals the between-design d divided by this factor. This implies that whenever the correlation between dependent measures is larger than 0.5, this factor is smaller than 1, which means the dz in a within-design will be higher. This also means that the power in such a design is higher. In many psychological tasks, correlations between dependent measures are very high, which means that very often within-designs will be more powerful than between-designs. We already mentioned that it's important to report and interpret effect sizes. You can interpret effect sizes in relation to small, medium, and large benchmarks, but it's always better to try to interpret these effect sizes in relation to effects in the literature. Whenever you use these effect sizes - in a power analysis, for example - or when you report them in your results, make sure that you realize there are small differences within the d family, depending on the design that you've used. [MUSIC]
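As a closing sketch, the within/between conversion from this lecture can be written out directly: the standard deviation of the difference scores is SD * sqrt(2 * (1 - r)), so Cohen's dz is the between-design d divided by that factor. The d of 0.5 and the correlations below are illustrative values, not data from the lecture.

```python
import math

def dz_from_d(d: float, r: float) -> float:
    """Cohen's dz for a within-design, given the between-design d and
    the correlation r between the dependent measures."""
    return d / math.sqrt(2 * (1 - r))

d = 0.5  # hypothetical between-design effect size
print(round(dz_from_d(d, r=0.5), 2))  # 0.5 -> at r = 0.5 the factor is exactly 1
print(round(dz_from_d(d, r=0.9), 2))  # larger than 0.5: a high correlation boosts dz
```

At exactly r = 0.5 the factor sqrt(2 * (1 - r)) equals 1 and the two effect sizes coincide; above that, dz grows, which is the source of the power advantage of within-designs mentioned above.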