Hi. Welcome back to an Introduction to Human Behavioral Genetics. This is the second module of lecture two. In lecture two we're talking about the twin study method, and in this second module we'll be talking about the statistics of assessing the similarity of the two members of a twin pair. Last time we talked about twins as a natural experiment. The basic logic of a twin study, just to refresh your memory, is that if monozygotic twins are more similar than dizygotic twins, then we're going to draw the conclusion that genetics is an important contributor to the individual differences on whatever trait, whatever phenotype, we're measuring. Alternatively, if the two types of twins are approximately the same in their similarity, then we're going to conclude that it's their rearing environment that's the predominant source of the individual differences in the trait or phenotype we're measuring. Of course, in order to complete this natural experiment, we need a method for assessing the similarity of the two members of the twin pair. That's what we're going to talk about today. For the purposes of this course, we can distinguish two broad types of traits or variables, or what I'll call phenotypes. The first type is categorical, either-or traits. Things like whether or not you have a diagnosis of schizophrenia, whether or not an individual completed a college degree, or whether or not a marriage ever ended in divorce. We can think of these as either you have it or you don't have it. On categorical phenotypes like these, the measure of twin similarity that twin researchers typically use is called concordance, twin concordance. Alternatively, a second major type of phenotype or trait we will be talking about in this course is a quantitative trait, things like IQ or extraversion, or how much you drink in a week.
These are traits that are distributed along a continuum, so they're numeric. The measure of similarity that we use for twins in this case is called a correlation coefficient. So, let's begin with categorical phenotypes. Twin similarity for a categorical phenotype is assessed by what we call concordance. The definition of concordance is the probability that a twin has that categorical phenotype, that condition, given that her or his co-twin has the condition. It's actually quite simple to compute concordance, and I'm going to use some data that I've collected on a sample of Minnesota twins here over the last 10 or so years. In this case, the phenotype or categorical trait that I'm going to talk about is whether or not an individual met diagnostic standards for having drug abuse or dependence on one of eight or nine different substances. It turns out that in my sample, having a diagnosis like this is relatively common. I had a total of almost 4,000 individuals in the sample, and of those, 774 met criteria for this diagnosis. So 20% of the individuals, and we're not talking about concordance yet, 20% of the individuals had this condition. The sample also consists of twins, so I can measure their similarity for having a diagnosis of drug abuse or dependence. In the case of the monozygotic twins, I had a total of 447 twin pairs where one member of the twin pair had drug abuse or dependence. If I look at his or her co-twins, of the 447 co-twins, 278 also had a diagnosis, whereas 169 did not. The concordance rate, then, is just 278, the ones with the diagnosis, divided by 447, the total number of co-twins I observed, in this case 62%. A similar calculation in the dizygotic twins yields a concordance of 53%. So in this case, 20% of the individuals in my sample had a diagnosis of drug abuse or dependence.
The concordance in monozygotic twins is substantially higher than that 20%, in which case we can say that monozygotic twins are similar on this phenotype. Their concordance is roughly 60%, threefold larger, so there's monozygotic twin similarity for a drug abuse or dependence diagnosis. The dizygotic twins were concordant about 50% of the time. They were also more similar than two random individuals would be. So there's also dizygotic twin similarity. In this case, monozygotic twins are somewhat more similar than dizygotic twins. Perhaps there are genetic influences on this trait, something that we will come back to in a later module. So, a couple of basic characteristics of concordance, because we're going to see concordances as we go through the course, particularly when we look at schizophrenia. First of all, the definition again. Concordance is a statistic. It's the probability that a twin has the condition given that his or her co-twin has the condition. The interpretation is that it estimates the risk of having that condition in the co-twin. In the previous example, the risk of having drug abuse or dependence in a monozygotic co-twin of an individual with drug abuse or dependence is 60%, threefold higher than the population rate of 20%. The range: concordance can range, theoretically, anywhere from 0%, there's no risk to the co-twin, to 100%, the co-twin always has the same condition. In practice it's rare for the concordance rate for twins to be lower than the population prevalence of the disease or disorder. In this case, it would have been pretty exceptional and unexpected to observe a twin concordance rate of less than 20%, because a risk of less than 20% would have meant that having a co-twin with the disorder actually reduced my risk of having the disorder. Theoretically possible, in practice very rare to see. So that's concordance.
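As a minimal sketch, the concordance arithmetic from the monozygotic example can be written out in Python. The function and variable names here are only illustrative. The dizygotic counts are left out because only the resulting 53% was given, not the underlying numbers.

```python
def concordance(cotwins_affected: int, cotwins_unaffected: int) -> float:
    """Share of co-twins who have the condition, given the other twin has it."""
    total = cotwins_affected + cotwins_unaffected
    if total == 0:
        raise ValueError("no co-twins observed")
    return cotwins_affected / total


# Monozygotic counts quoted in the lecture: 278 affected, 169 unaffected co-twins.
mz_concordance = concordance(278, 169)

# Population rate quoted in the lecture (about 20% of the sample).
prevalence = 0.20

# How many times higher the co-twin's risk is than the population rate.
risk_ratio = mz_concordance / prevalence

print(f"MZ concordance: {mz_concordance:.0%}")
print(f"Risk relative to the population rate: {risk_ratio:.1f}x")
```

The ratio at the end is the "threefold" comparison made in the lecture: the co-twin's risk set against the 20% population rate.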
Now let's move on to the second type of metric for measuring twin similarity, correlation. For a quantitative trait or a quantitative phenotype, twin similarity is typically measured in terms of a correlation coefficient. Now, it's a little bit more difficult to define a correlation coefficient, because really to define it you need a statistical formula. But for our purposes, we can call it an index of the strength of linear relationship between two quantitative scores. So let's go through an example. Again using data that I've gathered on a large twin sample here in Minnesota, what I've displayed here is what's called a scatter diagram. On the horizontal axis, I have one twin's height, and on the vertical axis, his or her co-twin's height. Actually, in this case I've only taken the males, because I want to factor out the sexual dimorphism in height, so these are only male twins. And I'm just plotting here, then, the two twins' heights. This happens to be about 850 pairs of monozygotic twins, and one thing you'll notice right away is that the points here scatter very tightly along the line. That scattering very tightly along the line is a reflection of there being a very strong correlation between the two twins' heights. Why is that? Well, let's see. Overall, the second twin's height here ranged anywhere from 160 to 200 centimeters, so if I didn't know anything about a twin other than that he's some twin in my sample, then I could say that his height is probably going to be somewhere between 160 and 200 centimeters, right? All the twins' heights are in that range. But what if I know his co-twin's height? Let's pick one co-twin. Let's say the co-twin had a height of 190 centimeters. Then if I look at the co-twins of a twin who is 190 centimeters tall, look at how tightly they cluster. They don't span the whole range of 160 to 200 centimeters any more.
They cluster very tightly along that line, and in fact the range here is only 188 to 192 centimeters. So once I know a twin's height is 190 centimeters, then I know that the co-twin of that twin is going to be somewhere in a very tight range. Not 160 to 200 centimeters, but 188 to 192. The difference between this range and that range reflects that there's some strong similarity between the two twins' heights. What we would like is a summary metric or statistic that reflects how strong that relationship is. The correlation coefficient, actually developed by Galton and his students, is a measure that reflects the strength of that relationship between the two scores. In this case, I've calculated the correlation between the two twins' heights. For the monozygotic twins, it's a correlation that's actually quite high, it's 0.92. As you'll see in a little bit, the correlation can never be greater than 1, in which case all the points would fall right along the line. So this is a very strong correlation. It's helpful, maybe, in this regard to compare it to what the dizygotic relationship for height looks like. Here I have about 500 pairs of dizygotic twins. These are all male-male dizygotic twins. Here you see that their points scatter much more widely around the line. There's a weaker correlation. And when I compute the correlation, it's only 0.56. Monozygotic twins are much more similar in height than dizygotic twins, 0.92 versus 0.56. Of course, we're not so much interested in height, although it's kind of an interesting trait. We're going to be more interested in behavioral traits. So what might we begin to expect if we looked at behavioral traits? Well, again using data I've collected here on samples of Minnesota twins, here's a trait we'll come back to later, but I just want to show you the basic data.
This is the scatter plot, now, for the IQs of a little bit more than 700 pairs of monozygotic twins. Now we have both male and female monozygotic twins, and what you see is that they're not as similar as they are for height, but they're pretty similar. The scatter plot is pretty tight and the correlation is 0.82. Dizygotic twins are similar in their IQs as well, but they're not anywhere near as similar as the monozygotic twins. Genetic factors, something we'll come back to later, certainly don't determine your IQ, but they appear to influence, to some degree, individual differences in IQ. So, some basic characteristics of correlation. First of all, the definition I gave you just a little while ago: it's an index of the strength of linear relationship between two quantitative scores. Next, its range. Theoretically, it can vary from minus 1, which would be a perfect negative association, where all the points fall on a line and the line is negatively oriented, to plus 1, a perfect positive correlation, where all the points fall on a line and the line has a positive orientation. High scores go along with high scores. In practice, the twin correlations that we see usually vary between zero, there being no relationship, and positive 1, there being a perfect relationship. And in fact, it would be rare to see a twin correlation of a perfect 1. So usually they're less than 1, and for most psychological traits, quite a bit lower than 1. Finally, something that we'll come back to later in the course, in particular when we get into quantitative genetics, but also when we get into molecular genetics: one interpretation of a correlation coefficient that will be very handy for this course. If you square the correlation coefficient, you get a percentage, and it's the percentage of variance accounted for in one variable by another.
So, just to round it off and make the mathematics easy for me here, say the correlation between two monozygotic twins' heights is 0.9. If I square that, I get 0.81, or 81%. That's r squared, and my interpretation is that I can account for 81% of the individual differences in the height of one twin by knowing that individual's co-twin's height. So that concludes our discussion of metrics for twin similarity, the concordance as well as the correlation coefficient. Next time we'll begin to look at findings from twin studies.
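For anyone who wants to see the arithmetic, here is a rough Python sketch of a correlation coefficient and the r squared interpretation. It assumes the standard Pearson product-moment formula, which this module does not spell out. The five pairs of heights are made-up numbers for illustration only, not the Minnesota data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length, at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical heights in centimeters (illustration only, not real twin data).
twin_heights = [165.0, 172.0, 178.0, 183.0, 190.0]
cotwin_heights = [167.0, 171.0, 180.0, 181.0, 191.0]

r = pearson_r(twin_heights, cotwin_heights)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")

# The rounded example from the lecture: a correlation of 0.9 squared.
r_squared_lecture = 0.9 ** 2
print(f"0.9 squared = {r_squared_lecture:.2f}")
```

A value of r near 1 corresponds to points that hug the line in a scatter diagram, and squaring it gives the share of variance in one twin's score accounted for by the other's.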