[MUSIC] So what does that probability look like? Well, this figure is a little misleading because the x-axis is on a log scale. But if you just look at the heights, what's going on here is: the y-axis is the probability of drawing a particular first digit. The blue line is associated with the digit one, the green line with digit two, and so on. This was generated by a simulation, where I really did select random numbers, and I really did order them in the manner described on the previous slide: put one value in, then measure the probability of drawing a value whose first digit is one. And if you think about the numbers, this is where one is, this is ten, this is 100, and so on. Well, as soon as I put a one in, the chance of drawing a card with a one on the front is 100%. When I put a two in, it's 50%. When I put a three in, it's 33%, and so on. But as soon as I get to ten, it goes back up to a higher percentage again, and it stays up there through ten, 11, 12, 13, 14, and so on. Then as soon as it gets to 20, it drops down a little bit. So that's why it's climbing here, through the tens, and then it starts dropping again, sharply. Okay. The point is that under this model, values whose first digit is one are always in the hat by the time you get to the twos, and the twos are always there before you get to the threes, and the threes are always there before the fours, and so on. So you end up with these peaks, led by the ones. And the area under each curve, although remember this is a log plot so the visual area isn't quite right, represents the probability of drawing that first digit. So the probability of a leading one really does come out higher. Okay. And there are a few other models you can find if you read up on this, and I encourage you to; there are other ways to think about the probability here.
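The hat model just described can be sketched in a few lines. This is a minimal illustration, not the lecturer's actual simulation code: it puts the integers 1 through n into the "hat" and asks how often a uniformly drawn value has leading digit one.

```python
def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def prob_leading_one(n):
    """After putting 1..n into the hat, the probability that a
    uniformly drawn value has leading digit one."""
    return sum(1 for k in range(1, n + 1) if leading_digit(k) == 1) / n

# As described in the lecture: 100% with just {1}, 50% with {1, 2},
# then the probability climbs again once 10..19 enter the hat.
print(prob_leading_one(1))   # 1.0
print(prob_leading_one(2))   # 0.5
print(prob_leading_one(19))  # 11/19: the ones dominate through the teens
print(prob_leading_one(99))  # down again: only 11 of 99 values lead with 1
```

Running this reproduces the sawtooth shape from the figure: peaks as each new power of ten fills in, then a slow decline through the higher digits.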
And there is actually a closed-form expression for Benford's Law as well. One of the limitations here is that it's not always true, and the key situation in which it's applicable is when the dataset spans many, many orders of magnitude. I mean, think about it: if you have values between 50 and 90, and you select those randomly, well, you're not going to get any numbers whose first digit is one. And similarly, if you draw from one to 100, you get this effect a little bit, but not fully. So you want to span a lot of orders of magnitude. That's illustrated by this plot: the red areas represent the probability of selecting something where the first digit is one, and the blue areas are where the first digit is eight, okay? As long as you span enough orders of magnitude, you get much more area where the first digit is one. But if you take a narrow range, then the probability is defined more by the distribution itself, and there's not enough chance for the area under the curve over the leading-one regions to get big. So, fine. I just wanted to introduce you to that law, mention that it can be used to detect fraud and mistakes, and then connect fraud and mistakes back to this original context we're in of trying to understand the weaknesses of statistical results. [MUSIC]
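The closed-form expression mentioned above is usually written P(d) = log10(1 + 1/d) for leading digit d. A quick sketch, assuming base-10 data as in the lecture:

```python
import math

def benford(d):
    """Closed-form Benford probability for leading digit d (1..9)."""
    return math.log10(1 + 1 / d)

# Probability falls off from about 30.1% for a leading 1
# down to about 4.6% for a leading 9.
for d in range(1, 10):
    print(d, round(benford(d), 3))

# The nine probabilities sum to 1, as a distribution must:
# log10(2/1) + log10(3/2) + ... + log10(10/9) telescopes to log10(10).
print(sum(benford(d) for d in range(1, 10)))
```

In a fraud check, you would compare the observed leading-digit frequencies of a dataset against these expected values; large deviations flag the data for closer inspection.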