Let's first introduce some basic concepts: frequent patterns and association rules. We first look at this simple transaction example. There are five transactions; 10 to 50 are the transaction IDs, and these are the sets of items they bought. For example, transaction ID 10 contains beer, nuts, and diaper, which form an itemset because this is a set of items. This particular one is a 3-itemset because it contains three items.

For each itemset, you have the concept of support. Support means, in this transaction data set, how many times does the itemset occur? In this particular case, there are three occurrences of beer in the transaction data, so the absolute support of beer is three. But you can also use relative support, which means the fraction of the transactions that contain the itemset. For example, there are a total of five transactions and three of them contain beer, so the relative support is 3 over 5, or you can say 60%.

So we can say whether an itemset X is frequent or not: X is frequent if the support of X passes a minimum support threshold. For example, if we set the minimum support threshold at 50%, then for frequent 1-itemsets in this data set you will find there are four. Take beer: there are three cases, so the absolute support is 3 and the relative support is 3 over 5, which is 60%. For frequent 2-itemsets, you can check that there is only one, beer and diaper, because they happen together 3 times out of 5. None of the other 2-itemsets pass the 50% threshold, so there is only one.

From the frequent itemsets we can introduce an interesting kind of rule, the association rule. A rule X implies Y, with support s and confidence c, simply says: if people buy X, with what support and confidence will they also buy the itemset Y? Here s is the support, which is the probability that a transaction contains both X and Y. And c is the confidence, which is a conditional probability.
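To make the support counts concrete, here is a minimal Python sketch. The lecture only spells out the contents of transaction 10, so the other four rows below are an assumption, chosen to agree with the counts quoted in the lecture (beer 3, diaper 4, beer and diaper together 3). The function names are illustrative, not from any library.

```python
from itertools import combinations

# Hypothetical transaction table: only ID 10 is given in the lecture.
# IDs 20-50 are assumed rows that reproduce the quoted support counts.
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}


def support_count(itemset, db):
    """Absolute support: number of transactions containing every item."""
    itemset = set(itemset)
    return sum(1 for items in db.values() if itemset <= items)


def relative_support(itemset, db):
    """Relative support: fraction of transactions containing the itemset."""
    return support_count(itemset, db) / len(db)


def frequent_itemsets(db, k, min_support):
    """Brute-force search for all k-itemsets meeting the support threshold."""
    items = sorted(set().union(*db.values()))
    result = {}
    for candidate in combinations(items, k):
        support = relative_support(candidate, db)
        if support >= min_support:
            result[frozenset(candidate)] = support
    return result


print(support_count({"Beer"}, transactions))
print(relative_support({"Beer"}, transactions))
print(frequent_itemsets(transactions, 1, 0.5))
print(frequent_itemsets(transactions, 2, 0.5))
```

With this assumed table, a 50% threshold should give four frequent 1-itemsets and a single frequent 2-itemset, beer and diaper. Enumerating every candidate is fine for five transactions; it is only a sketch and does not scale to large item catalogs.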
Confidence means: if a transaction contains X, what is the probability it also contains Y? For this probability computation, you use the support of X union Y divided by the support of X. That means, take the support of the whole rule and divide by the support of the left-hand side.

You may notice this notation, X union Y. This is a set union. If you look at the Venn diagram, the transactions containing beer are this part, the transactions containing diaper are this part, and the transactions containing both are actually their intersection, the intersection of the events. But from the itemset point of view, a transaction containing both X and Y, both beer and diaper, has to contain every item of the combined itemset. That's why we use not X intersect Y but X union Y. If you think of X as beer and Y as diaper, X intersect Y will be empty, while X union Y means it contains both.

Then association rule mining is actually trying to find all such rules that meet the minimum support and minimum confidence thresholds. We already know that if we set the minimum support at 0.5, we find these frequent 1-itemsets and this frequent 2-itemset. From here, if we set the minimum confidence at 50%, we are going to derive two association rules: beer implies diaper, and diaper implies beer. If you use this computation, beer and diaper occur together 3 times, so the support is 60%, because it is 3 over 5. And every time beer occurs, you can see diaper also occurs, which is why the confidence of beer implies diaper is 100%. But for diaper implies beer, you can see the support of diaper is 4 while beer and diaper together is 3. So that means there's only a 75% probability that a customer buying diaper will also buy beer.

Are there more rules in this one? Actually, if you check this, because beer and diaper is the only frequent 2-itemset, these two are the only association rules that can be generated from this transaction data.
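The rule computation described here can be sketched in a few lines of Python. As before, only transaction 10 is given in the lecture; the remaining rows are assumed so that the counts match the ones quoted. The function `association_rules` is a hypothetical helper written for this example, and it only examines 2-itemsets, which is enough for this data but is not a general rule miner.

```python
from itertools import combinations

# Hypothetical data: the first row is from the lecture, the rest are assumed.
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]


def support_count(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for items in db if itemset <= items)


def association_rules(db, min_support, min_confidence):
    """Return (X, Y, support, confidence) for rules X -> Y between single items."""
    n = len(db)
    items = sorted(set().union(*db))
    rules = []
    for pair in combinations(items, 2):
        pair_count = support_count(pair, db)  # count for X union Y
        if pair_count / n < min_support:
            continue  # the 2-itemset itself is not frequent
        # Try both directions: X -> Y and Y -> X.
        for x, y in (pair, pair[::-1]):
            # confidence = sup(X union Y) / sup(X)
            confidence = pair_count / support_count({x}, db)
            if confidence >= min_confidence:
                rules.append((x, y, pair_count / n, confidence))
    return rules


for x, y, s, c in association_rules(transactions, 0.5, 0.5):
    print(f"{x} -> {y} (s={s:.0%}, c={c:.0%})")
```

On this assumed table the sketch should report exactly the two rules from the lecture: beer implies diaper with 60% support and 100% confidence, and diaper implies beer with 60% support and 75% confidence.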