Okay. So, we've just been asked a whole bunch of questions about how users reorder items. Let's just start out by reminding ourselves how many users we have. So, I'm counting directly from the user's table here, because I know I can spell. I had a nightmare the other day that I forgot how to spell select. Okay. So, we have just over 100,000 users, so that's our point of reference. The next thing I'm going to do is I'm going to ask how many users have ever ordered and it might seem like I want to start from the users table, but I actually want to start from the orders table because that has user ID and I'm just going to count the distinct user ID from the orders table. I should probably label this column as "users with orders." Okay. So, we have just under 20K users who have ever placed an order. That's not bad. Now, let's think about how to check if someone has ever reordered an item. So, we're going to be wanting to look at a UserID level from the orders table and also the item level and we want to count how many distinct, and I don't think I really need a distinct in here, but it doesn't take that long to run, so I'm putting it in. Count distinct line item id and that's going to be the times my user ordered. Of course I have to group by user_id and item_id. So, here I'm just looking at all of the users orders grouped by item and I'm counting how many times that item appeared. What I'm going to use this for is if a user has ordered something multiple times, I'm going to assume that they were reordered it on a separate, if they ordered it on a separate line item. Okay. So, let's just check and make sure that query works. No, what did I do? I need an:AS". There we go. Great. So, now you can see the user, the item, how many times they ordered it. I'm going to wrap this in a subquery. Let me call this table "user level orders" and then from that table I'm going to look at the count of distinct users, user id. We want to count how many people reordered and in order to do that I need to add another square clause down here where times the user ordered is greater than one. So, let's just see what we've got, 211. That's really low. Let me just double-check and make sure. It would be something I would totally do to mess up this inequality. Let's just check and make sure there's some times people ordered two, that looks right. Okay. So, 211 users have ever reordered an item. We have over a 100,000 users almost 20,000 users who have ordered something ever. I don't know how to tell you how to feel about numbers, but this number is pretty small. For most businesses this is pretty small. So, let's step back a little bit and just look at how many people have ever ordered more than once. So, here we're going to look at something similar. We're going to look at user ID again and we're just going to count how many invoices they have. Remember an invoice id can have multiple line items on it. So, "invoice id as order account" and I'm going to group by user ID and just see what that looks like. You know what I forgot is I forgot to put in this distinct because we do have multiple copies of the invoice id. Okay. So, lots of people have just ordered once. Same trick as before, count how many people have ordered multiple times. Okay. So, I'm learning from my mistakes in the last tab and running this the select star on it first. Maybe a little bit of formatting and here I want to count the distinct user id and I want everything to be distinct. Okay. So, there's more people who've ordered more than once, but it's still not great. So, I'm a little bit lost here. I'm trying to help someone figure out what to build next but I'm going to keep identifying behaviors that aren't super common. So, I'm going to just get an idea of how orders are distributed across items to give myself a little bit more context. So, next tab, I just want to know are there any items that people order really often, would it make any sense to do like a top ordered items list. So, here I'm going to look at item id and then I'm just going to count the line item id as times ordered, I got to group by item id. Let's see if that works. Okay. So, I'm looking at all these items and it looks pretty well distributed because there's an item. So you know it's been only 25 times. Okay, so I didn't ask for the whole dataset but right now I'm not seeing a whole lot of best sellers, basically nothing is as standout hit. Let me look and just see if there's anything that shows up when you do this by category, so I'm just going to copy this. I think I have item category in kind of the same thing. For each of these categories we've got a pretty well distributed number of times they'd been ordered. Of course this is not really a secret to me because I generated this dataset kind of random plate like this. But this would tell you that there isn't really a bestseller category. There's such a difference between apparatus sales and device sales that you'd want to make apparatus specific recommendation system. So, I'm still struggling here to make a good recommendation for what to do. So, I'm going to try another thing, which is; do users order multiple things from the same category and the reason that I'm asking this is because when I looked at the items in a specific category you might get a contraption and then a contraption storage unit. So I might make sense that people are ordering multiple things in a single order. So, here I'm going to start with a query from earlier, where we counted the number of items ordered except here I'm going to count it by item_category, and they sit here. Okay, so this is already interesting. I'm looking at this and I see that this user has ordered a tool once but this user's order contraption has four items Let me just throw a distinct in there in case I've missed something. So there's some people showing up here who've ordered multiple things from the tool category or multiple things from the contraption category and that's promising. That's maybe a more popular behavior than we've seen before. So let's actually just compute an average, I'm going to call this user_level and then I want to just take an average of the times that the category was ordered. Okay, we'll see if it's at all different by category. That comma and then I have to group by item_category. So, whenever people order instrument, they tend to order multiple items from that category. That's a pretty good insight. That could tell you that while people are in the middle of ordering instruments, you want to recommend other instruments to them and maybe they're not going to order it a second time on a different date but they might add it to their order and when we're thinking about how many users have ordered at all, remember we looked at that users' orders. That's a much bigger audience than the users with re-orders. We could make a much more meaningful product change if we targeted just everyone coming in who had an order. Okay, and just because we talked about it a little bit in the concept video I wanted to show you how to compute the time between orders using windowing functions. I'm not going to type the whole thing but you can start out with this part of the query. We're going to take the orders table and we're going to look at the orders, and order them by paid at date. So you can see users 17 has this one invoice. That's all in order number one and see if you can find someone with multiple orders, not really, just remember it's not that common. So that we'll be identifying just the first orders and then in this again it's basically the same thing except look at what we're doing here. We're going to restrict this first-orders table to have order number one and the second orders table to have order number two. So when we run the whole thing without highlighting a specific line; we should get a user_id, the date of their first-order, the date of their second-order, and the number of days between and to compute that, I'm just subtracting the dates but this is something that definitely changes depending on what kind of SQL you're using. In Hive it is datediff instead of subtracting dates in the line. So, this is how you would do it if you wanted to figure out how long between reorders of an item. Actually, it was just the time between orders not reorders of a specific item and you can also tell that there's some duplicates here, right? We're getting one line probably for every item in the order or something like that. So there's a little bit more work to do if you wanted to turn this into something meaningful to tell you when to target someone to reorder an item. But I don't think that it's really worth doing because that wasn't a very common behavior and so we can thankfully avoid doing this kind of complicated query because we did a lot of more simple queries in the beginning, that helped to guide us to a good solution and I think a good solution. To sum up our good solution is we take an item that you have already put in your cart and we recommend other items from that same category because that's a behavior that's very likely most people order in multiple items from the same category when they order and that's a pretty big chunk of people. Users with orders is pretty big. It's as big of a targetable audiences you're going to have for something related to ordering.