All right, what we're going to be doing in this exercise is building a model for retention. You'll see a couple of different tabs in this file. We've got a working tab that we're going to use to build our model. We have another tab for the geometric model that we're building that's complete and that I've annotated already. And then the other model that we're going to create is one that allows for a time trend. I'll explain the difference between these two models as we get a little bit into things. The final tab has a comparison between the performance of these different models. So, one of the basic building blocks for customer analytics is the geometric distribution. The idea in the geometric distribution is, think of it like a coin flip: each time period, there is a probability that the event happens. In our case the event is churn or retention. So in Month 1, there's a probability that you churn. In Month 2, if you haven't churned yet, there's a probability that you churn. In Month 3, if you haven't churned yet, there's a probability that you churn. The assumption with the geometric distribution is that it's always the same probability, so it makes for a very simple model, but it's also one that's a little bit restrictive. So we're going to start by estimating this model in Excel, estimating the retention or churn probability. We'll see how that model performs, comparing actual to expected data. And then, if that's not enough, we'll see whether we can do a little bit better: what we can do as far as allowing that churn or retention probability to vary from one time period to the next, making the model a little bit more realistic. The data that we have: we've got 1,000 individuals who began at time zero, and we know the total number of individuals who continue to have service in each time period out of this cohort.
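To make the coin-flip intuition concrete, here is a small simulation sketch, not part of the spreadsheet, where each active customer independently churns each period with the same probability (the function name and seed are my own choices):

```python
import random

def simulate_cohort(p, cohort, horizon, seed=0):
    """Coin-flip churn: every active customer churns each period
    with the same probability p (the geometric assumption)."""
    random.seed(seed)
    remaining = []
    active = cohort
    for _ in range(horizon):
        # each still-active customer flips the same coin this period
        active -= sum(1 for _ in range(active) if random.random() < p)
        remaining.append(active)
    return remaining
```

Running `simulate_cohort(0.1, 1000, 7)` produces a declining series of remaining-customer counts of the same shape as the cohort data in the worksheet.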
What we've calculated in Column C, and let me zoom in so that what we're doing is a little bit clearer. What we've done in Column C already is taken the difference between the number of individuals who had service in the previous period and the number of individuals who have service in the current period, and that tells us how many individuals cancelled service, or how many individuals churned. And so we're going to model the probability, for each of those individuals who have churned, of doing so after one month, after two months, after three months, after four months, and so forth. That's what we're going to do in Column D: calculate out that probability. We're then going to calculate the log likelihood for each set of individuals based on how long they maintained service before cancelling. We'll also take into account the individuals who have not yet cancelled service, and calculate our sample log likelihood. And we're going to use that to find the value for P that maximizes the likelihood of the observed data. We can set this up either as a retention or a churn probability, it doesn't make a difference, but let's say I want to set it up as a churn probability. All right, so let's begin by specifying the probability that an individual cancels service after one month. Well, if we're going to specify it in terms of a churn probability, it's simply going to be equal to the value that's in K1, and I'll put dollar signs around that. Now what if we have people who survived one month and then churned? Well, let's write out these probabilities first, and maybe we can see a pattern that emerges. It's going to be one minus our churn probability, times the churn probability, to say I survived one month and then churned. Okay, what about the next one, where I survived two time periods and then churned?
Well, it's going to be one minus our churn probability raised to the second power, to say I survived two months, and then I churned in the following month. And if we compare these two formulas: when it's I survived one month, that's one minus the churn probability times the churn probability. So I survived one month and then I churned. In the next cell it was, I survived two months, so that's why we have it squared, and then I churned. Well, the next one is going to be I survived three months and then I churned. So all that's going to change in the next cell is that, instead of raising the one minus churn probability to the power of 2, it's going to be raised to the power of 3. And that's the pattern we're going to see time and time again. So let's go back to cell D3 and rewrite this a little bit. The way we can rewrite this in a more general fashion is: there's the churn probability, multiplied by 1 minus the churn probability, which is the retention probability, and we're going to raise that to a power, so Shift+6 for the caret. What power are we raising it to? We're going to raise it to the power of t - 1. So what does that look like if we drag this formula down? It says I survived t - 1 periods. In this row, I survive zero periods and then I churn. In row 4, it's I survive A4 - 1, so I survive one period, and then I churn. In the next row, it's I survive A5 - 1, so 3 - 1, so I survive two periods and then I churn. So the 1 - K1 is my retention probability, raised to the power of how many periods I'm retained, and then it's multiplied by my churn probability. All right, so that's the geometric model. Now, in terms of specifying the log likelihood, we have 131 individuals who exhibited this behavior. So, to construct the contribution to the log likelihood, it's going to be the number of individuals who exhibited this behavior, multiplied by the natural log of the probability.
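If it helps to see the pattern outside of Excel, here is a minimal Python sketch of the Column D formula we just built (the function name is mine, not the spreadsheet's):

```python
def geom_churn_prob(p, t):
    """Probability of churning in period t under the geometric model:
    survive t-1 periods with probability (1-p) each, then churn with p."""
    return (1 - p) ** (t - 1) * p
```

With p = 0.1 this gives 0.1 for month 1 and 0.09 for month 2, the same values the D-column formula produces, and the probabilities across all periods sum to one.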
And we're just going to copy that formula down. Now, the reason we're using the natural log is that using the raw probabilities we'd end up with a number that's far too small: if the probability were 0.5 for each of these 131 individuals, the joint probability would actually be 0.5 raised to the 131st power. That's an incredibly small number. The more individuals we have, the closer this joint probability gets to zero, until the computer can't actually differentiate it from zero. So that's why we use a mathematical trick: instead of maximizing the likelihood, we maximize the log likelihood. Now, what we've done in cells E3 through E9 is take into account all the people we observed to churn. What we also need to take into account is the number of people who have not yet churned. All right, and if we look at all the individuals who have been observed to churn, that's 509, if we just look at that summation, so we still have 491 individuals who were not observed to have churned. These are individuals who survived seven time periods without churning, so let's take that likelihood into account now. It's going to be our 491 individuals in B9, multiplied by, now, what's the probability of not having churned by this particular time period? A couple of ways we could do this: we could say it's the retention probability, 1 minus the churn probability, raised to the seventh power. We could also write it as one minus all of the churn probabilities that we've enumerated so far. And so that's going to be the log likelihood contribution of those 491 individuals. The other way, as I mentioned, if you want to look at it slightly differently, is 1 minus the churn probability raised to the seventh power. So it doesn't matter which way you specify it, we end up with the same result. All right, and so what's the log likelihood of our sample? Well, we're just going to take the sum of the log likelihood contributions from each individual in Column E.
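The whole E-column calculation, observed churners plus the censored individuals, can be sketched in Python like this (the names and signature are my own; only the 131, 509, and 491 figures come from the spreadsheet):

```python
import math

def geom_log_likelihood(p, churners, survivors, horizon):
    """Sample log likelihood for the geometric churn model.
    churners[t-1] = number who churned in period t;
    survivors = number still active after `horizon` periods (censored)."""
    ll = 0.0
    for t, n in enumerate(churners, start=1):
        # observed churn at t: survive t-1 periods, then churn
        ll += n * math.log((1 - p) ** (t - 1) * p)
    # censored: survived all `horizon` periods without churning
    ll += survivors * math.log((1 - p) ** horizon)
    return ll
```

For the spreadsheet's cohort you would call this with the seven per-period churn counts, survivors=491, and horizon=7, and the result matches the sum over Column E.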
All right, so it's a big negative number that doesn't make a whole lot of sense to us on its own. But that's the number we want to make as large as possible, so we're going to maximize that value. And to do that, let's see, it looks like we have not added our Analysis ToolPak or Solver in yet, so let's make sure we have those tools enabled. So, Options > Add-ins, and we're going to manage our Excel add-ins, and we're just going to make sure that we've put Solver on here, all right. So, we're going to click on Solver and see what objective we have. This has already been run previously, but our objective is cell K2, and we want to maximize that by changing the value that's in cell K1. And the constraint that we've put in is that P has to be a probability. That's it. Let me click on Solve. Let's try that one more time: cell K2 is our objective function, by changing cell K1, subject to our constraints. And we're done, right? So our churn probability is about 10%, which means we've got a retention probability of about 90%. And what Column D now gives us is that we can calculate out, out of the original 1,000 people, under this model we would've expected to see about 10% churn in the first month, 9% churn the next month, 8% after that, 7% after that, and so forth. All right, now, given that we've got 1,000 individuals, let's just calculate some expectations here, just for these seven periods; we're not even talking about forecasting yet. It's 1,000 individuals, that's the size of our cohort, multiplied by the probabilities that we calculated for churn in each particular time period. And let's take a look at how well we do: [INAUDIBLE] we underpredict churn in the first couple of periods, and then it looks like we're overpredicting the amount of churn. So even in this calibration period, we're not doing too well, all right? Let's look at this visually just to see what's going on.
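Solver's maximize-by-changing-K1 step has a simple numeric analogue in Python. Here is a sketch using a ternary search over p, which works here because the log likelihood is concave in p. The per-period churn counts are illustrative placeholders; only the 131 first-month churners, the 509 total, and the 491 censored survivors match the spreadsheet:

```python
import math

# Illustrative data: churners[t-1] churned in period t; 491 survived all 7 periods.
churners = [131, 126, 90, 60, 42, 34, 26]
survivors, horizon = 491, 7

def neg_ll(p):
    """Negative geometric log likelihood (minimizing this maximizes the LL)."""
    ll = sum(n * ((t - 1) * math.log(1 - p) + math.log(p))
             for t, n in enumerate(churners, start=1))
    ll += survivors * horizon * math.log(1 - p)  # censored observations
    return -ll

lo, hi = 1e-6, 1 - 1e-6          # constraint: p must be a probability
for _ in range(200):             # ternary search over the concave LL
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if neg_ll(m1) < neg_ll(m2):
        hi = m2
    else:
        lo = m1
p_hat = (lo + hi) / 2
```

With these counts the search lands at roughly a 10% churn probability, in line with what Solver finds for the worksheet.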
This is one of the problems you can see with the geometric distribution. Looks like I've highlighted the wrong column in one of these, so let me try that again. What I want to highlight is how many actually dropped service in a particular month versus how many were expected to. And let me make the legends a little bit meaningful here: we'll call this series Actual, and we'll call the other series Expected. So under this particular model, notice what's happening: we underpredict churn early on and overpredict it later on. There's no way this model is going to work well from a forecasting standpoint, because it's not even doing that well in the calibration period; we're essentially splitting the difference. We were trying to find that middle ground, so we overpredict in some periods and underpredict in others. That's because this is a very restrictive model, right? Now, if you're looking for a little bit more commentary on each of the steps that we took, you'll find it on the next page in the spreadsheet, where I've annotated each of the steps along the way, all right. So let's try to relax some of these assumptions. That is, rather than assuming there's a constant churn probability in each time period, let's assume that there's a time trend: maybe churn becomes less likely over time or more likely over time, but it varies over time, all right? So we're going to use the logistic regression that we've looked at previously as the foundation for that, and focus on this time trend worksheet. It's the same data that we had before, just set up a little bit differently. So I'm going to walk you through how we're going to proceed here. We're going to calculate out the survival probability first, and then we're going to calculate the difference between the survival probabilities in each pair of adjacent time periods.
So if I maintain service until time one, what's the probability of that happening? What's the probability of maintaining service until time two? If I take that difference, it's going to tell me how likely I am to drop service after the first time period. So that's what we're going to calculate here. We'll then calculate the log likelihood for each observed churn period, and then aggregate that up to the sample log likelihood. Notice that in this case we have two parameters, which I'm calling alpha and beta. We're going to treat alpha as an intercept and beta as a slope; we're going to essentially use logistic regression, where alpha is the intercept and beta is the slope for our time trend. All right? And so let's start out by calculating the survival probability. Well, for the survival probability, in order to make it to the next time period, you had to survive until the period before. So we're always going to refer back to the previous period, to say it's the probability that I survived through the end of the previous period, and I'm going to multiply it by the probability of surviving the current time period. All right? And I'm going to write this a little bit differently from how we traditionally do logistic regression: we're going to use logistic regression to model the tendency to churn. Typically we'd have e to the x divided by 1 plus e to the x. In this case, if that's our churn probability, the retention probability is going to be 1 over 1 plus e to the x. So what we're going to do is multiply the probability of surviving through the previous period by one divided by one plus the exponential function of alpha, our intercept, plus our slope multiplied by the time trend. All right, so that expression comes from the logistic regression: I've got an intercept and I have a slope.
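The survival recursion we're building in Column D can be sketched in Python as follows (the function name is mine; the per-period retention probability is 1/(1 + e^(alpha + beta*t)), as described above):

```python
import math

def survival_probs(alpha, beta, horizon):
    """S(t) = S(t-1) * 1 / (1 + exp(alpha + beta*t)), with S(0) = 1.
    Churn tendency is e^x / (1 + e^x), so per-period retention is
    1 / (1 + e^x), where x = alpha + beta*t."""
    probs = []
    s = 1.0
    for t in range(1, horizon + 1):
        s *= 1.0 / (1.0 + math.exp(alpha + beta * t))  # survive one more period
        probs.append(s)
    return probs
```

With alpha and beta both zero, every period's retention is 0.5, so survival runs 0.5, 0.25, 0.125, and so on, exactly as we'll see in the worksheet; multiplying these by the 1,000-person cohort size gives the expected remaining customers used later for forecasting.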
Currently both of those are set to zero, and that's the reason this is coming out to 0.5. Now, for the next period, if I drag the formula down, it goes to 0.25. And let's take a look at the formula itself: it's D3, so I survive through the end of the previous period, multiplied by the probability that I survive the current period. So that's the logic in this: I survived through the end of the previous period, multiplied by the probability of surviving again. All right, now let's use those survival probabilities to calculate the probability that an individual churns in each particular time period. It's the survival through the previous period minus the survival through the current period, and so we're going to use that logic of taking the difference in the survival function to calculate out the probability of surviving until the present and then churning in the present time period. All right? And so we have those probabilities, and this is the cumulative probability of churning by the end of the seventh period. Notice that this is just shy of one, because there's some chance that people do not churn by the end of that seventh time period. So here, in cell D9, is the probability that they survive through the entire observation period; we'll use that. In terms of constructing our log likelihood, it's the same idea as before: the number of individuals multiplied by the natural log of the probability that we just calculated. So we can calculate the likelihood contribution for the individuals who we observed to churn. And also, don't forget the people who we observed to maintain service throughout the entire observation period. That is information that we want to take into account when we're modeling churn: the fact that individuals have not yet churned is informative. It's a censored observation. Now, if we scroll on over, the log likelihood for the whole sample is simply going to be the sum from F3 through F10.
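Putting the differenced survival probabilities and the censored observations together, the sample log likelihood for the time-trend model can be sketched like this (again, the names and signature are mine, not the spreadsheet's):

```python
import math

def time_trend_log_likelihood(alpha, beta, churners, survivors):
    """churners[t-1] = number who churned in period t;
    survivors = number still active at the end of the observation window."""
    horizon = len(churners)
    # survival recursion: per-period retention = 1 / (1 + exp(alpha + beta*t))
    surv = [1.0]
    for t in range(1, horizon + 1):
        surv.append(surv[-1] / (1.0 + math.exp(alpha + beta * t)))
    ll = 0.0
    for t in range(1, horizon + 1):
        churn_prob = surv[t - 1] - surv[t]          # churn exactly in period t
        ll += churners[t - 1] * math.log(churn_prob)
    ll += survivors * math.log(surv[horizon])       # censored observations
    return ll
```

At alpha = beta = 0 this reduces to the constant-0.5 case we saw when dragging the formula down, which is a handy sanity check.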
And we're again going to go into Solver, because now we've got to find the values for both alpha and beta. Okay, so we're going to maximize our objective, cell L3, the log likelihood. We want it to be as big as possible by changing cells L1 and L2. We have no constraints: alpha and beta could be positive or they could be negative, all right? And so I'm going to check that the box that says Make Unconstrained Variables Non-Negative is not checked. I believe the default in some versions of Excel is for that box to be checked, and we don't want that to be the case. Okay, we click on Solve, and we have our solution. All right, so alpha is about -1.55; again, in a raw sense these numbers don't really tell us all that much. Beta being negative, what does that tell us? Well, let's look back at our survival probability equation. As time increases, since our coefficient for the time trend is negative, the exponent becomes a bigger and bigger negative number. Which means the exponential term in the denominator gets closer and closer to 0, so this fraction, the retention probability, gets bigger and bigger, right? And so churn is decreasing over time, is effectively what we're finding here. Recall we'd said earlier that the way we set up this logistic regression expression was: if we think of the churn event as happening with probability e to the x over one plus e to the x, the retention event happens with probability one over one plus e to the x. So churn is decreasing in time. Now, if we want to see how this performs overall, let's set up our calculation for how many customers remain; we can do this in two ways. The way that we've set this up is for the number of remaining customers, and we can use our survival function to do this. We take our 1,000; we've always got 1,000 customers in the cohort.
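Solver's two-parameter search has an analogue in SciPy, if you have it available. This is only a sketch, and the churn counts below are illustrative placeholders (only the 131 first-month churners and the 491 censored survivors match the spreadsheet), so the fitted numbers won't reproduce the video's alpha of about -1.55 exactly, though beta still comes out negative:

```python
import math
from scipy.optimize import minimize  # assumes SciPy is installed

def neg_ll(params, churners, survivors):
    """Negative log likelihood of the time-trend model (to minimize)."""
    alpha, beta = params
    horizon = len(churners)
    surv = [1.0]
    for t in range(1, horizon + 1):
        surv.append(surv[-1] / (1.0 + math.exp(alpha + beta * t)))
    ll = sum(churners[t - 1] * math.log(surv[t - 1] - surv[t])
             for t in range(1, horizon + 1))
    ll += survivors * math.log(surv[horizon])  # censored observations
    return -ll

# Illustrative per-period churn counts (placeholders apart from 131 and 491).
churners = [131, 126, 90, 60, 42, 34, 26]
res = minimize(neg_ll, x0=[0.0, 0.0], args=(churners, 491), method="Nelder-Mead")
alpha_hat, beta_hat = res.x  # beta negative: churn declines over time
```

Nelder-Mead is a reasonable stand-in for Solver here since the problem is smooth, low-dimensional, and unconstrained.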
And let's just multiply that by the expected survival probabilities to forecast how many customers remain at each point in time. I'll also add one more column that we can take a look at. So we're going to be comparing the customers in column B to the customers in column G, and that looks pretty good, all right. We can also look at the incremental expected churners; this is going to be 1,000 multiplied by the churn probability for each period. All right, and so let's do a comparison. I'm going to relabel this series as Actual just so that we can see it. If we look at the actual remaining versus the forecasted remaining, it looks like it does pretty well, those two lines essentially overlaying each other. And what if we do another comparison of the actual versus expected churners? It also looks pretty good. I mean, compare this to what we saw with the geometric distribution, and we're doing a heck of a lot better. So, what we've done is allow the churn probability to change over time, and in particular, what we find is that churn is decreasing over time. All right, so it doesn't make sense for us to assume that there's a constant churn rate. You'll find on the next tab in the workbook the more heavily annotated version of this spreadsheet, with the same kind of results. And what I've done on the last tab is give a comparison, and I've included data that goes a little bit further out into the forecasting period. All right, so going into the forecasting period, we see that, again, we're doing a much better job with this model than the geometric model was capable of. So we started with as simple a model as possible, saying let's just assume it's a coin flip: each time period, there's a constant probability of churn. That was our geometric model. And we saw, okay, it's not great; it's essentially splitting the difference. And when we look at the overall forecast, we see that the geometric model just doesn't do that well for us.
Yes, it captures the overall shape, but it systematically overpredicts churn in some time periods and underpredicts it in others, whereas with the model that allows for a time-varying churn rate, we end up with a very much better forecast. Now, one of the things we'll talk about a little bit later in the course, something that's kind of pushing the limits of Excel, is what's driving this dynamic. There are two potential explanations. One is that individuals are literally changing over time, becoming more and more loyal, less and less likely to churn. The other explanation is that the individuals who remain later on are fundamentally different from the individuals who were there early on. So, one explanation is dynamics in the churn probability; the other explanation is heterogeneity across the user base. The ideal model is going to combine both of those factors. We'll talk a little bit about unobserved heterogeneity, the importance of heterogeneity, later in the course, but again, it's something that really pushes the limits of what Excel can do, and it's better suited to a computing environment such as R or MATLAB. But in terms of accuracy of the model, just bringing in a simple linear time trend here does much better than ignoring that time trend, and so we're able to pick up some of the dynamics that occur in that churn rate.