0:05

All right, so we've talked a little bit about binary choice

modelling using Excel as our software package.

We're going to still use that choice modelling approach as our foundation.

But let's look at how we would apply that to timing data,

or to build duration or timing models.

So there are a couple of different places that duration data is going to show

up in marketing.

A lot of it shows up in services research.

So looking at how long until a prospect takes action, looking at the impact of

customer satisfaction, the impact of service encounters on customer retention.

Anything looking at customer retention really relies on these duration models.

In some behavioral research, we've looked at response latency.

How long, after you're exposed to a stimulus,

does it take you to respond accordingly?

When we talk about new product forecasting,

we're going to look at a dataset a little bit later on

that asks, based on the behavior of individuals,

based on how long it took them to acquire the product,

can we forecast how much of the population is ultimately going to try this product?

And ultimately, in customer-base analyses,

we're trying to build models that allow us to get a customer valuation, to identify

which of your customers have lapsed and which of your customers are still active.

All of these are built on duration data, built on those interactivity times.

And we're going to use that choice model framework to build discrete timing

models today.

1:42

All right, so let me give you an example of where this comes into play.

So this is based on some published research where

it looks at a cohort of customers who began service

at a telecommunications provider at the same time.

And we're trying to forecast out how many of those customers are left.

And so if we look seven months out, it looks like we're at around 50%.

Well, what does this curve look like in month 8, 9, 10, 11, and 12?

Can we forecast that out?

Now, one simple approach would be to say, all right, well,

let's run linear regression.

Linear regression, as we're going to see in a second,

is going to have a problem.

Because linear regression says, I'm going to fit a line to months zero

through seven, and the trajectory of that line is going to continue.

There are a number of different functional forms that we can try.

2:47

But what happens in month eight?

What happens in month nine?

Well, it turns out that in a lot of those cases,

those forecasting models don't do a very good job.

We have some models that, even though we're dealing with survival curve projections,

start to go up again because of the functional form that was chosen.

We have others, like the linear model, that keep on going down at that same rate.

And so, none of these do a particularly good job of being able to

forecast out what customer retention looks like in the future.

Even though all of them have good R-squared values,

none of them did a good job at forecasting the future performance or

the future decisions of this cohort of customers.
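
To make the straight-line problem concrete, here is a minimal sketch in Python. It is not the published analysis. It fits a least-squares line to only the first four survivor counts that can be read off the numbers quoted later in this lecture (1000, 869, 743 and 653 customers at months 0 through 3, where the 1000 and the 653 are implied by the quoted differences of 131 and 90), and then extends that line to month 7, where the lecture reports 491 customers. The name fit_line is just an illustrative choice.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Survivor counts for months 0 to 3, taken or derived from the lecture's figures.
months = [0, 1, 2, 3]
survivors = [1000, 869, 743, 653]

slope, intercept = fit_line(months, survivors)

# Extend the fitted line to month 7 and compare with the count the lecture reports.
forecast_month_7 = intercept + slope * 7
observed_month_7 = 491

print("customers lost per month according to the line:", round(-slope, 1))
print("line's forecast for month 7:", round(forecast_month_7, 1))
print("observed at month 7:", observed_month_7)
```

Under these assumptions the line keeps losing the same number of customers every month, so it lands far below the 491 customers who are actually still there at month 7.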

So can we build something based on a simple model that, when we start putting

all of the pieces together, allows us to get very good forecasts?

Well, that's going to be the goal.

All right,

a couple of things to keep in mind when we're dealing with timing models.

And this does not come up in any of the other forms of data that we

have talked about.

Well, for timing models,

we only observe actions taken during a specific period of time.

So for example,

let's say that we're looking at customer retention for a 12 month period.

Well, we observe all the customers who dropped service

during that 12 month period.

We also observe a set of customers at the end of 12 months who still have service.

Well, that issue is referred to as right-censoring, all right?

That is, I observe data during this particular window, 0 to T.

What happens after T?

I have no idea.

Left-censoring is a different problem:

we've only observed beginning at a particular point in time.

We observe everything that happens after that.

Well, we don't get to observe what happened before that.

So suppose that we're looking at a queue.

People lined up at a customer service window, and

we have some people who are in that line and we know what time we got there.

We have no idea what time the people who were there before us got there.

So we have a minimum guess for how long they've been there but

we don't know the exact time, that's the issue of left-censoring.

With interval-censoring, we know that something happened in a particular interval of time.

Let's say within a particular hour or within a particular 15 minute chunk,

but we don't have it down to the exact second.

That's going to be more common for us to have to deal with,

just in terms of the nature of the data that's coming in.

If you're dealing with clickstream data, that's something that we might have.

We might intentionally group observations together into

more coarse units to simplify our analysis.

And in fact, the examples that we're going to be looking at,

that's what we're going to do, is we're going to assume

discrete intervals of time rather than continuous time.

But with general timing models, you can account for all of these forms of censoring.
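
As a rough illustration of grouping continuous times into coarser units, the sketch below maps an exact duration onto a discrete period number and flags right-censoring. The function name to_discrete, the 30-day period width, the 12-period window and the example durations are all assumptions made up for this illustration. None of them come from the lecture's dataset.

```python
import math


def to_discrete(duration_days, n_periods, period_days=30):
    """Return (period, right_censored) for one customer.

    Periods are numbered from 1, so an event anywhere in the first
    period_days days falls in period 1. A duration of None means no event
    was seen at all. Events after the last observed period are treated as
    right-censored: we only know the customer survived all n_periods.
    """
    if duration_days is None:
        return n_periods, True
    period = max(1, math.ceil(duration_days / period_days))
    if period > n_periods:
        return n_periods, True
    return period, False


# Hypothetical durations in days, observed over a 12-period window.
for days in (12.5, 45, 360, 400, None):
    print(days, to_discrete(days, n_periods=12))
```

The design choice here is to keep only the period number, which is exactly the information an interval-censored record carries.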

5:52

So let's begin by building as basic a model as we can.

And that is, let's assume that in a given month,

customers have a probability theta of cancelling service.

Well, the flip side of that is

that they're going to keep service with a probability of 1 - theta.

All right, so each month, the customer makes a decision.

But for that customer to make a decision in month t, he

had to survive; he had to decide to keep service for the first t - 1 months.

All right, so what's the probability that a customer drops service in month t?

All right, well for month t, there's a probability,

theta, but what about months 1 through t - 1?

Well, that customer had to keep service in all of those months,

so we're going to multiply it by the probability (1 - theta) of

keeping service, raised to the power of t - 1.

So that gives us the probability that a customer drops service in month t.
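
The calculation just described can be sketched in a few lines of Python. The function name drop_probability and the value theta = 0.1 are placeholders chosen for illustration. The lecture estimates theta from data rather than assuming it.

```python
def drop_probability(theta, t):
    """Probability of dropping service in month t.

    The customer keeps service for the first t - 1 months, each with
    probability 1 - theta, and then cancels in month t with probability theta.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must be strictly between 0 and 1")
    if t < 1:
        raise ValueError("months are numbered from 1 in this model")
    return theta * (1.0 - theta) ** (t - 1)


theta = 0.1  # hypothetical monthly cancellation probability
for t in (1, 2, 3):
    print(t, drop_probability(theta, t))
```

Because theta is the same every month, each month's drop probability is the previous month's multiplied by 1 - theta.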

7:02

Now, there are two possibilities for the data that we're going

to observe. Either we observe the month in which customers do drop service,

and that probability is our likelihood function for those data points.

The other possibility is that customers are right-censored.

They keep service until the end of our observation period.

So if the length of my observation period is T,

some customers are going to keep service for that entire length of time, all right?

So those are the ones who still have service at the end of our

observation period.
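
Under the same constant-theta model, a customer who is still active at the end of the window has kept service in every observed month. Here is a small sketch of that probability, with a check that dropping in some month of the window and surviving the whole window together cover every outcome. The function names, theta = 0.1 and the 7-month window are illustrative assumptions.

```python
def drop_probability(theta, t):
    """Probability of cancelling in month t after keeping service t - 1 times."""
    return theta * (1.0 - theta) ** (t - 1)


def survival_probability(theta, n_months):
    """Probability of keeping service in every one of n_months months."""
    return (1.0 - theta) ** n_months


theta = 0.1   # hypothetical monthly cancellation probability
window = 7    # hypothetical length of the observation period, in months

dropped_somewhere = sum(drop_probability(theta, t) for t in range(1, window + 1))
still_active = survival_probability(theta, window)

# The two kinds of observation exhaust the possibilities, so this is 1
# up to floating-point rounding.
total = dropped_somewhere + still_active
print(dropped_somewhere, still_active, total)
```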

This is what's referred to as a shifted geometric distribution,

the shifted part because there is no t equals 0 in this distribution.

So we can estimate this model, actually using any statistical software package.

We're going to do this using Excel, using the Solver tool that's built in.

But for every data point we have,

we can specify the likelihood associated with that data point.

So for the customers who drop service in month 1, in month 2,

in month 3, all the way through month T minus 1 of our data period,

and even the T-th month of our observation period,

this is the formula that describes their likelihood.

For that set of customers who hold onto service and

didn't drop it by the end of our observation period,

this is the likelihood that we're going to use for those customers.
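
The formulas being pointed to are on the slides and are not reproduced in this transcript. Reconstructed from the spoken description, they would look roughly like the following, where T is the length of the observation period, and n_t (the number of customers who drop in month t) and n_active (the number still active after month T) are notation introduced here for convenience.

```latex
\begin{aligned}
P(\text{drop in month } t) &= \theta\,(1-\theta)^{t-1}, \qquad t = 1, 2, \dots, T,\\[4pt]
P(\text{still active after month } T) &= (1-\theta)^{T},\\[4pt]
\ln L(\theta) &= \sum_{t=1}^{T} n_t \,\ln\!\bigl[\theta\,(1-\theta)^{t-1}\bigr]
  \;+\; n_{\text{active}}\; T \,\ln(1-\theta).
\end{aligned}
```

The last line is the quantity an optimizer would maximize over theta.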

8:48

So between months 0 and 1, how many customers did we lose?

Well, we're going to have a difference of 131 customers,

who got rid of service after one month.

All right, what about between months 1 and 2?

We go from 869 to 743. Well, we've got 126

customers who dropped at the second possible time that they could have.

Then we keep on going down, we're going to be looking at these differences.

The next group, we've got 90 of our remaining

subscribers who cancelled service at that third option, all right?

And then, at the end of 7 months,

here are our 491 customers who still have service after month 7.

All right, so for all of the ones who dropped service,

we know the likelihood associated with them dropping at a particular time.

For these 491 customers, there's a different likelihood function

associated with them because they survived all 7 months.
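
The differencing step can be sketched as follows. Only the first three months are shown, because those are the only month-by-month figures stated in this part of the lecture. The starting count of 1000 and the month-3 count of 653 are implied by the quoted differences (869 + 131 and 743 - 90).

```python
# Customers still subscribed at the end of months 0, 1, 2 and 3.
survivors = [1000, 869, 743, 653]

# Each month's drops are the difference between consecutive survivor counts.
drops = [before - after for before, after in zip(survivors, survivors[1:])]

print(drops)  # [131, 126, 90]
```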

All right, so we're going to head over to Excel

and use that to estimate what that probability is.
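
For readers who want to try the same idea outside Excel, here is a hedged Python stand-in for the Solver step. It is a sketch, not the lecture's spreadsheet. It uses only the counts stated above for months 1 to 3 (131, 126 and 90 drops) and treats the 653 customers remaining after month 3 as right-censored, so the estimate will not match a fit to the full seven months. The function names and the simple ternary search are illustrative choices.

```python
import math


def log_likelihood(theta, drops_by_month, n_active, n_months):
    """Log-likelihood of a constant monthly cancellation probability theta."""
    ll = 0.0
    for t, n in drops_by_month.items():
        # Dropped in month t: kept service t - 1 times, then cancelled.
        ll += n * (math.log(theta) + (t - 1) * math.log(1.0 - theta))
    # Still active: kept service in every observed month.
    ll += n_active * n_months * math.log(1.0 - theta)
    return ll


def maximize(f, lo=1e-6, hi=1.0 - 1e-6, iterations=200):
    """Ternary search for the peak of a single-peaked function on (lo, hi)."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


drops_by_month = {1: 131, 2: 126, 3: 90}
n_active = 653   # still subscribed after month 3 (743 - 90)
n_months = 3     # truncated window used for this illustration only

theta_hat = maximize(
    lambda th: log_likelihood(th, drops_by_month, n_active, n_months)
)

# Cross-check: for this model the maximum also has a closed form, total drops
# divided by total customer-months at risk (1000 + 869 + 743).
theta_closed_form = sum(drops_by_month.values()) / (1000 + 869 + 743)

print(round(theta_hat, 4), round(theta_closed_form, 4))
```

On this truncated window both routes give a monthly cancellation probability of roughly 0.133.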