Welcome! This lecture discusses model fit and forecast evaluation for logit models. In the previous lecture, we saw how to estimate the parameters of a logit model, and how to use t-tests and likelihood ratio tests to select appropriate explanatory variables. After you have specified an appropriate logit model, it is of course of interest to analyse the fit and forecast performance of the model.

First we consider model fit. As the logit model cannot be written in a linear regression format, it is not immediately clear how to construct residuals. However, we can use a similar idea as for the linear regression model. Residuals of the logit model can be defined as the difference between y and its expectation. The expectation of y equals 0 times the probability that y=0, plus 1 times the probability that y=1. Hence, the expectation is simply the probability that y=1, and the residual is the difference between y_i and the probability that y_i=1. This residual can take values between -1 and 1. A value close to -1 corresponds to the case where the value of y is 0 while the expectation of y is close to 1. A value close to 1 corresponds to the situation where the value of y is 1 and the expectation of y is close to 0. A value close to 0 means almost perfect fit: the value of y is 1 and the expectation is close to 1, or the value of y is 0 and the expectation is close to 0. Note that the residual can never be exactly 0, as the logit probability can never be exactly 0 or 1, but it can get very close to 0. You can use residuals to analyse the performance of your logit model.

I now want you to think about the following. Suppose that we have almost perfect fit for all observations. What is in this case the numerical value of the likelihood function? For all observations equal to 1, the likelihood contribution is very close to 1. And for all observations equal to 0, the likelihood contribution will also be very close to 1, as the probability that y=0 will be very close to 1. Hence, the likelihood function is a product of values close to 1 and attains a value very close to 1.

We can use this perfect-fit result to construct R-squared type measures of fit for the logit model. As the parameter estimates are obtained by maximizing the likelihood function, the R-squared measures are expressed in terms of the likelihood function instead of the sum of squared residuals. Such R-squared measures are therefore called pseudo R-squared measures. The measures compare the maximum likelihood value of the logit model under consideration with the same value of a model with only an intercept. In case of perfect fit, the value of the likelihood function is about 1, which implies that the value of the log-likelihood function is about 0.

There are two popular pseudo R-squared measures. The McFadden R-squared compares the values of the log-likelihood function of two models, both evaluated at the maximum likelihood estimates. The first model is a logit model with the explanatory variables included; the second model contains only an intercept. If the model with explanatory variables has about the same likelihood value as the model with only the intercept, the explanatory variables hardly contribute to the fit of the model. The R-squared is then close to 0, indicating that the fit is not good. In case of perfect fit, the log-likelihood of the model with the explanatory variables is roughly 0, and hence the R-squared measure equals 1. In all other cases, the R-squared measure is smaller than 1 but positive. In practice, the value of this R-squared is usually rather low, as perfect fit is hardly ever realized.
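To make these concepts concrete, here is a minimal Python sketch, assuming the observed outcomes y and the fitted probabilities P(y=1) are already available from an estimation step; the data below is made up purely for illustration, and the McFadden measure uses its standard definition of one minus the ratio of the two log-likelihoods:

```python
import numpy as np

# Illustrative data: observed 0/1 outcomes y and fitted probabilities
# p_i = P(y_i = 1), assumed to come from a logit estimation step.
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p = np.array([0.9, 0.6, 0.7, 0.8, 0.4, 0.1, 0.3, 0.2])

# Residuals: y_i minus the estimated probability that y_i = 1.
# They lie strictly between -1 and 1 and are never exactly 0.
residuals = y - p
print("Residuals:", np.round(residuals, 2))

# Log-likelihood of the fitted model: each observation contributes
# log(p_i) when y_i = 1 and log(1 - p_i) when y_i = 0.
loglik_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Intercept-only model: its ML estimate of P(y = 1) is the sample
# fraction of ones, which gives the log-likelihood in closed form.
pbar = y.mean()
loglik_null = len(y) * (pbar * np.log(pbar) + (1 - pbar) * np.log(1 - pbar))

# McFadden pseudo R-squared: close to 0 when the explanatory variables
# add nothing, close to 1 under near-perfect fit (loglik_model near 0).
mcfadden_r2 = 1 - loglik_model / loglik_null
print(f"McFadden R-squared: {mcfadden_r2:.3f}")
```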
The Nagelkerke R-squared uses values of the likelihood functions instead of the log-likelihood functions. From the formula on the slide, you can see that in case of perfect fit, with L(b) equal to 1, the R-squared also equals 1. When the model with only the intercept provides the same maximum likelihood value as the model with explanatory variables, the Nagelkerke R-squared equals 0. This R-squared is always between 0 and 1, and usually takes much higher values than the McFadden R-squared. The choice between the two R-squared measures is mostly a matter of taste.

Apart from determining which variables explain choice, the logit model can also be used to predict the value of y for new observations with given explanatory variables x. As you have already seen, the expected value of y is equal to the probability that y=1. Hence, an unbiased prediction of y is equal to the probability that y=1. To evaluate this probability, we only need to replace beta by the maximum likelihood estimate b. The constructed forecast is a probability, not a value of 1 or 0. To construct a 0/1 forecast, we need to transform the forecasted probability into a value of either 1 or 0. A simple decision rule is to forecast the outcome 1 if the predicted probability that y=1 is larger than a threshold c, and to make the forecast 0 if the forecasted probability is smaller than or equal to c. Many computer packages take the value 1/2 for the cut-off value c, but you are free to choose the value of c. Another common choice is to take for c the fraction of ones in the sample.

I now want you to think about the following. Does a higher value of the cut-off c generate more, the same, or fewer predictions equal to 1? A higher value of c means that fewer (or the same number of) forecasted probabilities lie above c, and hence you forecast fewer (or the same number of) outcomes to be 1 for a higher value of c. If your data contains a high percentage of ones, the predicted probabilities are likely to be close to 1, and a cut-off value of 1/2 then sometimes results in no zero predictions at all. A common option in this case is to take for c the fraction of observations equal to 1 in the sample. This implies that you only forecast the outcome to be 1 if the predicted probability is larger than the fraction of ones in the data.

To evaluate point forecasts, you can use so-called prediction-realization tables. Such a table provides an overview of the forecast performance of the logit model. To construct this table, you need to know the true outcomes corresponding to the predictions. The table is based on the relative number of times your prediction does or does not match the true outcome. On the slide, you see the four counts you need to construct the table for the situation where you have m predictions. The prediction-realization table displays the counts in a 2-by-2 matrix in an orderly way. The green cell contains, for example, the relative frequency that both the forecast and the actual outcome are 1. The orange cell shows the relative frequency that you observe a 1 while the forecast is 0. The sum of the two diagonal elements in the table is called the hit rate. This value indicates the relative number of correct predictions. To shed light on the forecasting performance of a logit model, we can compare the hit rate of the model with the hit rate based on random prediction.
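These steps can again be sketched in a few lines of Python. The block below reuses the illustrative y and p from the earlier sketch; since the slide with the Nagelkerke formula is not shown here, the code uses a standard form of that measure that matches the properties just described, and then builds a prediction-realization table for the default cut-off c = 1/2:

```python
import numpy as np

# Same illustrative data as in the earlier sketch.
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])                  # observed outcomes
p = np.array([0.9, 0.6, 0.7, 0.8, 0.4, 0.1, 0.3, 0.2])  # fitted P(y=1)
n = len(y)

# Log-likelihoods of the fitted model and the intercept-only model.
loglik_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
pbar = y.mean()
loglik_null = n * (pbar * np.log(pbar) + (1 - pbar) * np.log(1 - pbar))

# Nagelkerke R-squared: based on likelihood values L = exp(loglik),
# rescaled so that a perfect fit (L(b) = 1) gives exactly 1 and equal
# likelihood values give exactly 0.
ratio = np.exp((2 / n) * (loglik_null - loglik_model))  # (L(0)/L(b))^(2/n)
bound = np.exp((2 / n) * loglik_null)                   # L(0)^(2/n)
nagelkerke_r2 = (1 - ratio) / (1 - bound)
print(f"Nagelkerke R-squared: {nagelkerke_r2:.3f}")

# 0/1 forecasts: predict 1 when the probability exceeds the cut-off c.
c = 0.5                     # common default; c = y.mean() is an alternative
forecast = (p > c).astype(int)

# Prediction-realization table: relative frequencies of the four
# (actual, forecast) combinations, with row and column totals.
table = np.zeros((2, 2))
for actual, pred in zip(y, forecast):
    table[actual, pred] += 1 / n

print("            pred 0   pred 1    total")
for actual in (0, 1):
    row = table[actual]
    print(f"actual {actual}   {row[0]:7.3f}  {row[1]:7.3f}  {row.sum():7.3f}")
print(f"total      {table[:, 0].sum():7.3f}  {table[:, 1].sum():7.3f}")

# Hit rate: the sum of the diagonal, i.e. the fraction of correct forecasts.
hit_rate = table[0, 0] + table[1, 1]
print(f"Hit rate: {hit_rate:.3f}")
```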
The sum of the two off-diagonal elements indicates the relative number of incorrect predictions, which of course equals 1 minus the hit rate. The last column and the bottom row of the table show the row and column totals: the bottom row indicates the fractions of zero and one predictions, and the last column the observed fractions of zeros and ones. Now I invite you to do the training exercise to train yourself in the topics of this lecture. You can find this exercise on the website. And this concludes our lecture on analysing the fit and forecast performance of logit models.