You've learned the basic ideas behind some common classification models. Now let's apply that knowledge to the taxi data. Commercial navigation apps provide a lot of information about your trip: distance, estimated travel time, and tolls you may encounter en route. Can the pickup and drop-off locations from the taxi data predict whether a trip will have a toll? Let's use a binary classification model to find out. In this video, you will create classification models using the Classification Learner App, compare the performance of those models, and export your work to MATLAB for further analysis.

The Classification Learner App uses the same workflow as the Regression Learner App. Once you've selected your data, the app enables you to iterate through the process of choosing, training, and assessing your model before exporting to MATLAB.

To get started, you first need to import and prepare your data in MATLAB. Here, the January taxi data is loaded. The addWasTollPaid function adds a variable to the table indicating whether a toll was paid for each trip. This will serve as the response variable for the classification models.

With the data preprocessing done, let's open the Classification Learner App. In the Apps tab of the toolstrip, choose the Classification Learner App. Just as you did with the Regression Learner App, start a new session by clicking "New Session" and selecting "From Workspace". Select "WasTollPaid" as the response variable. Classification response variables are most often logical or categorical variables, since these naturally divide the data set into classes. In the Predictors pane, unselect all of the variables, then select the four features that will be used to build the predictive model: the latitude and longitude of the pickup location, and the latitude and longitude of the drop-off location.
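As a sketch, the import-and-prepare step described above might look like this at the command line. The file name and table name are illustrative assumptions; addWasTollPaid is the helper function mentioned in the lesson:

```matlab
% Load the January trip data (file and table names are illustrative).
load taxiJan.mat                % assume this creates a table named taxi

% addWasTollPaid (the course helper function) appends a logical variable,
% WasTollPaid, marking the trips where a toll was paid.
taxi = addWasTollPaid(taxi);

% A quick check of the new response variable before opening the app.
summary(taxi.WasTollPaid)
```

With the table in the workspace, the app's "New Session" > "From Workspace" dialog will list it as a candidate data set.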
As with the Regression Learner App, your typical workflow will involve using validation as part of your model training, but that topic will be covered in a future lesson. For now, select "No Validation". Click "Start Session" to finish loading the data set.

The Classification Learner App mirrors the Regression Learner App. The toolstrip along the top enables you to easily choose the model type, access model options, train the model, and visualize the results. The left pane shows the model history and the details of the current model. The center pane shows model visualizations. The default is a scatter plot of the data. Here, pickup locations are displayed and grouped according to the WasTollPaid variable. The blue dots indicate that no toll was paid. The main differences between the two apps are in the models and visualizations available.

Open the model type pane to see the available classification models. Logistic regression is a good starting point for creating a binary classification model: it's quick to train and provides a simple expression for classifying the data. Now let's train the model. Wow, the model took only a few seconds to train, and it has over 95 percent accuracy.

The grouped scatter plot now uses markers to indicate where the model correctly predicted that a toll was paid. Dots represent correct predictions, and Xs represent incorrect ones. More specifically, blue Xs mark trips where no toll was paid but the model predicted a toll, and orange Xs mark trips where a toll was paid but the model didn't predict it. That's a bit confusing, but a confusion matrix can clear things up. This visualization shows how often a model gets each class right and which class is chosen when the model misclassifies a data point. The rows represent the value of WasTollPaid from the data set: the top row of the matrix shows trips that did not have a toll, and the bottom row shows trips that did have a toll. The columns represent the model predictions.
The first column contains trips that the model predicted would not have a toll, and the second column contains trips that the model predicted would have a toll. The main diagonal elements are the trips where the model correctly predicted the response: either true negatives, where there was no toll and the model predicted no toll, or true positives, where there was a toll and the model predicted a toll. The off-diagonal entries show where the model made an incorrect prediction. The top right corner shows the false positives, where the model predicted a toll that wasn't there. The bottom left corner shows the false negatives, where the model incorrectly predicted a toll-free trip.

In this situation, a false positive could be a pleasant surprise: you expected a toll and didn't have to pay one. But a false negative means a surprise toll, and too many of those would be annoying. What if instead of tolls you were testing for a disease? There, a false positive could be scary and lead to extra testing, but a false negative could leave a serious illness undiagnosed, delaying possible treatment. Your domain knowledge will guide you in deciding when false positives and false negatives are acceptable; minimizing one or the other could be very important.

In this model, only about 10 percent of the actual tolls are correctly predicted. This is because of the imbalance in the class sizes. Out of the more than 230,000 trips, fewer than 10,000 had a toll. Since only about four percent of the trips have tolls, a model that always predicts no toll would have 96 percent accuracy. It just wouldn't be very useful. Models like logistic regression and SVM often have trouble dealing with imbalanced classes like this.

Let's try a k-nearest neighbor, or KNN, model instead. The app has several options for nearest neighbor classifiers. The fine, medium, and coarse models use Euclidean distance, while the cosine, cubic, and weighted models use more complex distance measures.
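The baseline accuracy quoted above is simple arithmetic. A quick sketch using the approximate counts from the lesson:

```matlab
% Approximate class counts from the lesson.
nTrips = 230000;     % total trips in the January data (more than 230,000)
nTolls = 10000;      % trips with a toll (fewer than 10,000)

% A model that always predicts "no toll" is correct on every toll-free
% trip, so its accuracy is the fraction of toll-free trips.
baselineAccuracy = (nTrips - nTolls) / nTrips    % about 0.96

% But such a model never finds a single toll: its recall on the
% toll class (true positives / actual positives) is exactly zero.
baselineRecall = 0 / nTolls                      % 0
```

This is why raw accuracy is a poor yardstick on imbalanced classes: the per-class counts in the confusion matrix tell the real story.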
Choose "Medium KNN" as a starting point. This model has tuning options available via the Advanced button. A medium KNN model uses the 10 nearest data points to make a prediction, a coarse model looks at the 100 closest points, and a fine model uses the single closest point. Let's train the model: 99 percent accuracy. Remember, though, always predicting no toll was 96 percent accurate. The scatter plot has significantly fewer visible Xs. This model looks promising, but let's check the confusion matrix. There are many more true positives with this model. It looks like the KNN model could be a good choice for this prediction. To determine how good, we'll need more performance measurements; the following lessons will cover these in detail.

For now, let's save your work by exporting the models to MATLAB. Like the Regression Learner App, the Classification Learner App has the option to save the algorithm used to train the model or to export the trained model directly into the MATLAB workspace. Click "Export Model", and choose "Export Model" from the list. Name the exported model KNNModel. The model is now available in the MATLAB workspace, and you can access it in the same way you access the regression models.

In this lesson, you learned how to use the Classification Learner App to select, train, and assess some simple classification models. You found a promising candidate to predict whether a trip will have a toll based on the pickup and drop-off locations. In the following lessons, you'll learn more about metrics used to quantify classification model performance.
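As a sketch of what the export gives you: a model exported from Classification Learner is a structure whose predictFcn field classifies new observations, and a medium KNN corresponds to a 10-neighbor fit at the command line. The predictor variable names below are illustrative assumptions, not the actual column names in the taxi table:

```matlab
% Command-line equivalent of the app's Medium KNN: a 10-neighbor model
% fit on the four location predictors (variable names are illustrative).
mdl = fitcknn(taxi, ...
    "WasTollPaid ~ PickupLat + PickupLon + DropoffLat + DropoffLon", ...
    "NumNeighbors", 10);

% A model exported from the app is a struct with a predictFcn field.
% Pass it a table containing the same predictor variables used in training.
newTrips = taxi(1:5, ["PickupLat" "PickupLon" "DropoffLat" "DropoffLon"]);
predictedToll = KNNModel.predictFcn(newTrips)
```

Using predictFcn on a held-out table is the quickest way to sanity-check the exported model before computing the additional performance metrics covered in the next lessons.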