All right. Now for part three of our SVM notebook, we're going to wrap what we just discussed inside a function by adding an estimator argument, so that we can pass in different SVC models with different gammas and different Cs. We'll loop through each of these hyperparameters, the gammas and the Cs, so that we can visualize the effect of regularization as we increase or decrease gamma, and as we increase or decrease our values for C. All we're doing is taking the lines of code we just went through to create the decision boundary and making the estimator a parameter of the function. So rather than having a LinearSVC initialized here, we pass in our estimator and call estimator.fit within the function, re-fitting it each time the function runs. We then go through all the same steps as before: defining the X colors and the corresponding y colors, making sure they're mapped to red and yellow, building the meshgrid of X and Y values, predicting across every single one of those grid points, and finally outputting the plot so we can see the decision boundary. We call that function plot_decision_boundary. Then we set gamma equal to a list of four different gammas to run here. For each of those gammas, we first initialize an SVC model called SVC_Gaussian. The kernel we use is the radial basis function we discussed during lecture, and the gamma value will be first 0.5, then 1, then 2, then 10. For each one, we use the plot_decision_boundary function we just defined above and plot out each of the decision boundaries.
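The steps described above can be sketched roughly as follows. The notebook's actual wine dataset (a `data` table with a `color` column) isn't reproduced in this transcript, so this sketch substitutes scikit-learn's `make_moons` as stand-in two-feature data; the function name `plot_decision_boundary` and the model name `SVC_Gaussian` follow the transcript, but the exact plotting details are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons


def plot_decision_boundary(estimator, X, y):
    """Fit the estimator, then shade its decision regions over a meshgrid."""
    estimator.fit(X, y)  # re-fit inside the function each time, as in the notebook
    # map each class to a point color (the notebook maps classes to red/yellow)
    colors = np.array(["red", "yellow"])[y]
    # build a grid spanning the two feature axes
    xx, yy = np.meshgrid(
        np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
        np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200),
    )
    # predict a class for every grid point, then reshape back to the grid
    Z = estimator.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolor="k")


# stand-in data; the notebook uses two columns of its wine dataset instead
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

for gamma in [0.5, 1, 2, 10]:
    SVC_Gaussian = SVC(kernel="rbf", gamma=gamma)
    plot_decision_boundary(SVC_Gaussian, X, y)
    plt.title(f"gamma = {gamma}")
    plt.show()
```

Passing the estimator in un-fitted and calling `fit` inside the function is what lets the same plotting code serve every gamma and C variant in the loop.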
Before I do this, I want you to think: which one will give a more complex model versus a less complex model? Which one will have more regularization? More regularization means less complex. Which one will have less regularization, meaning a more complex model? So we run this, and we'll see it plot out our different gammas in just a second. Similarly, we'll run through each of our Cs. With the Cs, we'll again be increasing, from 0.1, to 1, to 10, and we'll call SVC with gamma fixed at a constant, here two, then loop through each of these different C values. Again, I want you to start thinking about which will have higher versus lower amounts of regularization: is there more regularization for a higher value of C or a lower value of C? And think about the complexity of the model as you reduce or increase the regularization term. Looking at the first plot, gamma is equal to 0.5, and we see that in our model: if you look at line two of the SVC cell, gamma is set to 0.5 towards the end of that line, and we get pretty high regularization and a pretty non-complex model. Remember, high regularization means a not-so-complex model. We then increase gamma to one and see a bit more curvature in the boundary, so a bit more complex a model. When gamma is equal to two, we've again increased gamma, so we're reducing regularization and getting a more complex model, and we see even more curvature. Then when we set gamma equal to 10, we see a wavy decision boundary that's getting closer to overfitting. Perhaps this is still a decent decision boundary.
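The C loop described above can be sketched like this. Again the notebook's wine data isn't available in the transcript, so `make_moons` stands in; rather than plotting, this sketch prints training accuracy and the support-vector count as rough proxies for the complexity you'd see in the boundary plots.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons

# stand-in data for the notebook's two wine-feature columns
X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

for C in [0.1, 1, 10]:
    # gamma held constant at 2 while C varies, as in the notebook
    SVC_Gaussian = SVC(kernel="rbf", gamma=2, C=C)
    SVC_Gaussian.fit(X, y)
    # small C -> strong regularization -> smoother, simpler boundary;
    # large C -> weak regularization -> boundary bends to fit training points
    print(f"C={C}: train accuracy={SVC_Gaussian.score(X, y):.3f}, "
          f"support vectors={SVC_Gaussian.n_support_.sum()}")
```

With a higher C, the training accuracy typically rises as the boundary grows more curved, which is exactly the visual trend the lecture's plots show.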
But as we keep increasing gamma, we increase the complexity, because we're reducing the penalty from our regularization term. The same holds for C, with the same inverse relationship: a lower value of C, here starting with C equal to 0.1, means higher regularization, and higher regularization means a less complex model. Then as we increase C, the model gets more complex, with more curvature to the boundary. So that's how the decision boundary is actually formed when we create our linear or non-linear SVC. Now, in part four of this exercise, we want to start talking about the timing of working with our datasets. We discussed the different options when we have either a lot of features or a large dataset: whether we should use just a plain linear classifier, the full SVC we just used, or, if the dataset is large enough, a kernel approximation instead. We're going to run through each of these and time how long each takes: not the plain LinearSVC this time, but the SVC with an RBF kernel, as well as a Nystroem approximation. First we import Nystroem and SVC, and then the SGDClassifier, which is a way of speeding up even the linear classification. So not only are we approximating the mapping to higher dimensions, we're also using an approximation for the linear classifier itself. We set y equal to data.color == "red", giving us True/False values, and X is all of our columns except that color column. So now we're using all of our columns for X rather than just the two columns we had defined for X before.
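The setup described above might look like the following. The wine `data` table isn't reproduced here, so a generated 12-feature dataset stands in for it, with the notebook's pandas-style target/feature construction shown in comments.

```python
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

# Stand-in for the notebook's wine dataset: we can't reproduce `data` here,
# so generate a comparable 12-feature, two-class dataset instead.
X, y = make_classification(n_samples=5000, n_features=12, random_state=42)

# In the notebook, the target and features are built roughly like this:
#   y = data.color == "red"          # boolean True/False target
#   X = data.drop("color", axis=1)   # every column except "color"
```

From here the exact SVC fit and the Nystroem-plus-SGDClassifier pipeline can be timed against each other.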
We're then using kwargs, which is just a dictionary of keyword arguments that you can unpack and pass in within Python. By passing in that dictionary, you're saying: for the kernel argument, I want to pass in RBF, and the same dictionary works for Nystroem. We run this, and now we have our Nystroem as well as our SVC, and we can call SVC.fit and see how long that takes. With the %timeit functionality running here, it takes some time, and that's exactly the point we're trying to get across: about 1.3 seconds, plus or minus 252 milliseconds. Now, if instead we use Nystroem to transform X into a higher-dimensional space, and pass the transformed X and our y into the SGDClassifier we defined earlier, we can see the timing here; hold on just a second, it should be a bit faster. We see it's 102 milliseconds plus or minus 21.3 milliseconds. %timeit runs it several times to get a confidence interval on how long it'll take. The numbers may be a bit different for you, but what we see here is that it runs about 10 times faster than fitting SVC on the initial dataset. Next we're going to expand the dataset to five times its size, the idea being that when you have a very large dataset, something taking 10 times as long can end up being a huge difference. If you're thinking about a job that takes a full day versus 10 days, or a full week versus 10 weeks, those gaps do happen once you get to more complex models, so you want to make sure you're using something like an approximation to stay on the right track. Here we just create a dataset five times as large as the initial one; this will take some time to run, and we can time it here as well. Now, I'm going to pause here; we'll come back and see the times in just a second, and then we'll close out this video.
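The kwargs trick and the two timed fits can be sketched as below. Since `%timeit` is an IPython magic, this sketch uses plain `time.perf_counter` instead, and generated data stands in for the notebook's wine dataset; the specific gamma value is an assumption.

```python
import time
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

# stand-in data; the notebook uses its full 12-column wine feature set
X, y = make_classification(n_samples=2000, n_features=12, random_state=42)

# one dictionary of shared kernel settings, unpacked into both estimators
kwargs = {"kernel": "rbf", "gamma": 1.0}  # gamma value assumed for illustration
svc = SVC(**kwargs)
nystroem = Nystroem(**kwargs)
sgd = SGDClassifier()

# time the exact SVC fit
t0 = time.perf_counter()
svc.fit(X, y)
svc_time = time.perf_counter() - t0

# time the approximation: map X toward the kernel's feature space,
# then fit a fast linear classifier on the transformed features
t0 = time.perf_counter()
X_transformed = nystroem.fit_transform(X)
sgd.fit(X_transformed, y)
approx_time = time.perf_counter() - t0

print(f"SVC fit: {svc_time:.3f}s; Nystroem + SGD fit: {approx_time:.3f}s")
```

Exact timings will vary by machine and dataset size; the point is the relative gap, which widens as the dataset grows because exact SVC training scales poorly with the number of samples.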
Now, if we look at the results here, we see that SVC on its own took 23 seconds to run, whereas the Nystroem approach, first transforming our data to higher dimensions and then running SGD as our linear classifier, took less than a second. So you see this vast difference in timing between just using SVC and using the kernel approximation along with a linear classifier. If you're using larger datasets, we generally advise that you use a kernel approximation first. Now, just to make clear the steps that go into the kernel approximation: we either run our regular SVC directly, or we do it in two steps, Nystroem to map to higher dimensions and then SGD as our linear classifier. If we look down here at our transformed data, let's just run this. First, looking at X, we see our original X has 12 columns. Then when we look at X_transformed and check its shape, we see that we now have 100 columns. That's because when we actually initialize Nystroem, the default number of components is 100, so we end up with 100 columns; our dataset gets mapped into that much higher-dimensional space of 100 columns, and we go from there. That closes out our section on working with support vector machines, and I look forward to seeing you back in lecture, where we'll pick up with our next classification method: decision trees. All right, I'll see you there.
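The shape check at the end can be reproduced in isolation; generated 12-feature data again stands in for the notebook's wine columns. The key fact is that `Nystroem`'s `n_components` parameter defaults to 100, which determines the width of the transformed matrix.

```python
from sklearn.kernel_approximation import Nystroem
from sklearn.datasets import make_classification

# stand-in for the notebook's 12-column feature matrix
X, y = make_classification(n_samples=500, n_features=12, random_state=0)

nystroem = Nystroem(kernel="rbf", gamma=1.0)  # n_components defaults to 100
X_transformed = nystroem.fit_transform(X)

print(X.shape)             # original: 12 columns
print(X_transformed.shape) # transformed: 100 columns, one per component
```

Raising or lowering `n_components` trades approximation quality against the cost of the transform and of the linear fit that follows.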