Now this concept of mapping up to higher dimensions and then finding a linear separation is great conceptually, but in real life it may not scale very well. Support vector machines with RBF kernels are very slow to train when we have a lot of data, because we have to apply the kernel to every single data point many times over, and that can take a really long time. In practice, we don't need to be perfectly rigorous to achieve good results, so we can construct an approximate kernel mapping, and that's usually going to be good enough. Our goal is to buy a lot of computational ease and time in exchange for a minimal drop-off in performance.

There are a few methods to approximate the kernel. The idea is to use a kernel map to explicitly create a dataset in a higher-dimensional space, using methods such as Nystroem and the RBF sampler that we see here. These only perform the mapping, not the classification step: they map our original dataset into higher dimensions. Once we do that, we can fit a linear classifier such as LinearSVC, logistic regression, or the SGD classifier, which, if your data is very big, will be a quicker way of finding the optimal solution, though again perhaps with a small drop-off in performance.

So how do we do this? What's the code to actually run the Nystroem or the RBF sampler and come up with this approximate higher-dimensional space? The first thing we want to do, as always, is import the class, so we import Nystroem from sklearn.kernel_approximation. We then create an instance of the class, passing in the different hyperparameters. Multiple non-linear kernels can be used, such as RBF or the polynomial kernel we talked about, and you can look at the documentation for any others you may be interested in. The kernel and gamma arguments here are identical to those we saw for the SVC model. Then we have the argument n_components, which is the number of samples used to come up with the kernel approximation. Rather than computing the kernel against every single one of our data points, we instead sample, say, just 100 of them; that's n_components. The fewer components you sample, the faster it runs; the more you use, the longer it takes, but the closer the approximation comes to using the whole dataset. We then fit the instance on the data as we've done before, but this time with fit_transform. We only pass in our features X, and we call fit_transform on X_train because X_train needs to be both fit and transformed, mapping it up to that higher-dimensional space. For X_test, we treat it as a true holdout set, so we can only use the mapping that was learned from X_train. As we discussed with the standard scaler, it's the same idea: we call transform on X_test using the fit learned from X_train. Then we can again tune our kernel and the associated parameters using cross-validation.
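As a concrete reference, here is a minimal sketch of that workflow. The synthetic dataset from make_classification and the specific gamma and n_components values are just illustrative placeholders, not values from the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import LinearSVC

# Synthetic stand-in data purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Approximate the RBF kernel map using only 100 sampled training points.
nystroem = Nystroem(kernel='rbf', gamma=0.1, n_components=100, random_state=42)

# fit_transform on X_train: learn the mapping and project the training data.
X_train_mapped = nystroem.fit_transform(X_train)

# transform on X_test: reuse only the mapping learned from X_train.
X_test_mapped = nystroem.transform(X_test)

# Fit a fast linear classifier in the approximate higher-dimensional space.
clf = LinearSVC()
clf.fit(X_train_mapped, y_train)
print(clf.score(X_test_mapped, y_test))
```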
In practice, we'd probably want to create a pipeline here, as we discussed in earlier courses, so that we can later use something like GridSearchCV to run through different hyperparameters for our transformer. We'd run through all the hyperparameters for the transformer, and within that pipeline we'd also have some type of linear classifier, and we can then see which combination performs best on the holdout set (there's a short sketch of this after the breakdown below).

Another kernel approximation method is the RBFSampler. So again, from sklearn.kernel_approximation, we import RBFSampler this time. We then create an instance of the class. For the RBFSampler, RBF is the only kernel that can be used; it's specific to RBF, and the parameter names are identical to what we just discussed for gamma and the number of components. We then fit that instance on the data and transform, using the same exact method we just discussed: we run fit_transform on X_train and then just transform on X_test. Again, we can tune the kernel parameters and components using cross-validation, probably within a pipeline that has both the RBFSampler and some type of linear classifier.

Now I'd like to talk through when we would use kernel approximation versus the regular SVC with a kernel, or just a linear separation. The way we'll break it down is by the number of features and the amount of data, and then the model choice given those sizes. If we have a lot of features, say over around 10,000, and a very small dataset, then we're probably best just using plain logistic regression or LinearSVC. The reason is that we already have a very rich feature set, with more dimensions than data points, so mapping to even higher dimensions just adds complexity that probably isn't necessary to come up with a linear separation. Next, if we have few features, say fewer than 100, and a medium amount of data, around 10,000 rows, then we can use the SVC with the radial basis function kernel. A low number of features and a decent-sized dataset is a good scenario for augmenting the data: we can map it to a higher-dimensional space without running into technical issues such as slowness or failure to converge, so we can use the SVC with sophisticated kernels. Finally, if we have few features, again fewer than 100, but a huge dataset, then we may want to either engineer additional features, such as polynomials, and run logistic regression, or run a kernel approximation and then fit a LinearSVC. Those are the kernel approximations we just discussed, Nystroem or the RBF sampler; they let us derive new features without computing the kernel against every single data point, which matters now that we're working with far more data. We come up with an approximation that maps to the higher-dimensional space and then fit our linear classifier. That's the breakdown to keep in mind when deciding between LinearSVC, SVC with RBF, or some type of kernel approximation.
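Here is the kind of pipeline sketch referred to above, again on a synthetic stand-in dataset; the grid values for gamma and n_components are arbitrary examples, not tuned recommendations from the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

# Kernel approximation step followed by a linear classifier.
pipe = Pipeline([
    ('rbf_sample', RBFSampler(random_state=42)),  # RBF is the only kernel it supports
    ('clf', LinearSVC()),
])

# Cross-validate over gamma and the number of sampled components.
param_grid = {
    'rbf_sample__gamma': [0.01, 0.1, 1.0],
    'rbf_sample__n_components': [50, 100, 300],
}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```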
Let's recap what we learned in this section. We learned how to use kernels with support vector machines, and the concept of how they essentially map our original data into higher dimensions in order to find a space that is ultimately linearly separable. We discussed non-linear decision boundaries, with the kernel trick giving us the ability to come up with non-linear decision boundaries within our original dataset. We talked about different implementation techniques for support vector machine kernel modeling, including approximation methods such as Nystroem and the RBF sampler for when we're working with a dataset that has too many rows to use a regular SVC. With that, we'll move over to our Jupyter notebook to get a glimpse of how support vector machines work in action. I'll see you there.