Now let's pick up where we left off with question number 3. Here we're going to split the data into train and test datasets. This can be done using any method, but we'll want to consider using scikit-learn's StratifiedShuffleSplit here in order to ensure that we maintain the same ratio of our predictor class in both our train and test sets. Regardless of the method used to split the data, you should afterwards compare the ratio of classes in both the train and test splits. So here we start by importing StratifiedShuffleSplit from model_selection. We then create our StratifiedShuffleSplit object, similar to how we've done with other scikit-learn objects, and pass in the arguments specific to StratifiedShuffleSplit. We only want to create one split; we're just going to split into one train and one test set. If we wanted more, we could increase the number of splits. We want our test size to be 0.3, so 30 percent will be our holdout set and the other 70 percent will be what we train on. To ensure that you get the same results as what I'm showing here, we're using a random state equal to 42; the exact value generally doesn't matter as long as it's the same across both users. We then take that StratifiedShuffleSplit that we just defined and call its split method, passing in both our X values (our features) and our y values (the value we're trying to predict, data.Activity). When we create this object, and I'll pull it out as I've done before, we're actually creating a generator object, so we can call next on it. If we look at just this object here, when I run it, we see it outputs a generator object. When we call next on this object, it outputs, and we'll see this in just a second, both the train and test indices.
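The steps above can be sketched as follows. This is a minimal stand-in, not the course's actual Human Activity dataset: the small `data` DataFrame and its imbalanced `Activity` column are assumptions made so the example is self-contained.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Stand-in dataset with an imbalanced "Activity" label (60/40 split).
rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
data["Activity"] = [0] * 60 + [1] * 40

# One split, 30% held out, fixed random_state for reproducibility.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=42)

# split() returns a generator; calling next() on it yields the
# (train indices, test indices) pair described in the walkthrough.
split_gen = sss.split(data.drop(columns="Activity"), data.Activity)
train_idx, test_idx = next(split_gen)
print(len(train_idx), len(test_idx))  # 70 30
```

Because the split is stratified, the 60/40 class ratio is preserved in both index sets, which is exactly what we'll verify with value_counts below.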
So we'll end up with train indices and test indices, with the train index coming first and the test index coming second. Then we can just use .loc to specify that, from our original dataset, we want each of the rows specified in our split, along with every one of our feature columns. Then for our y_train — so we already have our X_train — we use that same index, except now we specify that we want the Activity column rather than all of the feature columns. Then we do the same thing with the test indices. So with that, we've defined our X_train, our y_train, our X_test, and our y_test. We can then check the value counts for y_train; when we call value_counts with normalize=True, it gives us proportions rather than raw counts, and that lets us compare our y_train and our y_test. We see that they're fairly similar for each of the different activities, which are labeled by the integers we want to predict. Now moving on to question 4. We're going to fit a logistic regression model without any regularization to start, using all of our features — remember we have 561 features, so it's quite a large feature set. It's mentioned here that we want to make sure we read the documentation about fitting a multi-class model, because the way the coefficients come out is going to be a little different, and I'll speak to that as we look at the coefficients themselves. Then we're going to use cross-validation to determine the hyperparameters and fit models using L1 and L2 regularization. Just as we've done in the past, and as we'll continue to do, we'll find the right hyperparameters to ensure that our out-of-sample prediction error is minimized. Then we're going to store each of these models.
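Here is a hedged sketch of that indexing step. The `feature_cols` name and the small stand-in DataFrame are assumptions mirroring the lesson, not the actual notebook variables.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
data["Activity"] = [0] * 60 + [1] * 40
feature_cols = [c for c in data.columns if c != "Activity"]

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_idx, test_idx = next(sss.split(data[feature_cols], data.Activity))

# .loc with the split indices selects the rows; the second argument
# picks either all feature columns or just the Activity label.
X_train = data.loc[train_idx, feature_cols]
y_train = data.loc[train_idx, "Activity"]
X_test = data.loc[test_idx, feature_cols]
y_test = data.loc[test_idx, "Activity"]

# normalize=True turns counts into proportions, so the class ratios
# of the two splits can be compared directly.
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))
```

The two printed Series should show nearly identical proportions, confirming the stratification worked.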
Again, we want to make sure we understand the multi-class models, the solvers, and the regularizations that all come into play, and we'll discuss these as we go through each of the models. So the first thing we do is fit a plain logistic regression model, with no extra hyperparameters and no regularization. We see here that we pass liblinear as the solver; all you need to know is that there are different solvers optimized for different situations, and you can look at the documentation. What matters here is that we're using one-versus-rest, which we talked about during lecture: we try to predict a certain class versus all the rest, then another class versus all the rest. We're going to do that six times, once for each of the six classes, and liblinear is just one of the solvers suited to one-versus-rest. So we specify that for our logistic regression, and then we initialize and fit the model in one step using our X_train and y_train. Run that; it may take just a second, though not nearly as long as what we're about to do, so I'll start talking about the next step. You can see that the previous step just finished running. Now we're going to import LogisticRegressionCV, which is the cross-validation method we talked about a bit in lecture. It's similar to GridSearchCV, which we learned about earlier, in that it lets us loop through many different hyperparameters. With LogisticRegressionCV, the hyperparameters we generally want to search over are the type of penalty — we're just going to specify L1 here — and the different C values, where C is similar to lambda except that it's the inverse. We're going to set Cs equal to 10, which is just the default, and we can look at the documentation here. Let me just run.
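A minimal sketch of that baseline fit. The stand-in data replaces the 561-feature dataset, and the large `C` is an assumption on my part: liblinear always applies some penalty, so a very large `C` is one way to approximate "no regularization".

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in for the activity data: 6 classes, 20 features.
X_train, y_train = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=6, random_state=42,
)

# liblinear fits one binary one-vs-rest model per class, so with six
# classes we get six rows of coefficients. C=1e6 weakens the penalty
# to approximate an unregularized fit (an assumption, see above).
lr = LogisticRegression(solver="liblinear", C=1e6).fit(X_train, y_train)
print(lr.coef_.shape)  # one coefficient row per class: (6, 20)
```

The shape of `coef_` is the key thing to notice: six rows, one per one-versus-rest model, which is what the coefficient table later in this walkthrough is built from.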
Oh, we didn't import it yet, but once we import it, this is going to take a while. When we use the default of Cs=10, it checks across 10 different default values of C, and then we can optimize from there. So I'm going to run this, and again we're using solver equal to liblinear, and our penalty here is the L1 penalty. I do want you to note that this will take a long time; I'm going to pause and then we'll come back to it. It may take about 7-8 minutes depending on the strength of your computer and the CPU power you have available. This just highlights how long certain models can take to fit, and why you want to make sure you optimize those values. So I'm going to run this now and we'll check back in as soon as it's done. So now we're back, and hopefully for you that was just a cut to where we are now. I've gone ahead and fit the logistic regression with an L1 penalty, as well as running it for the L2 penalty. That one should be a bit faster; generally speaking, the L2 penalty tends to run faster just because of how it works under the hood. We've also optimized using this cross-validation method, to ensure that we're choosing the appropriate C with our four different holdout sets — we have cv equal to 4, so four folds — to see which value of C performs best on held-out data. So now we want to compare the magnitude of each of the coefficients for each of the models we came up with, keeping in mind that we're using one-versus-rest fitting for this multi-class classification.
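The two cross-validated fits can be sketched like this. The stand-in data and the `max_iter` setting are assumptions added so the example converges quickly; on the real 561-feature set these fits are what take several minutes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X_train, y_train = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=6, random_state=42,
)

# Cs=10 searches 10 values of C on a log scale; cv=4 scores each
# candidate on four folds. liblinear supports both l1 and l2.
lr_l1 = LogisticRegressionCV(
    Cs=10, cv=4, penalty="l1", solver="liblinear", max_iter=1000,
).fit(X_train, y_train)

lr_l2 = LogisticRegressionCV(
    Cs=10, cv=4, penalty="l2", solver="liblinear", max_iter=1000,
).fit(X_train, y_train)

# Under one-vs-rest, a best C is chosen for each class separately.
print(lr_l1.C_)
print(lr_l2.C_)
```

Note that `C_` has one entry per class: each one-versus-rest sub-model gets its own cross-validated regularization strength.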
What that means is that when we come up with our coefficients, if we think about the models we're trying to create, each one predicts a certain class versus the rest of the classes: standing versus not standing, sitting versus not sitting, and so forth. So each coefficient shows up once for each of these one-versus-rest models, and it defines, in terms of log odds, how much we're increasing the odds of standing versus not standing, sitting versus not sitting, and so on, so that each coefficient value is specific to one of our labels. So the first thing we do is initialize an empty list. We then create these coeff_labels and coeff_models, which we're going to loop through as we see right here; we zip the two together, and when we do, we can access both the label and the model at each step. So 'lr' as a string, paired with the actual lr model we created earlier, is the first pair we reach within the for loop. We then pull out the coefficients from the model — mod being, again, just the model we have at this point in the loop. We then create a multi-level index. That multi-level index will sit on top of our DataFrame, and we'll see what it looks like once I print it out, but it's going to have our different levels. So again, this is a multi-level index where the first level is the name of the model, and the second level holds the numbers 0, 1, 2, 3, 4, 5, which label the coefficients for each class versus the rest.
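A small illustration of that two-level column index on its own may help. The model name `"lr"` here stands in for whichever label the loop is on; the `codes` argument is discussed just below.

```python
import pandas as pd

label = "lr"  # the model name for this pass through the loop

# Level 0 is the model name; level 1 is the class in "class vs rest".
# codes pairs the single model label (index 0, repeated) with each of
# the six class numbers.
coeff_label = pd.MultiIndex(
    levels=[[label], [0, 1, 2, 3, 4, 5]],
    codes=[[0] * 6, list(range(6))],
)
print(coeff_label)
```

Printing the index shows six column labels, ('lr', 0) through ('lr', 5), one per one-versus-rest model.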
Again, we replaced each of our labels with an integer, and now those integers are used to say "zero versus the rest" — which could be sitting versus the rest — "one versus the rest", which could be standing versus the rest, and so on. Then as for codes, all that means is how we want these paired: it's going to be (0, 0) together, (0, 1) together, with the first position always zero because we only have one model label, tied to each of the class numbers, and we'll see this in the labeling as well. We then take our coefficient list and append onto it a DataFrame of the coefficients we pulled out; we transpose them so they form columns, and the column names are that coeff_label, the multi-index label we just created. Then once we've run through the for loop, we concatenate the whole list of coefficient DataFrames together along the columns. So I'm going to run this and again walk through what we just discussed. Each of these numbers is a different coefficient, and there should be 561 rows, one per feature; we can look at coefficients.shape and confirm that it's 561 rows. Each coefficient comes with a numerical value telling us, for class zero — which could be sitting — whether it decreases the odds of that class versus the rest, and then increases them for class one, for class two, and so on. So we see the effect of each coefficient in each one-versus-rest model. Now we're going to plot all of these models, and again the code is a bit complex, so we'll walk through it step by step to understand what we're actually doing.
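The full loop can be sketched as below. The three fitted models here are freshly fit stubs on stand-in data, assumed to play the role of the lesson's already-fit `lr`, `lr_l1`, and `lr_l2`; the shapes are 20 features by 18 columns instead of the real 561 by 18.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=300, n_features=20, n_informative=10,
    n_classes=6, random_state=42,
)
lr = LogisticRegression(solver="liblinear", C=1e6).fit(X, y)
lr_l1 = LogisticRegression(solver="liblinear", penalty="l1", C=0.1).fit(X, y)
lr_l2 = LogisticRegression(solver="liblinear", penalty="l2", C=0.1).fit(X, y)

coeff_labels = ["lr", "l1", "l2"]
coeff_models = [lr, lr_l1, lr_l2]

coefficients = []  # the empty list we initialize first
for lab, mod in zip(coeff_labels, coeff_models):
    coeffs = mod.coef_  # shape (6 classes, n_features)
    coeff_label = pd.MultiIndex(
        levels=[[lab], [0, 1, 2, 3, 4, 5]],
        codes=[[0] * 6, list(range(6))],
    )
    # Transpose so features are rows and the six OvR models are columns.
    coefficients.append(pd.DataFrame(coeffs.T, columns=coeff_label))

# Concatenate the per-model tables side by side along the columns.
coefficients = pd.concat(coefficients, axis=1)
print(coefficients.shape)  # (20, 18): features x (3 models * 6 classes)
```

On the real dataset the same loop yields the 561-row by 18-column table the transcript inspects with `coefficients.shape`.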
We're going to create our subplots using plt.subplots, which initializes the figure and also this axList. That axList should be three rows by two columns, so there are six different bounding boxes within which we can plot. We then flatten it, just so it's easier to run a for loop over the axList; we end up with the first box, the second box, the third box, and so on, each being a bounding box. So we'll have six bounding boxes, and then we set the size on the figure object to make sure the overall figure is the size we want. We then enumerate the axList. The axList at this point holds the six bounding boxes in which all of our plots will be contained. Because we're enumerating, each ax comes with a numerical value, and that's the loc: it will be 0, 1, 2, 3, 4, 5, paired with the related bounding box. We then take the coefficients DataFrame, which is what we had up here — that 561-row by 18-column DataFrame — and access certain values. You see here that we have the multiple index, with the logistic regression model at the top level, alongside the l1 penalty and the l2 penalty, and below that each of the one-versus-rest models. We're going to locate using the loc value from enumerate; on the first pass that's zero, so we're locating each of the zeros — the zero under lr, the zero under l1, and the zero under l2. We're accessing that in level one — lr, l1, and l2 are level zero — and we're saying that the axis is equal to 1.
So we could have multiple indices on our rows, but here we have them on our columns, and axis=1 means we're accessing our multilevel column index. We then take that data, which is now specific to the zeros — lr zero, l1 zero, and l2 zero — and plot it as a scatter: marker set to 'o', no lines between points, and a marker size of two, so each dot is small. Where do we want to plot it? In the axis we've defined here, in the bounding box as we loop through them, and with no legend. Then we say: if this axis is axList[0], our first one, we add the legend at location 4. Then we just set the title with that string, and we run it. What we see here for coefficient set 0 — that's zero versus the rest — is each of our different coefficients, and there should be 561 of them. We see some high-magnitude values at the low end of the feature index for coefficient set 0, while for coefficient set 3 they're scattered throughout. So this is just showing the strength of each of those coefficients. We're going to stop this video here, and in the next video, now that we've seen the coefficients across all 561 features and how the multiple label indices work, we'll get started with actually predicting, as well as coming up with the probabilities for each of the predicted classes. All right, I'll see you there.
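The plotting loop described across the last two paragraphs can be sketched as follows. The `coefficients` table here is a random stand-in for the real 561-by-18 DataFrame, and the use of `xs` to select level 1 along axis 1 is my reading of the "locate in level one with axis equal to 1" step.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the real 561 x 18 coefficient table built earlier.
rng = np.random.default_rng(0)
cols = pd.MultiIndex.from_product([["lr", "l1", "l2"], range(6)])
coefficients = pd.DataFrame(rng.normal(size=(50, 18)), columns=cols)

fig, axList = plt.subplots(nrows=3, ncols=2)
axList = axList.flatten()  # 3x2 grid -> flat array of six axes
fig.set_size_inches(10, 10)

for loc, ax in enumerate(axList):
    # Select class `loc` from every model via column level 1 (axis=1),
    # giving one column each for lr, l1, and l2.
    data = coefficients.xs(loc, level=1, axis=1)
    # marker 'o', no connecting lines, marker size 2, no legend.
    data.plot(marker="o", ls="", ms=2.0, ax=ax, legend=False)
    if ax is axList[0]:
        ax.legend(loc=4)  # legend only on the first panel, lower right
    ax.set_title(f"Coefficient Set {loc}")

plt.tight_layout()
```

Each of the six panels then shows three scatter series, one per model, for that class's one-versus-rest coefficients.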