Now that we have a windowed dataset, we can start training neural networks with it. Let's start with a super simple one that's effectively a linear regression. We'll measure its accuracy, and then we'll work from there to improve it.

Before we can do any training, we have to split our dataset into training and validation sets. Here's the code to do that at time step 1000. We can see that the training data is the subset of the series, called x_train, up to the split time.

Here's the code to do a simple linear regression. Let's look at it line by line. We'll start by setting up all the constants that we want to pass to the window dataset function. These include the window size on the data, the batch size that we want for training, and the size of the shuffle buffer, as we've just discussed. Then we'll create our dataset. We'll do this by taking our series; in the notebook that you'll go through later, you'll create the same synthetic series as you did in week one. You'll pass it your series along with your desired window size, batch size, and shuffle buffer size, and it will give you back a formatted dataset that you can use for training.

I'm then going to create a single dense layer with its input shape being the window size. For linear regression, that's all you need. I'm assigning the layer to a variable called l0, because later I want to print out its learned weights, and it's a lot easier to do that if I have a variable to refer to the layer. Then I simply define my model as a Sequential containing that sole layer, just like this.

Now I'll compile and fit my model with this code. I'll use the mean squared error loss function by setting loss to MSE, and my optimizer will use stochastic gradient descent. I use the optimizer object instead of the raw string so that I can set parameters on it when initializing it, such as the learning rate (lr) and the momentum. Experiment with different values here to see if you can get your model to converge more quickly or more accurately. Next, you can fit your model by just passing it the dataset, which has already been preformatted with the x and y values. I'm going to run for 100 epochs here, ignoring the epoch-by-epoch output by setting verbose to zero.

Once it's done training, you can inspect the learned weights with this code. Remember earlier when we referred to the layer with a variable called l0? Well, here's where that's useful. The output will look like this. If you inspect it closely, you will see that the first array has 20 values in it, and the second array has only one value. This is because the network has learned a linear regression to fit the values as best it can. So each of the values in the first array can be seen as the weights for the 20 values in x, and the value in the second array is the b, or bias, value.
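Since the on-screen code isn't reproduced in this transcript, here's a minimal sketch of the split and dataset setup described above. It assumes the windowed_dataset helper discussed in the previous video (one common implementation is included for completeness), and that time and series come from the week-one synthetic-series notebook:

```python
import tensorflow as tf

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    # Slice the series into overlapping windows of window_size + 1 values,
    # then split each window into (first window_size values, last value).
    dataset = tf.data.Dataset.from_tensor_slices(series)
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    dataset = dataset.flat_map(lambda w: w.batch(window_size + 1))
    dataset = dataset.shuffle(shuffle_buffer)
    dataset = dataset.map(lambda w: (w[:-1], w[-1]))
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset

# Split the data at time step 1000: everything before is training data,
# everything after is validation data. (time and series are assumed to
# have been generated earlier in the notebook.)
split_time = 1000
time_train = time[:split_time]
x_train = series[:split_time]
time_valid = time[split_time:]
x_valid = series[split_time:]

# Constants passed to the window dataset function, as discussed
window_size = 20
batch_size = 32
shuffle_buffer_size = 1000

# A formatted dataset of (x, y) pairs ready for training
dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)
```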
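And here's a sketch of the model definition and training step the narration walks through. The learning-rate and momentum values are just example settings to experiment with, not prescribed ones:

```python
# A single dense layer with one output, taking window_size inputs --
# effectively a linear regression. Assigned to l0 so we can inspect
# its weights after training.
l0 = tf.keras.layers.Dense(1, input_shape=[window_size])
model = tf.keras.models.Sequential([l0])

# Mean squared error loss, with an SGD optimizer object (rather than
# the raw string "sgd") so we can set the learning rate and momentum.
model.compile(loss="mse",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6, momentum=0.9))

# The dataset already yields preformatted (x, y) pairs, so we just pass
# it to fit. verbose=0 suppresses the epoch-by-epoch output.
model.fit(dataset, epochs=100, verbose=0)
```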
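Finally, a sketch of inspecting the learned weights via the l0 variable; the exact numbers you see will vary from run to run:

```python
print("Layer weights: {}".format(l0.get_weights()))
# get_weights() returns two arrays: a (window_size, 1) kernel -- the 20
# weights applied to the x values -- and a single-element array holding
# the bias, i.e. the b value.
```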