So in addition to helping you choose between two different ML models (should I use linear regression or a neural network?), you can also use your validation dataset to help fine-tune the hyperparameters of a single model. Those hyperparameters, if you recall, are set before training. This tuning is accomplished through successive training runs, comparing each run against the independent validation dataset to check for overfitting.

So here's how your validation set actually gets used during training. As you saw when we covered optimization, training a model means starting from random weights, calculating the derivative, stepping down the gradient of the loss curve, minimizing your loss metric, and then repeating. Periodically, you want to assess the performance of your model against data it has not yet seen in training, and that's where the validation dataset comes in. So after a training run completes, validate the model's results against your validation dataset to see whether those hyperparameters are any good or whether you can tune them a little bit more. If there's no significant divergence between the loss metric from the training run and the loss metric on the validation dataset, you can potentially go back and optimize your hyperparameters a little bit more. Once the loss metrics from your model have been sufficiently optimized against the validation dataset and you start to see that divergence appear, that's your signal to stop before the model overfits and to say the model is tuned and ready for production. (There's a short sketch of this loop below.)

Now, you can use a loop similar to this one to also figure out the configuration of an individual model, like the number of layers in a network or the number of nodes to use, just as we did for the hyperparameters set before training. Essentially, you train with one configuration, say six nodes in your neural network, then train with another, and then evaluate which one performs better on your validation dataset. You end up choosing the model configuration that results in a lower loss on the validation dataset, not the one that results in a lower loss on the training dataset. Later in this specialization, we'll show you how Cloud ML Engine can carry out a Bayesian search through hyperparameter space, so you don't have to do this kind of experimentation one hyperparameter at a time. Cloud Machine Learning Engine helps you do this sort of experimentation in a parallel fashion, using an optimized search strategy.

Now, once you're done with training, you need to tell your boss how well your model is doing. What dataset are you going to use for that final go or no-go evaluation? Can you simply report the loss or the error on your validation dataset, even if it's consistent with your training dataset? Actually, you can't. Why not? Because you used your validation dataset to choose when to stop training. It's no longer independent; the model has seen it. So what do you have to do? Well, you actually have to split your data into three parts: training, validation, and a brand new, completely isolated silo called test. Once your model has been trained and validated, you can run it once, and only once, against the independent test dataset, and that's the loss metric you report to your boss.
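To make that training-with-validation loop concrete, here's a minimal sketch in Python with NumPy, assuming a simple linear model and a mean-squared-error loss. The synthetic data, learning rate, check interval, and patience threshold are all illustrative choices for this sketch, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])

# Synthetic training and validation data (illustrative only).
X_train = rng.normal(size=(800, 3))
y_train = X_train @ w_true + rng.normal(scale=0.1, size=800)
X_val = rng.normal(size=(200, 3))
y_val = X_val @ w_true + rng.normal(scale=0.1, size=200)

def mse(X, y, w):
    """Mean squared error of a linear model with weights w."""
    return np.mean((X @ w - y) ** 2)

w = rng.normal(size=3)        # start from random weights
learning_rate = 0.05
best_val_loss = np.inf
patience, bad_checks = 3, 0   # stop after 3 checks with no validation improvement

for step in range(1, 5001):
    # Derivative of the MSE loss, then a step down the gradient.
    grad = (2.0 / len(X_train)) * X_train.T @ (X_train @ w - y_train)
    w -= learning_rate * grad

    # Periodically assess the model on data it has not trained on.
    if step % 100 == 0:
        train_loss, val_loss = mse(X_train, y_train, w), mse(X_val, y_val, w)
        print(f"step {step}: train={train_loss:.4f} val={val_loss:.4f}")
        if val_loss < best_val_loss:
            best_val_loss, bad_checks = val_loss, 0
        else:
            bad_checks += 1   # validation loss has stopped tracking training loss
        if bad_checks >= patience:
            print("Validation loss stopped improving; stop training here.")
            break
```

The key point the sketch illustrates is that the stopping decision is driven by the validation loss, not the training loss.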
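And here's a rough sketch of the train/validation/test workflow, assuming scikit-learn is available. The synthetic dataset and the two candidate configurations (6 versus 32 hidden nodes) are placeholders; the pattern to notice is choosing a configuration on validation loss and touching the test set exactly once.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Synthetic data standing in for a real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=2000)

# Three-way split: training, validation, and a completely isolated test silo.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

# Two candidate configurations, e.g. 6 nodes versus 32 nodes in the hidden layer.
candidates = {
    "6 nodes": MLPRegressor(hidden_layer_sizes=(6,), max_iter=2000, random_state=0),
    "32 nodes": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}

# Pick the configuration with the lower loss on the *validation* set.
val_losses = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_losses[name] = mean_squared_error(y_val, model.predict(X_val))
    print(f"{name}: validation MSE = {val_losses[name]:.4f}")

best_name = min(val_losses, key=val_losses.get)

# Run the chosen model once, and only once, against the held-out test set.
test_mse = mean_squared_error(y_test, candidates[best_name].predict(X_test))
print(f"chosen config: {best_name}, test MSE = {test_mse:.4f}")
```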
It's that loss metric on your test dataset that decides whether or not you want to use this model in production. What happens if you fail on your test dataset even though you passed validation? It means you can't just retest the same ML model. You've got to either retrain a brand-new machine learning model or go back to the drawing board and collect more data samples to feed your ML model.

While this is a good approach, there's one teeny-tiny problem. Nobody likes to waste data, and it seems like the test data is essentially wasted; you only use it once, right? It's held out. Can't you use all of your data in training and still get a reasonable indication of how well your model is going to perform? Well, the answer is you can. The compromise between these methods is to do a training-validation split many different times. Train, then compute the loss on the validation dataset, keeping in mind that this validation set consists of points that were not used in training that time, and then split the data again. Now your training data might include some points that were used for validation on that first run, but you're doing multiple iterations, and finally, after a few rounds of this blending, you average the validation loss metrics across the board. You'll also get the standard deviation of the validation losses, which helps you analyze the spread and settle on a final number. This process is called bootstrapping or cross-validation (there's a short sketch of it below). The upside is that you get to use all of your data, but you have to train lots and lots more times because you're creating more splits.

So at the end of the day, here's what you have to remember: if you have lots of data, use the approach of a completely independent, held-out test dataset for that go or no-go decision. If you don't have that much data, use the cross-validation approach.
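Here's a minimal sketch of that cross-validation idea, again assuming scikit-learn; the model, fold count, and synthetic data are illustrative. The point is the repeated re-splitting and the mean and standard deviation of the validation losses.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Small synthetic dataset standing in for a limited real one.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.2, size=500)

# Re-split the data several times: train on one part, validate on the rest.
val_losses = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    val_losses.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

# Average the validation losses and look at their spread.
print(f"mean validation MSE: {np.mean(val_losses):.4f}")
print(f"std dev of validation MSE: {np.std(val_losses):.4f}")
```

Every data point ends up contributing to training in some fold and to validation in another, which is exactly the trade-off described above: no held-out test set, but several training runs instead of one.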