Notice anything different about this dataset? Click on the link and start training the model in the new window. What do you observe about the loss and the graph of loss over time? Do you see any convergence toward zero?

Assuming you've clicked the start-training button directly, you should see output like what is shown here. Note that the decision boundary does a poor job of dividing the data by class. Why might this be? The reason is that the data have a non-linear relationship: you can't draw a straight line dividing orange from blue. What this dataset calls for is a non-linear decision boundary, which in this case we intuitively recognize to be a circle around the blue data points.

However, all is not lost. By clicking on some of the boxes in the input column, see if you can introduce new features that will dramatically improve performance. Hopefully by now your output looks like this, because you've selected the X1² and X2² features. Note how circular the decision boundary now is.

How is it possible that a linear model can learn a non-linear decision boundary? Recall that linear models learn a set of weights that they then multiply by their features to make predictions. When those features are first-degree terms, like x and y, the result is a first-degree polynomial, like 2x or (2/3)y, and the model's predictions look like a line or a hyperplane. But there's no rule that says the features in a linear model must be first-degree terms: just as you can take x² and multiply it by two, so too can you take a feature of any degree and learn a weight for it in a linear model. (A code sketch of this idea appears at the end of this section.)

Let's see how far we can take this new idea. So, what about this curve? Last time we were able to find two non-linear features that made the problem linearly solvable. Will this strategy work here? Try it out.

What you've now figured out is that, with the feature options available to us and this type of model, this particular dataset is not linearly solvable. The best model I was able to train had a loss of about 0.6. However, the qualifier "with the feature options available to us" is crucial, because there is in fact a feature that would make learning this relationship trivial. Imagine, for example, a feature that somehow unswirled the data so that blue and orange appeared simply as two parallel lines. Those parallel lines would then be easily separable with a third line. (A hypothetical sketch of such a feature also appears below.)

Moments when you find powerful features are magical, but they're also very difficult to anticipate, which is problematic. Even though we don't often find features as amazing as the ones we've seen in our toy examples, feature engineering, that is, the systematic improvement or acquisition of new features, is an extremely important part of machine learning, and it's what we'll focus on in Course III.

So, what can we do when our attempts to engineer new features for linear models fail? The answer is to use more complicated models. There are many types of models that are able to learn non-linear decision boundaries; in this course, we'll be focusing on neural networks. Neural networks are in fact no better than any other sort of model. The reason they've become so popular is that today's business problems are biased toward those where neural networks excel.
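To make the squared-features idea concrete, here is a minimal sketch in Python. It uses scikit-learn's make_circles as a stand-in for the Playground's circular dataset; the dataset parameters and the choice of logistic regression are illustrative assumptions, not what Playground actually runs.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

# Stand-in for the Playground dataset: one class inside a circle,
# the other in a ring around it (parameters are illustrative).
X, y = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=0)

# Raw first-degree features only: the boundary is a straight line,
# which cannot separate a ring from its interior.
raw = LogisticRegression().fit(X, y)
print("accuracy with x1, x2 only:     ", raw.score(X, y))

# Add the engineered features x1^2 and x2^2. The model is still
# linear in its features, but its boundary
#   w1*x1 + w2*x2 + w3*x1^2 + w4*x2^2 + b = 0
# traces a circle (or ellipse) in the original (x1, x2) space.
X_sq = np.hstack([X, X ** 2])
engineered = LogisticRegression().fit(X_sq, y)
print("accuracy with x1^2, x2^2 added:", engineered.score(X_sq, y))
```

The point is that "linear" describes how the model combines its features, not the shape the boundary takes in the original input space.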
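And here is one hypothetical realization of the "unswirling" feature, assuming an idealized two-spiral dataset generated in polar form. Both the data generator and the cos(theta - r) feature are illustrative assumptions; nothing like them is available among the Playground's feature checkboxes.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_spirals(n=400, noise=0.1):
    """Two interleaved spiral arms, half a turn apart (illustrative)."""
    t = rng.uniform(0.5, 3 * np.pi, size=n)
    arm_a = np.c_[t * np.cos(t), t * np.sin(t)]
    arm_b = np.c_[t * np.cos(t + np.pi), t * np.sin(t + np.pi)]
    X = np.vstack([arm_a, arm_b]) + rng.normal(scale=noise, size=(2 * n, 2))
    y = np.r_[np.zeros(n), np.ones(n)]
    return X, y

X, y = two_spirals()

# "Unswirl" the data: on each arm the polar angle satisfies
# theta = r (mod 2*pi), offset by pi between the two classes, so
# cos(theta - r) sits near +1 on one arm and near -1 on the other.
r = np.hypot(X[:, 0], X[:, 1])
theta = np.arctan2(X[:, 1], X[:, 0])
unswirled = np.cos(theta - r)

# With this single feature, a simple threshold (a linear model in
# one dimension) separates the classes almost perfectly.
accuracy = np.mean((unswirled < 0) == y)
print("threshold accuracy on the unswirl feature:", accuracy)
```

This is exactly the "magical feature" moment described above: once you know the generating process, one transform makes the problem trivial, but nothing about the raw (x1, x2) coordinates hints that such a transform exists.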