Now that we have quickly reviewed how to write a model with the TensorFlow Estimator API, let's put that into practice. In the previous lab, you created a small sample dataset so that you can develop your TensorFlow model without having to read all of the data. In this lab, you will develop the model and try it out locally on that small dataset. Once you have the model working, in the next lab, you will train the model on the full dataset. In this lab, you'll use the high-level Estimator API to create a TensorFlow model. But which model? A linear regressor? A DNN regressor? You could use either. In fact, I encourage you to use one of the basic Estimator models. But, in addition, perhaps you could also try a wide and deep model. A wide and deep model tends to work very well on structured data. To understand why, consider a typical structured data problem. Let's say we are building a model to predict customer satisfaction in an ice cream store. Our input features might include the price of the ice cream, which employee served the ice cream, how long the customer waited, et cetera. Inputs like price and wait time are dense features. They're continuous, and you can imagine that if the price is $3 instead of $2.50, then customer satisfaction goes down. As price increases or wait time increases, customer satisfaction goes down. On the other hand, consider the employee ID. It is not that a customer who is served by employee 72345 is nine times more likely to be satisfied than if they were served by employee 8345. The employee ID is not a numeric feature. If our ice cream store has 10 employees, then we would normally one-hot encode the employee ID, and we end up with nine or 10 columns for it. So, employee ID is an example of a sparse feature. In problems that involve structured data, like the ice cream store example and the baby weight example, we tend to have some sparse features and some dense features.
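To make the dense versus sparse distinction concrete, here is a minimal sketch in plain Python of one-hot encoding an employee ID. The vocabulary, the feature names, and the `one_hot` helper are all hypothetical and made up for illustration; only the IDs 72345 and 8345 come from the ice cream example.

```python
# Hypothetical vocabulary of employee IDs (illustrative only; a real store
# would list every employee it has).
EMPLOYEE_IDS = ["8345", "72345", "10021"]


def one_hot(value, vocabulary):
    """Return a list with a 1 at the position of `value` and 0 elsewhere.

    A value that is not in the vocabulary maps to all zeros.
    """
    return [1 if value == v else 0 for v in vocabulary]


# One hypothetical customer visit.
example = {"price": 2.5, "wait_minutes": 4.0, "employee_id": "72345"}

# Dense features are used directly as numbers.
dense_part = [example["price"], example["wait_minutes"]]

# The sparse feature becomes one column per employee, mostly zeros.
sparse_part = one_hot(example["employee_id"], EMPLOYEE_IDS)

print(dense_part)   # [2.5, 4.0]
print(sparse_part)  # [0, 1, 0]
```

Notice that the encoded ID carries no notion of magnitude: employee 72345 is just "the second column", not a number nine times larger than 8345.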
Deep neural nets (DNNs) tend to work very well when your inputs are dense and highly correlated. Images are canonical examples of such inputs. Neural networks are adding and subtracting machines. So they work well when you feed them dense values that you can add and subtract easily to get fine representations of the input space. Nearby pixels tend to be highly correlated. So, by putting them through a neural network, we have the possibility that the inputs get decorrelated and mapped to a lower dimension. Intuitively, this is what happens when your input layer takes each pixel value and the number of hidden nodes is much less than the number of input nodes. But this is what a sparse matrix looks like: very, very wide, with lots and lots of features. So, it looks like a sea of zeros. Adding and subtracting these, you still have a bunch of zeros. Columns here also tend to be independent. They are not correlated with each other. So, deep neural nets don't work all that well. If your data are sparse, then linear models work a whole lot better. By using linear models, you can minimize the number of free parameters, and if the columns are independent, linear models [inaudible]. In the real world, on structured datasets, you will have both types of features: dense inputs, for which DNNs are better, and sparse inputs, for which linear models are better. So, which one should you use? Should you use a DNN because you have dense inputs, or a linear regressor or classifier because you have sparse inputs? Well, you don't have to choose. A wide and deep model lets you handle both. The idea is that you take your sparse inputs and connect them directly to the output, the way you would if you were building a linear regressor, and then you take your dense inputs and pass them through multiple layers, the way you would if you were building a DNN regressor. The combined model is called a wide and deep estimator. The wide and deep model helps you get the best of both worlds.
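The wiring just described can be sketched in a few lines of plain Python. This is only an illustration of the idea, not TensorFlow's implementation: it assumes the wide and deep outputs are simply added together with a bias, uses ReLU in the hidden layer, and every weight and input value below is made up.

```python
def relu(x):
    return max(0.0, x)


def dense_layer(inputs, weights, biases):
    """One fully connected ReLU layer; weights[j][i] links input i to node j."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def wide_and_deep(sparse, dense, wide_weights, hidden_layers, deep_out_weights, bias):
    # Wide part: sparse (one-hot) inputs connect directly to the output,
    # exactly as in a linear model.
    wide_out = sum(w * x for w, x in zip(wide_weights, sparse))

    # Deep part: dense inputs pass through the hidden layers first.
    h = dense
    for weights, biases in hidden_layers:
        h = dense_layer(h, weights, biases)
    deep_out = sum(w * x for w, x in zip(deep_out_weights, h))

    # Assumption for this sketch: the two parts are summed with a shared bias.
    return wide_out + deep_out + bias


# Made-up inputs and weights, chosen so the arithmetic is easy to follow.
sparse = [0, 1, 0]                      # one-hot employee ID
dense = [2.0, 4.0]                      # e.g. price and wait time
wide_weights = [0.5, -1.0, 0.25]        # one weight per sparse column
hidden_layers = [([[1.0, 0.0], [-1.0, 1.0]], [0.0, 0.0])]  # one layer, two nodes
deep_out_weights = [0.5, 0.25]

prediction = wide_and_deep(sparse, dense, wide_weights, hidden_layers,
                           deep_out_weights, bias=3.0)
print(prediction)  # 3.5
```

Here the wide part contributes -1.0 (the weight of the one active column), the deep part contributes 1.5, and the bias adds 3.0.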
Linear models help memorize the input space and are appropriate when you want to essentially train separate linearly independent models for different values of a categorical variable. Deep learning models help to decorrelate the inputs and generalize better by capturing the relationship between dense inputs and the label. By using a wide and deep model, you get to trade off relevance and diversity by treating some of your inputs as wide and others as deep. To create a wide and deep model, simply use a DNNLinearCombinedClassifier or a DNNLinearCombinedRegressor. A DNN classifier would take just one list of columns, but a wide and deep model takes two lists. One list is of the wide features, the linear feature columns. The other list is of the dense features, the DNN feature columns. Then also specify the number of nodes in each layer of the DNN part. Now, go ahead and work on the labs to create a TensorFlow model, and my colleague, Chris, will walk you through the solutions.
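As a rough sketch of what that looks like in code, the function below builds a `tf.estimator.DNNLinearCombinedRegressor` with two column lists and a list of hidden layer sizes. It assumes a TensorFlow release that still ships `tf.estimator` and `tf.feature_column` (the Estimator API was removed in later 2.x releases). The feature names, vocabularies, and layer sizes are hypothetical placeholders, not the lab's solution.

```python
# Hypothetical layer sizes for the DNN part: two hidden layers.
DNN_HIDDEN_UNITS = [64, 32]


def build_wide_and_deep(model_dir):
    """Build a wide and deep regressor (sketch; feature names are assumptions)."""
    # Imported here so the sketch can be read and loaded even where an
    # Estimator-capable TensorFlow is not installed.
    import tensorflow as tf

    # Wide (linear) list: sparse, categorical features.
    wide_columns = [
        tf.feature_column.categorical_column_with_vocabulary_list(
            "is_male", ["True", "False", "Unknown"]),  # hypothetical feature
    ]

    # Deep (DNN) list: dense, continuous features.
    deep_columns = [
        tf.feature_column.numeric_column("mother_age"),       # hypothetical
        tf.feature_column.numeric_column("gestation_weeks"),  # hypothetical
    ]

    return tf.estimator.DNNLinearCombinedRegressor(
        model_dir=model_dir,
        linear_feature_columns=wide_columns,
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=DNN_HIDDEN_UNITS,
    )
```

Calling `build_wide_and_deep("some_output_dir")` would return an estimator that you can then train and evaluate in the usual Estimator way; swapping in `DNNLinearCombinedClassifier` follows the same pattern for classification problems.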