So, supervised learning. Generalized set of steps. You have some data set, you load your data set, you choose some algorithm that you want to use to attempt to learn something from this data, and then you train that model — you go through a training step. Sometimes you want to visualize the model or visualize the data, then you test the model and evaluate it, and those two can be interchanged and mixed, but the idea is: after you're done training the model and you've looked at how it performs, you make some measurements on it against data that you know, and perform an error calculation to see how far off the predicted results are from the reference data that you know to be true. So, linear regression. In our example here, we want to predict housing prices based on square footage of the house and the number of bedrooms. Now, I took this example from CS 229 — Andrew Ng uses this example, he's drawn it on the board — and I'll show you some graphs coming up here. There could be many more of what I'll call features — we'll see in a moment — beyond square footage and number of bedrooms. Those were just the two that I picked because I wanted to keep it simple for me, because I'm not relying on a library; I wrote all the code for this learning algorithm because I wanted to see how it would work, and I wanted a fairly simple problem. So, I started out small, but you could also have other features like: distance to schools, distance to the bank, distance to movie theaters, distance to the grocery store, crime rate, weather patterns. You could introduce many features, and all of these could come together and be looked at to predict housing prices, but I just picked two: square footage and the number of bedrooms. So, here's my made-up data, just text copied right out of my source code. I use NumPy — I hope you're familiar with that; there are a lot of great routines in the NumPy package.
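The generalized steps above — load, train, predict, measure the error — can be sketched in a few lines. This is a minimal illustration, not the lecture's actual code: the split helper, the toy data, and the use of a least-squares line fit as the "algorithm" are all assumptions made for the sketch.

```python
import numpy as np

def train_test_split(data, test_fraction=0.25):
    """Hold out a fraction of the known data for evaluation (hypothetical helper)."""
    n_test = max(1, int(len(data) * test_fraction))
    return data[:-n_test], data[-n_test:]

# 1. Load the data set (made-up rows of [feature, target]).
data = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]])
train, test = train_test_split(data)

# 2-3. Choose an algorithm and train it (least-squares line fit as a stand-in).
slope, intercept = np.polyfit(train[:, 0], train[:, 1], 1)

# 4. Evaluate: compare predictions against reference targets we know to be true.
predictions = slope * test[:, 0] + intercept
error = np.mean((predictions - test[:, 1]) ** 2)  # mean squared error
print(error)
```

The point of the split is exactly what the lecture describes: the error calculation only means something when it's computed against known data the model didn't train on.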
So, this is an array — I can't remember how many rows there were: one, two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16. Sixteen entries. The first column is square footage, the second column is the number of bedrooms, and the third column is the price. I just made them roughly ascend, but they could be in any order; they don't have to be sorted. I was in a hurry here as well — I needed some training data, so I just made up these numbers. So, there's the data. This is the known data. You can imagine we went out, got access to a database, found housing prices and the number of bedrooms in those houses, and put this together. There's a public data set called the Boston housing data set, and it has square footage, number of bedrooms, and price — it has a whole bunch of other socio-economic data in there as well — and I always meant to circle back and run this on the Boston data set, but there was just too much to do. So again, I never have enough time in the day. So, we have some examples we call the training data, and each row is a sample from real estate containing a number of features x_i. Each one of these columns is called a feature, except for the last one in the right-hand column. So, x_1 — notice, not x_0, and I'll explain why that is in a moment. My first feature x_1 is the square footage, x_2 is the number of bedrooms, and the output y is the price, and that's what we want the model to predict. We're going to train the model on this data, and then we're going to give it data it's never seen before, and we'll see how it does making predictions. This output is also known as a target value. So, I used Matplotlib to plot my training data: square footage along the x-axis as one of the independent variables, and the number of bedrooms sticking out the other way.
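A table in that shape looks like this as a NumPy array. These rows are illustrative stand-ins, not the lecture's actual sixteen entries; the column layout (square footage, bedrooms, price) is the same.

```python
import numpy as np

# Illustrative training data in the lecture's layout:
# column 0 = square footage, column 1 = bedrooms, column 2 = price.
# The numbers are made up for this sketch.
training_data = np.array([
    [ 850, 2,  95000],
    [ 900, 2, 105000],
    [1100, 3, 130000],
    [1400, 3, 160000],
    [1800, 4, 210000],
    [2200, 4, 260000],
])

X = training_data[:, :2]   # features: square footage and bedrooms
y = training_data[:, 2]    # target: price (the right-hand column)
m, n = X.shape             # m training examples, n features per example
print(m, n)
```

Slicing the target off the last column like this is the usual way to separate the features x_i from the target value y before training.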
I tried for a long time to figure out how to plot multi-dimensional data and failed miserably — I was looking at some packages, trying to do a multi-dimensional data display. But for our example, we can just imagine the number of bedrooms sticking out here; some of the data points had two up to four or five bedrooms, whatever it was. So, I just plotted square footage of the house versus the housing price, and we can see it has this tendency to trend upwards and to the right. Now, if I didn't want to do machine learning, we could just draw a line between the two endpoints or do a best-fit curve, calculate the slope, and use the point-slope form of an equation to take some new x value and compute a y value, but that's not nearly as interesting as having a machine-learning algorithm do it for you. In this case, that trend happens to be linear. So, we want to create a hypothesis function, h, that will make these predictions of housing prices. To perform supervised learning, we have to decide how we will represent the hypothesis function, and that begins with humans deciding how they're going to model it — we have a lot of choices, as you'll see. As an initial choice, let's say we decide to approximate our hypothesis function as a linear combination of features x and weights theta. So we have h_theta(x) = theta_0 * x_0 + theta_1 * x_1 + theta_2 * x_2. Remember, x_1 is the square footage and x_2, the second feature, is the number of bedrooms. We set x_0 to one — let's see what I set — yes, we set x_0 to one, and then theta_0 becomes like an intercept term, much like in the equation of a line, y = ax + b, where b is the intercept term. Okay.
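That hypothesis can be transcribed directly into code. This is a sketch: the theta values below are arbitrary examples chosen to make the arithmetic readable, not trained weights.

```python
import numpy as np

def hypothesis(theta, features):
    """h_theta(x) = theta_0*x_0 + theta_1*x_1 + theta_2*x_2, with x_0 = 1."""
    x = np.concatenate(([1.0], features))  # prepend the constant x_0 = 1
    return theta[0] * x[0] + theta[1] * x[1] + theta[2] * x[2]

# Arbitrary example weights: intercept, dollars per square foot, dollars per bedroom.
theta = np.array([50000.0, 100.0, 5000.0])

# Predict a price for a 1500 sq ft, 3-bedroom house.
price = hypothesis(theta, np.array([1500.0, 3.0]))
print(price)  # 50000 + 100*1500 + 5000*3 = 215000.0
```

Setting x_0 = 1 is what lets theta_0 act as the intercept b from y = ax + b: it's multiplied by a constant, so it shifts the whole prediction up or down regardless of the features.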
So, we have all our features — this represents all the features of what was in a row, with the exception of the output value — and we express them as a column vector, and we have all of our associated thetas expressed as a column vector, okay? m is the number of training examples — up there it was 16, so 16 rows in my table — and the number of features n is equal to two. We can rewrite our hypothesis function as the summation, as i goes from zero to n, of theta_i * x_i, which is equal to — these are now vectors — theta transpose multiplied by x. So, we take the column vectors and do linear algebra on them, which hurt my head when I had to review all this: this times this, plus this times this, plus this times this, and so on, and we produce a real number. We get a real number out of our hypothesis function.
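The summation form and the vectorized form theta-transpose-times-x are the same number, which is easy to check directly. The theta and x values here are the same arbitrary example vectors used above, with x_0 = 1 as the intercept slot.

```python
import numpy as np

# Arbitrary example vectors (not trained values): x = [x_0, sqft, bedrooms].
theta = np.array([50000.0, 100.0, 5000.0])
x = np.array([1.0, 1500.0, 3.0])

# Summation form: sum over i = 0..n of theta_i * x_i.
summation = sum(theta[i] * x[i] for i in range(len(x)))

# Vectorized form: theta transpose times x, a dot product producing a real number.
vectorized = theta @ x

print(summation, vectorized)  # both 215000.0
```

In NumPy, `theta @ x` on two 1-D arrays is exactly this inner product, so there's no need to explicitly transpose anything when both operands are flat vectors.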