So here's the example. Categorical data like major can have more than two possibilities. Here the adviser has broken the majors into four possibilities: business, engineering, liberal arts, and ag. So if you can code a student as business, engineering, liberal arts, or ag, all you need to do is pick three of them. And again, it doesn't matter which ones you pick. In this case I picked business, engineering, and liberal arts to code and left ag out, which means all of our data will be compared to the students who are left out, and those are the students who majored in agriculture.

So if a student has a value of one for engineering, it means they are an engineering student; if they have a value of one for business, it means they are a business student; and as you can see, the rest would be zero here. We have assumed that they are not doing double majors in business and engineering, and so on. And we have students like this one, who has zero for all the variables, so this person must be an ag major. If they're not engineering or business or liberal arts, the only thing remaining is ag.

So whenever you have a categorical variable, you take the number of possibilities minus one and include that many dummy variables in your model. Here we have four majors, so we need four minus one, or three, dummy variables. We code them as zeros and ones, and then we are ready to run the problem.

So we go to Data, then Data Analysis, and pick Regression. Input your y-values: the prediction is for the starting salaries, so the y-values are right here, and I'll pick all of the data that I have. Then the x input is the GPA as well as the person's major, and I pick all of that. I make sure that it knows that I have labels included, and then I say OK. And here's our analysis. First of all, adjusted R square is significantly higher than what it used to be.
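The coding rule described here (number of categories minus one, with the left-out category as the baseline) can be sketched outside Excel as well. This is only a minimal illustration in Python: the category labels, the `dummy_code` helper, and the sample list of students are hypothetical names chosen to mirror the lecture, not part of the spreadsheet.

```python
# Hypothetical labels mirroring the four majors in the lecture.
MAJORS = ["business", "engineering", "liberal_arts", "ag"]
BASELINE = "ag"  # the category left out; every other major is compared to it


def dummy_code(major, categories=MAJORS, baseline=BASELINE):
    """Return k-1 zero/one indicators for a single student's major."""
    if major not in categories:
        raise ValueError(f"unknown major: {major!r}")
    # One indicator per non-baseline category; the baseline gets all zeros.
    return {c: int(major == c) for c in categories if c != baseline}


# Made-up students, only to show the resulting rows of zeros and ones.
students = ["engineering", "business", "ag", "liberal_arts"]
for major in students:
    print(major, dummy_code(major))
```

An ag major comes out as all zeros, which is exactly the row of zeros pointed out in the spreadsheet.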
So 82.4% of the variation in the starting salary of these graduates can be explained by their GPA and their major. Now let's look at the variables that we have and make sure that every single variable is significant in our model, and for that I will go to the last table. Looking at the P-values, I see that all of them are less than 0.05. So every variable that we have identified, GPA and major, has a significant relationship with starting salary, which is what we are trying to predict.

And now we are ready to make the prediction. So let's take these values, like I like to do, copy them down here, and now I'm going to say student number one, student number two, student number three, and student number four. I am going to compute the point estimate of the prediction for each type of student. Let's say this student has a GPA of 3.6 and is an engineering student. That means they are not business or liberal arts, so this is all I need to provide.

So I'm going to do the prediction right here. It starts with my intercept, and because I'm going to copy my formula, I'm going to lock the cells for the coefficients of my model, so I press F4 to lock it. Plus, and then I'm going to use the function I told you about, SUMPRODUCT. The first array is the coefficients, remember, from GPA onwards, and I press F4 again to lock them. The second array is basically what is here, the student's values. And if a cell is empty, it is just treated as zero, so you are not going to get any problems. This is the value we get.

Okay, let's now focus on this column, and I will change the values so you can see how these coefficients matter. Let's assume the next student has a GPA of 2.6 and is still an engineering student. What would their expected salary be? If I just drag this out, you would see that the salary difference is what the GPA coefficient is.
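Written as an equation, the intercept-plus-SUMPRODUCT formula computes something like the following. The symbols are notation introduced here for illustration and do not appear in the spreadsheet: $b_0$ is the intercept, the other $b$'s are the coefficients, and each $D$ is a zero/one dummy variable.

$$
\hat{y} = b_0 + b_{\text{GPA}} \cdot \text{GPA} + b_{\text{Bus}} D_{\text{Bus}} + b_{\text{Eng}} D_{\text{Eng}} + b_{\text{LA}} D_{\text{LA}}
$$

For the two engineering students, $D_{\text{Eng}} = 1$ and the other dummies are zero, so the difference between their predictions is

$$
\hat{y}_{3.6} - \hat{y}_{2.6} = (b_0 + 3.6\, b_{\text{GPA}} + b_{\text{Eng}}) - (b_0 + 2.6\, b_{\text{GPA}} + b_{\text{Eng}}) = b_{\text{GPA}}
$$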
So for every full point that the GPA drops, the student's salary goes down by $1142, and that's the difference you see between these two. If I just quickly compute it, you will see that the difference is exactly what you see as the coefficient.

So what if the student has a 2.6 but is an ag major? If it's an ag major, then no major coefficient gets multiplied by one. So it's just this value. I can just drag this out, and between an ag major and an engineering major with the same GPA you would see a difference, which is the engineering coefficient being removed. So again, if I look at the difference between an engineering student with a 2.6 and a student in ag, you get that value. The engineering student had this additional value added to the number. What the ag major is getting is the intercept plus the bonus they get for their GPA.

If I take this value and now say it's 2.6 and the student is a liberal arts major, what you would see is that the coefficient of liberal arts is going to reduce this person's salary by $2000.32. So, again, if I look at the difference here, it's the difference between this value and this value. Remember, everybody is being compared to an ag major here. So this is what you would get, and this is how you would use it. Now, of course, if I have someone with a higher GPA, let's say this person is now a 3.6, their value would go up, and if it's a 4.0 the value will go up further. This is how we can use a regression model for prediction.
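The same comparisons can be reproduced with a short Python sketch of the intercept-plus-SUMPRODUCT calculation. Only the GPA coefficient (1142) and the liberal arts coefficient (-2000.32) are the values quoted in the lecture; the intercept and the business and engineering coefficients are placeholder numbers assumed for illustration, and `predict` is a hypothetical helper, so the predicted salaries themselves are not meaningful. Only the differences are.

```python
# GPA and liberal-arts coefficients are the ones quoted in the lecture.
# INTERCEPT, business, and engineering are PLACEHOLDERS, not lecture values.
INTERCEPT = 40000.0
COEFS = {
    "gpa": 1142.0,
    "business": 1500.0,       # placeholder
    "engineering": 3000.0,    # placeholder
    "liberal_arts": -2000.32,
}


def predict(gpa, major):
    """Intercept plus the sum of coefficient * value, like SUMPRODUCT."""
    if major != "ag" and major not in COEFS:
        raise ValueError(f"unknown major: {major!r}")
    # Start with all dummies at zero; an ag major keeps them all at zero.
    x = {"gpa": gpa, "business": 0, "engineering": 0, "liberal_arts": 0}
    if major != "ag":
        x[major] = 1
    return INTERCEPT + sum(COEFS[name] * x[name] for name in COEFS)


eng_36 = predict(3.6, "engineering")
eng_26 = predict(2.6, "engineering")
ag_26 = predict(2.6, "ag")
la_26 = predict(2.6, "liberal_arts")

print("one GPA point:", round(eng_36 - eng_26, 2))        # the GPA coefficient
print("engineering vs ag:", round(eng_26 - ag_26, 2))     # the engineering coefficient
print("liberal arts vs ag:", round(la_26 - ag_26, 2))     # the liberal arts coefficient
```

Each printed difference matches one coefficient, which is the point of the walkthrough: every major is measured against the ag baseline, and GPA shifts the prediction by the same amount regardless of major.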