Recall that linear regression is used to describe the straight line that best fits a series of ordered pairs (x, y). The equation for linear regression is y hat = a + bx, where y hat is the predicted value of y for a given value of x, x is the independent variable, a is the y-intercept of the line, and b is its slope. The least squares method is a mathematical procedure that identifies the linear equation that best fits a set of ordered pairs by finding values for a, the y-intercept, and b, the slope. The goal of the least squares method is to minimize the total squared error between the observed values of y and the predicted values y hat.

The procedure for calculating the regression line using the least squares method is as follows. Number 1, create a table with your x and y values in columns. Number 2, calculate the xy, x squared, and y squared values and enter them into the table as well. Number 3, calculate the sums of x, y, xy, x squared, and y squared, and the means x bar and y bar. And number 4, find the linear equation that best fits the data by determining the values of a, the y-intercept, and b, the slope, using the following equations. The slope is b = (n Σxy - Σx Σy) / (n Σx² - (Σx)²), where n is the number of ordered pairs; note that the denominator subtracts the square of the sum of x from n times the sum of the x squared values. The intercept is a = y bar - b x bar.

Okay, for example, we want to use the least squares method to identify the linear equation that best fits a set of ordered pairs, x and y. The first two columns of the table are the collected x and y data. The other three columns contain the results of squaring x, multiplying x and y together, and squaring y. Using the values from the table, we can calculate the value of b, the slope of the line, and then the value of a, the y-intercept. Next we plug those numbers into the regression line equation: y hat = 6.50015 + 0.5833x.

Okay, we want to test for statistical significance using the t-test to see if this new predictor equation is useful.
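Before turning to that test, the four-step least squares procedure described above can be sketched in a few lines of Python. This is only an illustration: the function name least_squares_fit and the sample data are assumptions made for this sketch, and the data are not the values from the lecture's table.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for y_hat = a + b*x using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Steps 2 and 3: the column sums from the table.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Step 4: b = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator

    # a = y_bar - b * x_bar
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Hypothetical data (not the lecture's table): points lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = least_squares_fit(xs, ys)
print(f"y_hat = {a:.4f} + {b:.4f}x")  # prints: y_hat = 1.0000 + 2.0000x
```

Because the sample points fall exactly on a line, the fitted intercept and slope reproduce that line, which makes the sketch easy to check by hand against the table method.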
The steps are as follows. Number 1, the initial conditions for the t-test are that the population regression equation is y = mx + b, and that for any given value of x, the y values are normally distributed, independent, and have equal standard deviations. Note that in this form m is the population slope, which the hypotheses below call beta sub 1, and b is the intercept, not the sample slope b from the least squares formulas. Number 2, decide the significance level alpha, for example 0.05, 0.10, or 0.001. We will use 0.05. Number 3, develop the hypotheses to be tested. The null hypothesis is H0: beta sub 1 = 0, meaning the equation cannot be used as a predictor of the y values. Its alternative is H1: beta sub 1 is not equal to 0. In other words, the equation is useful for predicting y values.

Continuing, number 4, the critical values are obtained from the t table in an appendix at the back of your textbook, or you can look them up on the Internet. Look up the critical value for plus or minus t sub alpha/2, and use n - 2 degrees of freedom. Number 5, the test statistic is given by the following formula: t = b sub 1 / (s / √(S sub xx)), that is, the sample slope divided by its standard error. Here s is the standard error of the estimate, the square root of the sum of squared residuals divided by n - 2, and S sub xx is the sum of the squared deviations of x from x bar. Number 6, compare the test statistic with the critical value obtained in step 4. Reject the null hypothesis if the test statistic is greater than t sub alpha/2 or less than -t sub alpha/2. Otherwise, do not reject the null hypothesis. And number 7, state the conclusion in terms of the problem context.

From our complaints example, we found that r equals 0.972. So let's test this for statistical significance. Our null hypothesis is H0: beta sub 1 = 0, or the equation cannot be used as a predictor of the y values. And the alternative hypothesis is H1: beta sub 1 is not equal to 0, or the equation is useful to predict the y values, okay? Let's calculate the test statistic t. From the calculations, we get t = 0.348. The critical value is t sub c = ±2.447 from our table, based on 8 - 2 = 6 degrees of freedom and alpha over 2 = 0.025, since this is a two-tailed test. So now let's compare t = 0.348 and t sub c = ±2.447.
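Steps 5 and 6 of the test can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names slope_t_statistic and reject_null are invented for this sketch, s is taken to be the standard error of the estimate, S sub xx is taken to be the sum of squared deviations of x from its mean, the eight data points are hypothetical rather than the complaints data, and the critical value is read from a t table instead of being computed.

```python
import math


def slope_t_statistic(xs, ys):
    """Return t = b1 / (s / sqrt(Sxx)) for the fitted slope b1."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    # Sxy / Sxx is algebraically the same slope as the summation formula.
    b1 = s_xy / s_xx
    a = y_bar - b1 * x_bar

    # s: standard error of the estimate, using n - 2 degrees of freedom.
    sse = sum((y - (a + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    if s == 0:
        raise ValueError("perfect fit; the t statistic is undefined")

    return b1 / (s / math.sqrt(s_xx))


def reject_null(t, t_critical):
    """Two-tailed rule: reject H0 when t falls outside +/- t_critical."""
    return abs(t) > t_critical


# Hypothetical data with n = 8, so the degrees of freedom are 8 - 2 = 6.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [4, 9, 5, 8, 6, 10, 7, 9]
t = slope_t_statistic(xs, ys)
t_critical = 2.447  # from a t table: alpha/2 = 0.025, 6 degrees of freedom
print(f"t = {t:.3f}, reject H0: {reject_null(t, t_critical)}")
```

Applied to the lecture's own figures, reject_null(0.348, 2.447) returns False, because 0.348 lies between -2.447 and +2.447.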
If you'll recall from our hypothesis test, we had the null hypothesis H0: beta sub 1 = 0 and the alternative H1: beta sub 1 is not equal to 0. Comparing t = 0.348 with t sub c = ±2.447, we can see that 0.348 is well within the boundaries of -2.447 and +2.447, so we do not reject the null hypothesis, and we conclude that this equation is not a good predictor of y values.

You can also use Microsoft Excel to calculate the regression. Number 1, go to the Tools menu, or the Data tab in newer versions of Excel, and select Data Analysis. If you don't see this option, you must install it as an Excel add-in, the Analysis ToolPak. There are tutorials on the Internet that will help you. Number 2, select Regression from the list. Number 3, in the Regression input box, enter the x and y ranges. And number 4, view the results.

Here's a screenshot from Excel. Once you select the Data tab and the Data Analysis tool, choose Regression. Fill in the y range and the x range and leave the confidence level at 95%. Select the location for your output and click OK. Here's what the output looks like in Microsoft Excel. The results are consistent with our manual calculations earlier. The regression equation is y hat = 6.5 + 0.5833x, and the p-value is greater than the alpha value of 0.05, so we do not reject the null hypothesis.