And then, we do that for every single point of data that we have.

And then once we have all these distances, we then square them.

Thus, the name least squares.

Now you might say, hey, those aren't squares, those are rectangles, and

that's because our axes aren't scaled the same way.

So if these axes were actually the same,

these would be nice little green squares instead of green rectangles.

Anyway, back to the method.

It's called least squares, because we're trying to find a line of best fit

that minimizes the sum of all of these areas.

And that's pretty much how ordinary least squares works.
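To make that concrete, here's a minimal sketch of ordinary least squares for one variable, using the standard closed-form solution. The data points are made up for illustration; `least_squares_fit` and `sum_of_squares` are just names chosen here, not from any particular library.

```python
def least_squares_fit(xs, ys):
    """Return the slope and intercept that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution: slope = covariance(x, y) / variance(x)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_of_squares(xs, ys, slope, intercept):
    """The total area of those squares: the quantity least squares minimizes."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Made-up data that roughly follows y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
m, b = least_squares_fit(xs, ys)
```

Any other slope or intercept you try will give a larger `sum_of_squares` on this data than the fitted `m` and `b` do.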

But now, it's really important to point out that linear regression isn't great for

all types of data.

Say that we have data that tracks the temperature every hour of the day for

a certain geographic location.

If we were to apply a linear regression to this data,

we could totally get a line of best fit.

But as you can see, this line of best fit doesn't really fit this data very well.

And while we could find a prediction of temperature given an hour in the day,

we can see that it's not going to be very effective.

And that's because linear regression works on data with linear correlations.

Sometimes determining this is pretty obvious, like in this visualization,

where you can pretty clearly see that there's a linear correlation.

But especially when you have many different independent variables that

you're trying to do a linear regression on, it's not going to be quite this easy.
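One common numeric check is Pearson's correlation coefficient, which you can compute per variable when eyeballing a plot isn't practical. Here's a sketch using made-up hourly temperatures like the example above; the cyclic shape of the data is an assumption for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: near +1 or -1 means a strong linear
    relationship, near 0 means little to none."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

hours = list(range(24))
# Made-up temperatures that cycle through the day, peaking around noon:
# a clear pattern, but not a linear one.
temps = [15 - 10 * math.cos(2 * math.pi * h / 24) for h in hours]
r = pearson_r(hours, temps)
# |r| comes out small here, a warning that a straight line won't fit well,
# even though hour and temperature are obviously related.
```

A low `r` doesn't mean the variables are unrelated, only that the relationship isn't a straight line, which is exactly the temperature-by-hour situation.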