Let's start with an example.

Imagine you work at a factory that produces tea, and you work on

the packaging line, where the tea is put into tea bags.

Sometimes the tea bags break during the packaging process and become defective.

You wish to reduce the number of defective tea bags.

You are brainstorming with your team about factors that might influence this.

You know that different flavors of tea are produced, and therefore

the production line is stopped during the day for changeovers between types of tea.

And this makes you wonder, could it be that there is a relationship between

the number of production stops and the number of defective tea bags?

Therefore, you collect data on both these variables.

To study the relationship, you first look at a graph of the data.

Our Y variable is the number of defective bags, which is called bags in the data set, and

our X variable is stops. Because they are both numerical,

you can make a scatter plot to visualize the relationship.

Let's study the scatter plot.

Do you see a relationship between stops and defects?

At first sight, it looks like the number of broken tea bags

increases as the number of production stops increases.

We can visualize this relationship by drawing a line through the data points.

But which line is the best line?

Is it this one?

Or this one?

Or maybe this one?

Regression analysis is about finding the best-fitting straight line for your data.

A line can be described mathematically

by the formula Y = a + bX, where a is the intercept and b is the slope.

Regression analysis means finding the a and

the b that give you the best-fitting line.
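To make this concrete, here is a minimal sketch of how a and b could be computed. It assumes that "best fitting" means least squares, which is the usual criterion but is not stated explicitly here. The numbers are invented purely for illustration and are not the course data set; only the variable names stops and bags come from the example. The helper name fit_line is hypothetical.

```python
# Hypothetical data, invented for illustration only (not the course data set).
stops = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]          # X: production stops per day
bags = [12, 15, 14, 19, 21, 20, 25, 27, 26, 31]  # Y: defective bags per day


def fit_line(x, y):
    """Return (a, b) of the least-squares line Y = a + bX."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept
    return a, b


a, b = fit_line(stops, bags)
print(f"bags = {a:.2f} + {b:.2f} * stops")
```

For this made-up data the fitted line has intercept 10 and slope 2, so each extra stop goes along with about two more defective bags.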

Regression analysis consists of four steps.

The first step is making a fitted line plot to visualize your data.

The second step is to perform the main regression analysis.

In this step, you check whether the relationship is statistically significant,

and if so, how strong the relationship is.
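As a rough sketch of what this second step could look like in code: one common way to judge significance is a t-test on the slope b, and one common measure of strength is R-squared. Both are assumptions about the method, since the details follow in later videos. The data is again invented for illustration, and the critical value 2.306 only applies to this example, with its 10 observations and a 5% two-sided test.

```python
import math

# Hypothetical data, invented for illustration only (not the course data set).
stops = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]          # X: production stops per day
bags = [12, 15, 14, 19, 21, 20, 25, 27, 26, 31]  # Y: defective bags per day

n = len(stops)
x_mean = sum(stops) / n
y_mean = sum(bags) / n
sxx = sum((x - x_mean) ** 2 for x in stops)
syy = sum((y - y_mean) ** 2 for y in bags)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(stops, bags))

# Least-squares estimates for Y = a + bX.
b = sxy / sxx
a = y_mean - b * x_mean

# Strength: share of the variation in bags explained by the line.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(stops, bags))
r_squared = 1 - sse / syy

# Significance: t-statistic for the slope, tested against zero.
s = math.sqrt(sse / (n - 2))   # residual standard deviation
se_b = s / math.sqrt(sxx)      # standard error of the slope
t_stat = b / se_b

# Two-sided 5% critical value of the t distribution with n - 2 = 8
# degrees of freedom; valid for this 10-point example only.
T_CRIT = 2.306
significant = abs(t_stat) > T_CRIT

print(f"R-squared = {r_squared:.3f}, t = {t_stat:.2f}, significant: {significant}")
```

In practice, statistical software reports a p-value instead of asking you to look up a critical value, but the comparison is the same idea.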

In the third step, you perform a residual analysis.

I will explain what residuals are and

you will learn that the reliability of the regression analysis depends on this step.
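As a small preview, and only as a sketch: a residual is the difference between an observed value and the value the fitted line gives for it. The code below computes the residuals for the same invented data as before (not the course data set), using a least-squares fit as an assumption. What to check in the residuals is left to the later video.

```python
# Hypothetical data, invented for illustration only (not the course data set).
stops = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]          # X: production stops per day
bags = [12, 15, 14, 19, 21, 20, 25, 27, 26, 31]  # Y: defective bags per day

n = len(stops)
x_mean = sum(stops) / n
y_mean = sum(bags) / n
sxx = sum((x - x_mean) ** 2 for x in stops)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(stops, bags))

# Least-squares estimates for Y = a + bX.
b = sxy / sxx
a = y_mean - b * x_mean

# Residual = observed number of bags minus the number the line predicts.
fitted = [a + b * x for x in stops]
residuals = [y - f for y, f in zip(bags, fitted)]

for x, y, f, r in zip(stops, bags, fitted, residuals):
    print(f"stops={x:2d}  bags={y:2d}  fitted={f:5.1f}  residual={r:+.1f}")
```

With a least-squares line that includes an intercept, the residuals always sum to zero, so it is their pattern and spread that carry the information.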

In some cases, you perform regression analysis,

not only to study significance but also to make predictions.

Therefore, we have added the optional fourth step

of constructing a prediction interval.
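To give an idea of what such an interval looks like, here is a hedged sketch using the standard textbook formula for a 95% prediction interval around a single new observation. The data is invented for illustration (not the course data set), the function name prediction_interval is hypothetical, and the critical value 2.306 is hard-coded for this 10-point example only.

```python
import math

# Hypothetical data, invented for illustration only (not the course data set).
stops = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]          # X: production stops per day
bags = [12, 15, 14, 19, 21, 20, 25, 27, 26, 31]  # Y: defective bags per day

n = len(stops)
x_mean = sum(stops) / n
y_mean = sum(bags) / n
sxx = sum((x - x_mean) ** 2 for x in stops)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(stops, bags))

# Least-squares estimates for Y = a + bX.
b = sxy / sxx
a = y_mean - b * x_mean

# Residual standard deviation, with n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(stops, bags))
s = math.sqrt(sse / (n - 2))

# Two-sided 95% critical value of the t distribution with 8 degrees of
# freedom; valid for this 10-point example only.
T_CRIT = 2.306


def prediction_interval(x0):
    """Return (lower, predicted, upper) for one new day with x0 stops."""
    predicted = a + b * x0
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)
    margin = T_CRIT * se_pred
    return predicted - margin, predicted, predicted + margin


lower, predicted, upper = prediction_interval(6)
print(f"6 stops: predicted {predicted:.1f} bags, 95% PI [{lower:.1f}, {upper:.1f}]")
```

Note that the interval gets wider as x0 moves away from the average number of stops, because the line itself is less certain there.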

How you should perform these four steps will be discussed in the next videos.