We now turn our attention to the X-bar and R-charts. The X-bar and R-charts are used when measurements are continuous, so we are measuring some physical dimension, for example. The sample size that we are looking at is somewhere between two and 10. As I have said earlier, sample sizes larger than 10 are also possible with an X-bar and R-chart; however, it is better to use an X-bar and S-chart in that case. So, let's understand the thinking behind the X-bar and R-charts. The P and C charts that we looked at before were single charts, used for either the proportion of defectives or the number of defects. The X-bar and R-chart, by contrast, is a pair of charts that we use together. Now, why do we need a pair of charts when we get into these continuous measurements? Here's some data that we've collected. If you look at this particular graph, you will notice that we have six samples, and each of those six samples has five items. The value of the measurement for each item in a sample is shown by the little blue dots. So, what we notice is that each sample is spread out somewhat. If I calculate the average for each sample, I get the points shown by the little x's that lie in the middle of each of the samples. The thing that I notice when I look at this particular figure is that some of the samples are tightly clustered, which means the smallest value and the largest value aren't very different. So, if I look at the second sample, for example, I see a fairly tight cluster of points. On the other hand, if I look at the third sample or the fourth sample, I see that my sample values are spread out extensively.
Now, if I were to use only the average value from each of these samples to calculate my control chart, I would miss this whole problem that I'm noticing with this data, where some samples are tightly clustered and others aren't. So, I have variation that is occurring within each sample, and I have variation that is occurring between samples. I have to have a way to distinguish between these, so that I can be sure whether the differences that I'm seeing are because of within-sample differences or between-sample differences. To do that, I construct two separate charts. So what are these charts? First, we use the X-bar chart, which is a plot of the sample means. So, if I had taken the x's in the previous figure and plotted only those x's, that's what the X-bar chart would be. The X-bar chart is a way for me to evaluate process averages, assuming that I have stable process variability, which means that the spread of the samples is not too different from one sample to another. But I don't know whether the spread is different or not. To be able to figure that out, I construct what is called the R-chart. The R-chart essentially looks at the range of values, the difference between the smallest and the largest value in each sample. It uses that to figure out whether the process variability is indeed stable. If the range varies a lot from sample to sample, then I do not have stable process variability, and I need to fix that before I can conclude anything from my X-bar chart. So this is why we have these two charts. So let's look at the X-bar chart first. We have k chronologically ordered samples of equal size, each one of size n. For each such sample, I calculate the mean of the sample, or the average value of the sample, and I denote it by X-bar i.
The bar signifies that I've taken an average, and the i subscript signifies that this is the average for the ith sample. Then, I take an average of the averages. Since I'm taking the average of averages, I call it X double bar. So I put two bars on top to signify that it is the average of averages: I take the k different X-bar i values, sum them up, and divide by k. This X double bar is then the center line of my X-bar chart. Now, to find the control limits, I need to know the standard deviation. Now remember, in the old days, calculation of the standard deviation, which requires calculation of a square root, was a very expensive, tedious, time-consuming process, and having to do this repeatedly for every operation was prohibitively expensive. So, the alternative was to use the range. Now, the range is simply the maximum value in the sample minus the minimum value in the sample. Now why the range? For the normal distribution, it was shown many years ago that there is a relationship between the range of a sample and the standard deviation. So using this relationship, we can estimate the standard deviation by looking at the range, and you don't actually have to take a square root to do the calculation. A second thing that happens is that if you have a very small sample, a sample of size two, for example, then we do not have enough of what are known as degrees of freedom, and because we do not have enough degrees of freedom, the calculation of the sample standard deviation poses some statistical problems. To avoid that, we use the range and then do the conversion to a standard deviation. In any case, if you are going to calculate the range, we call the range of the ith sample R subscript i.
We then take the k R_i values, one for each sample, and calculate the average of those k values and call it R bar. So, this is the mean range of the k samples. Once I have the mean of the ranges of the samples, I can calculate the lower and upper control limits for my X-bar chart, the lower control limit being X double bar minus A_2 R bar and the upper control limit being X double bar plus A_2 R bar. The A_2 there, A subscript two, is a constant that depends on the sample size and on the number of standard deviations that we are interested in. We are typically interested in plus or minus three standard deviations, so that factor of three is buried inside A_2 along with the conversion factor from the range to the standard deviation. Now, for the R-chart we have to do something very similar. We first calculate R bar, which is the center line. Then, we calculate the lower and upper control limits. This time, to find the lower and upper control limits we are given constants D3 and D4, which we multiply by R bar: the lower control limit is D3 times R bar and the upper control limit is D4 times R bar. So, what are these constants A2, D3, and D4? These constants help translate the range information that we have into our control limits. They depend on the sample size n. Since we've said that we want to use X-bar and R-charts for sample sizes from two to 10, I'm providing a table with the constants associated with each of those sample sizes. So for example, if I have a sample size of three, I will use an A2 value of 1.023, a D3 value of 0, and a D4 value of 2.575. So depending on the sample size, the values of the constants change, and accordingly the translation from the range to the estimate of the standard deviation changes a little bit. So, let's take an example to figure out how we can go about using the formulas that we've just learned.
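For reference, the quantities described so far can be collected in one place. This is only a summary of the spoken formulas; the double-subscript notation X_{ij} for the jth measurement in the ith sample is an assumption introduced here for compactness and is not used in the lecture.

```latex
\begin{align*}
\bar{X}_i &= \frac{1}{n}\sum_{j=1}^{n} X_{ij}, &
\bar{\bar{X}} &= \frac{1}{k}\sum_{i=1}^{k} \bar{X}_i, \\
R_i &= \max_{j} X_{ij} - \min_{j} X_{ij}, &
\bar{R} &= \frac{1}{k}\sum_{i=1}^{k} R_i, \\
\text{X-bar chart:}\quad \mathrm{LCL} &= \bar{\bar{X}} - A_2\,\bar{R}, &
\mathrm{UCL} &= \bar{\bar{X}} + A_2\,\bar{R}, \\
\text{R-chart:}\quad \mathrm{LCL} &= D_3\,\bar{R}, &
\mathrm{UCL} &= D_4\,\bar{R}.
\end{align*}
```

The center lines are X double bar for the X-bar chart and R bar for the R-chart.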
So in this example, Yolanda is studying the weight of beverages in 10-ounce cans coming off a bottling line. Now, it is important that the amount that's put in each can closely matches the specification of 10 ounces. If you put in too little, that shortchanges the customer, and if you put in too much, A, it adds extra cost, and B, it could lead to overfilling of the cans, which may cause them to break during transport or while in storage, and that's not something that you want. To study this particular process, Yolanda samples five cans each hour during an eight-hour shift. So, her sample size is five, and she has eight such samples that she has collected. The data she's collected is shown here. Notice that she has eight samples, and for each sample there are five values for the weight of what's put inside a can. So if you look at sample one, the five cans that she observes had weights of 10.833, 9.976, 10.002, etc. Once we have these values, we can calculate the average of each sample, which is shown in the table as X-bar_i for the ith sample. So if I look at the second sample, X-bar subscript two is 10.109, which is simply the average of the five values in that particular sample. Similarly, I can calculate the range associated with the observations. If I look at sample one, the largest value in that sample happens to be the first value, 10.833. The smallest value happens to be 9.515, and if I take the difference of those two values, it turns out to be 1.318. Once I've calculated the mean and the range for each of the samples, I can go ahead and take the average of those values. When I do that, I get X double bar, the overall average, which turns out to be 10.172, and I get the average of all the ranges, R bar, which turns out to be 0.944.
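The per-sample calculations described here can be sketched in a few lines of Python. The function name `sample_stats` and the two small samples below are hypothetical, chosen only for illustration; they are not Yolanda's data, since the full table is not reproduced in this transcript.

```python
from statistics import mean

def sample_stats(samples):
    """Return (sample means, sample ranges, X double bar, R bar)."""
    means = [mean(s) for s in samples]            # X-bar_i for each sample
    ranges = [max(s) - min(s) for s in samples]   # R_i = largest - smallest
    return means, ranges, mean(means), mean(ranges)

# Hypothetical measurements for illustration only (NOT the lecture's data).
samples = [
    [10.1, 9.9, 10.0, 10.2, 9.8],
    [10.3, 10.0, 9.7, 10.1, 9.9],
]
means, ranges, x_double_bar, r_bar = sample_stats(samples)
print("sample means:", [round(m, 3) for m in means])
print("sample ranges:", [round(r, 3) for r in ranges])
print("X double bar:", round(x_double_bar, 3), "R bar:", round(r_bar, 3))

# Range of sample one, using the two extremes quoted in the lecture;
# the lecture reports 1.318 for this difference.
print("range of sample one:", round(10.833 - 9.515, 3))
```

Taking the mean of the sample means equals the overall mean of all observations only because every sample has the same size n, which is the setting assumed throughout.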
Now, I use the values that I've just calculated to find the lower and upper control limits. For the X-bar chart, I need to figure out what value of A_2 I'm going to use. Remember that the sample size, in this case, happens to be five. So, I go and look at my table of constants and I find that for a sample size of five, A_2 has a value of 0.577. So now I can calculate the lower control limit as X-double bar, which is 10.172, minus A_2, which is 0.577, multiplied by R-bar, which is 0.944. When I do the calculation, I get 9.627. Similarly, when I do the calculation for the upper control limit, I get 10.716. For the R-chart, I have to do something similar: I have to find the values of D3 and D4. For a sample size of five, the table tells me that the value of D3 is 0 and the value of D4 is 2.115. So, I take those values and multiply them by R-bar to give me the lower and upper control limits, which in this case turn out to be 0 and 1.996. The first thing that I do is check whether my process variability is stable. To do that, I plot the R-chart. In the R-chart, I plot the lower control limit, the upper control limit, and the center line, and then I plot the range of each of the eight samples. When I plot those eight ranges, I notice that all of them lie within the upper and lower control limits of the range chart. So, I can now declare, or at least I can feel fairly comfortable, that I have stability in my process variability, so my process variability is not changing too much. Having done that, I can go ahead and use the value of R-bar that I have calculated in the X-bar chart. Once again, I plot the center line, X-double bar, I plot the lower control limit and the upper control limit that we calculated earlier, and then I plot the individual sample means, X-bar i.
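The control-limit arithmetic above can be reproduced with a short sketch. The function name `xbar_r_limits` and the dictionary layout are assumptions made for illustration; the constants are only the ones quoted in the lecture for sample sizes three and five, so other sample sizes would need values from a full table of constants.

```python
# Constants quoted in the lecture; only n = 3 and n = 5 are stated there.
CONSTANTS = {
    3: {"A2": 1.023, "D3": 0.0, "D4": 2.575},
    5: {"A2": 0.577, "D3": 0.0, "D4": 2.115},
}

def xbar_r_limits(x_double_bar, r_bar, n):
    """Return (LCL, center line, UCL) for the X-bar chart and the R-chart."""
    c = CONSTANTS[n]
    half_width = c["A2"] * r_bar
    return {
        "xbar": (x_double_bar - half_width, x_double_bar, x_double_bar + half_width),
        "r": (c["D3"] * r_bar, r_bar, c["D4"] * r_bar),
    }

# Summary values from the example: X double bar = 10.172, R bar = 0.944, n = 5.
limits = xbar_r_limits(10.172, 0.944, n=5)
for chart, (lcl, center, ucl) in limits.items():
    print(f"{chart}: LCL={lcl:.3f}  CL={center:.3f}  UCL={ucl:.3f}")
```

Because the inputs here are the rounded summary values, the last digit of a limit can differ slightly from the figures quoted in the lecture, which were presumably computed from unrounded data.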
What we notice is that all of the points lie between the lower and upper control limits except one, and this one point is sample five, which exceeds the upper control limit. Since it exceeds the upper control limit, it's an outlier and we now need to investigate it further. So, Yolanda observes that this hour-five sample lies outside the control limits, and when she investigates, she finds that there was an unusual problem with the electrical system that caused a voltage fluctuation, which affected the filling valves, and that's why there was a problem in hour five. Since there is now an assignable or special cause that we can point to, we can drop that particular observation. When we drop that observation, we have one less sample. So if you notice in the table, the fifth sample has been struck out, and we will not use it in the rest of the calculations. We recalculate our X-double bar and R-bar and get slightly different values: X-double bar is now 10.07, and R-bar is 0.82. We then go through the exercise of recalculating the control limits. Notice that our sample size has not changed even though the number of samples has changed, so we still have A_2 equal to 0.577. By plugging in the new values of X-double bar and R-bar, we get lower and upper control limits for the X-bar chart, and similarly, we get lower and upper control limits for the R-chart. Now we go ahead and plot the two, and what we notice is that the R-chart is in control. It was in control before and it remains in control; there are no outliers in the R-chart, so we can be fairly comfortable that process variability is stable. We then go ahead and plot the X-bar chart and notice that all seven values that we have for the X-bar chart lie within the control limits, and so we can use these control limits going forward to plot new observations as we take new samples.
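The recalculation after dropping the hour-five sample can be sketched as follows, using the revised summary values from the lecture (X-double bar of 10.07, R-bar of 0.82) and the n = 5 constants. The helper `out_of_control` and the list of sample means passed to it are hypothetical, included only to show how points would be checked against the limits; they are not Yolanda's data.

```python
# Constants for sample size n = 5, as quoted in the lecture.
A2, D3, D4 = 0.577, 0.0, 2.115

def out_of_control(points, lcl, ucl):
    """Return the 1-based positions of points outside [lcl, ucl]."""
    return [i for i, p in enumerate(points, start=1) if p < lcl or p > ucl]

# Revised summary values after removing the sample with an assignable cause.
x_double_bar, r_bar = 10.07, 0.82

xbar_lcl = x_double_bar - A2 * r_bar
xbar_ucl = x_double_bar + A2 * r_bar
r_lcl, r_ucl = D3 * r_bar, D4 * r_bar
print(f"X-bar chart: LCL={xbar_lcl:.3f}  UCL={xbar_ucl:.3f}")
print(f"R-chart:     LCL={r_lcl:.3f}  UCL={r_ucl:.3f}")

# Hypothetical sample means, for illustration only (NOT the lecture's data).
hypothetical_means = [10.05, 10.11, 9.98, 10.60]
flagged = out_of_control(hypothetical_means, xbar_lcl, xbar_ucl)
print("samples outside the X-bar limits:", flagged)
```

A flagged point is only a prompt to investigate; as in the example, it is dropped from the calculation only once an assignable cause has been identified.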
So, what we've shown in this case is how to go about plotting the X-bar and R-charts, and what to do if we see outliers. Now, in this particular example, Yolanda was able to look at the outlier and figure out the reason for it. It's possible sometimes that no assignable cause is available. In that case, it doesn't make sense to use the upper and lower control limits that we have calculated, because we can't throw out a data point when we don't understand why we got it. So in that case, since our process is not in control, we may be required to take more samples, or take a fresh set of samples, and redo the calculations. Similarly, if the R-chart was out of control, then we couldn't use the X-bar chart. We would then have to figure out a way to make sure that the R-chart was in control, either by taking more samples or by increasing the sample size, to get an R-chart that did not have outliers, so that we could be comfortable that the X-bar chart was accurate and that reasonable deductions could be made from it. As we saw for this particular example, Yolanda has been able to observe that the process is under statistical control, and that the overall average of 10.07 is slightly higher than the nominal for a 10-ounce can. Remember, the can is supposed to have 10 ounces, and the average we are observing is 10.07. What we also observed is that most cans will likely have contents between 9.597 and 10.543 ounces. Now, this all makes sense in a statistical sense, in that we have statistical control over our process. But is our customer happy about this? Does our customer think that getting a can which is between 9.597 and 10.543 ounces is reasonable? The customer might say, listen, I want to have a can which is 10 ounces plus or minus 0.01 ounces. So now, we are going to give them cans that are either much lighter or much heavier, and the customer may not like this.
So, the question we have to ask often is even if we have a process that is in statistical control, does this process meet our customer's expectations?