[SOUND] Sometimes, you are interested in relative position of value in relationship to other values in datasets and we can use percentiles for this. Percentiles represent the approximate percentage of values in a dataset that are below a certain value. For example, you may be interested in taking a new job and your offered a salary of hundred thousand. How do you know if the firm is giving you a good offer? I'm pretty sure that you will be happier with this number if you know that your offer puts you at the 95th percentile, which would mean that you're getting paid better than 95% of others with the similar job titles. You may not be as excited if you knew your salary is at 25%, which means that 75% of people get higher pay than you. So percentiles, very quickly will inform you about the position of the value you are studying. How do we calculate percentiles? We already know something about this. If you recall, we talked about median, which is the value where 50% of the data points are higher than that and 50% of the data points are less than that. So, median is 50th percentile. To find the median, we order the dataset in order and point in the middle was the median or the 50th percentile. So to find any percentile, we do the same. We order the dataset for a given percentile, let's say 10%. The data point that has 10% of the observation below it is the 10th percentile. Consider this example where we have data for ten individual salaries, for people with similar job titles as you were getting hired for. I have all ready ordered the 10 data points in ascending order here. The 10th percentile is between these two values where one value is less that it, which in this case is the 10% of the value and nine values are more than it, which is 90th percent of the values. The 50th percentile is here and your salary of 100,000 will fall here, which is close to 60th percentile. There are different ways of calculating percentiles, but all methods will give you results that are close. For now, I would like you to understand what percentiles are telling us, which is position of a given value in relationship to other values in datasets. When we have large sets of data, then we have another way of finding the position of a specific observation. This is known as the Z-score. By using the Z-score we can find the proportion of data points that are less than a specific value. The z-score is calculating using this formula. Here x is the variable of interest. Mu is the mean for the population and sigma is the standard deviation. For example, if you were graduating from a college with a degree in business. You get a job offer and would like to know how you offer stacks up against others. All other peers from colleges get job offers similar to yours. If all you knew is the median, then you can tell if you're in the top 50% or the lower 50%, but are you at 51% or 95%? We can calculate the position of your offer by calculating its Z-score. Now, let's expand on this example we were using before. You interview for a job and get an offer, and would like to know how competitive is your offered salary. All you have is some data based on published reports on websites. Let's look at such websites, I'm using payscale.com. Searching for a job title such as business analyst gives the following information. The median is 54,030 and the standard deviation, about 8,900. You get an offer of 60,000. This is above the median, but how does the salary stack up compared to others? To know where the salary falls, you can determine its percentile. To do this, we need to first find out how many standard deviations the offered salary is from the mean or the median? This is known as the Z-score and it's calculated by taking x minus the mean and dividing it by the standard deviation. For you, the x is the salary offered to you of 65,000. An average salary is 54,030. Standard deviation, 8,600. This means you received an offer, which is 1.27 standard deviations above the mean. How can we tell where we stand in comparison to the rest of the people getting such offer? Is the salary in the top 70%? Top 99%? What exactly could it be? Now, let me share with you some very useful properties of a bell curve. For large datasets, we often observe that many values cluster around the mean or the median. So if then we create a histogram of the data, we get a distribution that represents a bell shape, a symmetrical curve. When this happens, according to the empirical rule, 68% of all observations will fall within one standard deviation of the mean. 95% will fall within two standard deviations and 99.7% of all observation will fall within three standard deviations. Knowing this rule of one standard deviations, two standard deviation, three standard deviation. Known as the empirical rule will always be a handy way of figuring out where an observation of interest falls in comparison to the mean or the median. So given a specific observation, the offer we receive and knowing something about this type of job starting salaries. We were able to find our relative place in the group. The Z- score, when positive tells you that the value is above the mean. And when it's negative, it's below the mean. Furthermore, it tells you how many standard deviations are you above or below the mean. So the salary offer is between one and two standard deviations above the mean, roughly around here. So certainly, the salary receives somewhere in the 70th and 95th percentile. Not bad. In later lesson, we will learn to know the exact value. But for now, we can use the empirical rule as a rough estimate and this can be very important. Because many times, you are in a meeting where you are being shown a lot of statistics. And if you want to use your judgement to evaluate the validity of what is being recommended or ask followup questions and don't have access to a computer or calculator using this understanding about being one, two, three standard deviations away from the mean and the probabilities can provide you with a quick insight. The graph you see here is from payscale.com for the salary of business analysts. It is showing the values for the 10th percentile 25th percentile. Me, 75th percentile and the 90th percentile. Looks as though, the offered salary for you is closer to the 90th percentile. Calculating the Z value of an observation and knowing about the empirical rule helps us determine what percent of the data is below the value of your examining. TrueCar commercial addresses the question of how do you know if you're paying your fair value for a car you're about to purchase? In their commercial, they do this by showing you a visual of their range of prices people in your area have paid for the same car. The mean is of the good market value and the tales of the curves show below, and above the market value. Of course, you would happy about your purchase if your purchase price is falling in the left side of the curve, below the mean and toward the good buy. As and example, I have asked for a price of a new Camry XLEV6 near my office area where I'm likely to go shopping. This is what I get. They have data on 201 sales in my area for a type of a car I'm interested in. The bar graph is the actually histogram for price ranges, then a normal curve is superimposed on this. The mean of the curve appears to be at the factory invoice of 30,494. Looking at this, most people are paying less than this value. Maybe this is to make sure that everyone leaves the lot happy about the bargain they received, but the actual sold price data shows average selling price to be around 29,000. Furthermore, you see the use of empirical rule without being mentioned, of course. To create the categories of good price, great price, above market, exceptional value. I'm showing you this graph to illustrate the fact that often times in business, the absolute value such as the price you pay for your car has a lot less meaning than how did the price you paid for a car stack up against others. This is about the relative position of the value of interest. Again, here that will be the price you paid for your car in the entire data set available. When your purchase price is about two standard deviations below the mean, then you have paid exceptional bargain compared to the rest of the people who purchased the same type of car. Staying with the car example, the car you buy has this sticker on its window. This is a fuel economy label. Every new car sold in the United States is required to have a fuel economy label, which shows the miles per gallon estimates. This is meant to help consumers compare and shop for vehicles. So the best in this category is 32 miles per gallon, that is the 99th percentile and the worst mileage per gallon is 16, then how does the car you're considering to buy with 26 miles per gallon stack up compared to the rest of the vehicle in this class? This car has an average of 26 miles per gallon where the range is between 16 and 32. So, the midpoint is a rough point estimate of the average for the car in this category and that's about 24 miles per gallon. We can also get a rough estimate for standard deviation based on the empirical rule. That is 99.75% of all observations will be within plus or minus three standard deviation. Thus, a rough estimate for the standard deviation is the distance between 32 and 16 divided by 6. Remember, we have three standard deviations on both side of the mean for a total of six. Using this, we will get standard deviation of 2.67 miles per gallon. So this particular car that you're concerning is above average and is roughly, 0.75 standard deviations above the mean. Now, remember that 68% of observations fall within one standard deviation shown here in red. The remaining 32% of the observations are distributed symmetrically outside of the plus and minus one standard deviation shown in blue. So, a very rough estimate would be that the card we are considering to buy is close to an 80th percentile. It is one standard deviation higher than it would have been. 68% plus 16 would have given you 84 percentile. So, I went a little less than that since we are only 0.7 standard deviation from the mean. So as you can see, knowing the Z-score can help us determine how to place a particular observation within a population. To most of us, when making a decision, knowing this placement called sometimes in everyday language a percentile is much more meaningful than the actual value such as miles per gallon as was in this case.