Okay. Let's talk about parametric versus non-parametric. Statistical inference is about finding the underlying data-generating process of our data, and the statistical model is the set of possible distributions, or regressions, that the data can take. Now, a parametric model is a particular type of statistical model. What differentiates a parametric model? The major characteristics are that a parametric model is constrained to a finite number of parameters, and that it relies on some strict assumptions about the distributions from which the data is pulled. Non-parametric models, on the other hand, mean that our inference will not rely on as many assumptions; for example, it will not have to rely on the data being pulled from a particular distribution, so it's a distribution-free inference. Now, this doesn't mean that we don't know anything; we will be using insight from the data that we have available. So starting with non-parametric statistics, an example of a non-parametric inference is creating a distribution of the data using a histogram. You define the CDF, or cumulative distribution function, as the probability of where a certain value will fall according to the actual data. So you're not assuming a normal distribution or an exponential distribution, which we'll discuss later, but a distribution defined by the actual data that you've pulled from your sample. In this case, we wouldn't be specifying any of the parameters that we'd need for, say, a normal distribution. Just as a reminder, a parametric model is a particular type of statistical model that has a finite number of parameters. We can think here of ordinary least squares with our linear models, where what we saw is that we had to predefine the number of coefficients according to the features, or transformed features, that we were working with, and we also had to assume a linear relationship.
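As a small sketch of this idea (plain Python, standard library only; the sample here is simulated just to have data), we can build an empirical CDF directly from observations, with no assumption about which distribution generated them:

```python
import random

random.seed(42)

# Draw a sample from some unknown data-generating process
# (simulated here with an exponential, purely for illustration).
sample = [random.expovariate(1.0) for _ in range(1000)]

def ecdf(data):
    """Return a function F(x) = fraction of observations <= x."""
    sorted_data = sorted(data)
    n = len(sorted_data)
    def F(x):
        # Count how many observations fall at or below x.
        count = sum(1 for v in sorted_data if v <= x)
        return count / n
    return F

F = ecdf(sample)
print(F(1.0))  # estimated probability that a value falls at or below 1
```

Nothing here says "normal" or "exponential": the distribution is defined entirely by the sample, which is exactly the distribution-free inference described above.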
So that's going to be parametric, with the constraints that come along with it. Sometimes those models will be a bit easier and quicker to solve, but on the other hand, they're constrained compared to non-parametric models, which won't have similar constraints. Let's look at a particular parametric model, namely the normal distribution. There is a set equation for the normal distribution, and it depends on a set number of parameters, namely the mean and the standard deviation; those are essentially the parameters that define what our normal distribution actually looks like, under the assumption that we're using the normal distribution. Now, tying this back to a business example: customer lifetime value is an estimate of a customer's value to the company over time. The data related to customer lifetime value will probably include something along the lines of the expected length of time that the person will remain a customer, as well as the expected amount that the person spends over that length of time. So to estimate lifetime value, we need to make assumptions about the data: namely, how long do we think customers are going to last, and how much do we think they will spend over time? These assumptions can be parametric, where we assume a specific distribution, whether that's linear over time or some type of decrease over time, or non-parametric, where we rely much more heavily on the data and will need a lot more data in order to come to a conclusion. Now, when doing parametric modeling, the most common way of estimating the parameters of a parametric model is through the maximum likelihood estimate. The likelihood function is related to probability and is a function of the parameters of the model. So for the normal distribution, those parameters were the mean and the standard deviation.
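As a minimal sketch (plain Python, standard library only; the true values 5.0 and 2.0 are made up for illustration), the normal density is fully determined by those two parameters, and for a normal model the maximum likelihood estimates work out to be the sample mean and the standard deviation computed with a 1/n divisor:

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu, std dev sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

random.seed(0)
# Pretend this sample came from an unknown normal distribution.
data = [random.gauss(5.0, 2.0) for _ in range(10_000)]

# Maximum likelihood estimates for a normal model:
# the sample mean, and the standard deviation with a 1/n divisor.
n = len(data)
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)

print(mu_hat, sigma_hat)  # should land close to the true 5.0 and 2.0
```

These two numbers are exactly the parameter values that maximize the likelihood of the observed sample under the normal model, which is the idea developed next.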
The idea here is that our likelihood function is a function of the parameters. To make this clear, think about the likelihood function as taking all of your data and asking, "What is the most likely value for the mean, and the most likely value for the standard deviation, given the sample data that we see?" So for the population, what are the most likely parameters given our sample? That's your maximum likelihood estimate. We then choose the value of each of those parameters that maximizes the likelihood function, that is, maximizes what is most likely to occur given the data that we have. Now let's talk about common distributions that we will be using and that you'll see in the real world. Here we see the uniform distribution: uniform because there is an equal chance of getting any value within our range. You can think here of rolling a die, where a 1 is equally likely as a 6 or a 3 or a 4; every single value is equally likely. Next is the normal, or Gaussian, distribution that we've seen already, which is very common and popular within statistics. The idea is that the most likely values are those closest to the mean, and values on either side become equally unlikely the further away we move from the mean. So we see here different examples with different means defining where they sit on the graph, as well as different standard deviations defining how spread out or pointed the curve is, where a lower standard deviation means a tighter, pointier normal distribution. Now, one of the major claims to fame that makes the normal distribution so popular is the central limit theorem. So what is the central limit theorem?
The idea is that if you take a bunch of random samples and compute the average value of each one, the distribution of those averages is going to be a normal curve if you have enough values. You also see normality in the real world in examples such as height, where most people are close to the average height and it's very unlikely to be at the extremes: with the average around 5'5 or 5'6, it will be very unlikely to be seven feet or two feet tall, but being 5'7 or 5'4 is much more likely. Next is the log-normal distribution. The idea is that if you took the log of this variable, then you'd have a normal distribution. We saw this before when we took skewed data, applied a log transformation to that data, and ended up with a more normally distributed data set. Now, the tighter we are around the mean in regards to the standard deviation, the closer it looks to the normal curve. The idea being, if you think about there being large outliers, then we'll have a larger tail and a bigger standard deviation because the data is more spread out, and therefore we'll be further away from normal. So that's the idea of a smaller standard deviation being closer to normal when we look at this graph. A common place you'll see this in the real world is most times when you're dealing with money, such as household income, where most people average around, let's say, the left side, closer to a median income of 60,000, and then you have large outliers out to the right, the billionaires and so on, creating a long tail to the right. Next we have the exponential curve. With the exponential curve, most of your values are closer to the left side, and it's often used to model the amount of time before the next event, for example, the time between when you and the next person end up watching this video.
So the time will be, let's say, one minute until someone else watches, then you restart, and most of the gaps are around one minute, whereas occasionally there's a long, spread-out break that's less likely to happen, where it takes 10-15 minutes before the next person watches this video. Now we have the Poisson distribution. A good way to think about this in the real world is as the number of events that happen during a certain amount of time. For the number of events during a certain amount of time, we have lambda, which is both the mean and the variance of the distribution. An example to think about here: how many people are going to watch this video in the next 10 minutes? If lambda were one, then most of the time only one person watches every 10 minutes, and the distribution is tight around that one. But if it's something like 10 people watching every 10 minutes, then you'd have more of a spread, because the count could easily be closer to five or 15 when you have a larger value. That's your Poisson distribution.
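The last two distributions are closely connected: in a Poisson process with rate lambda, the waiting times between events are exponential with mean 1/lambda, while the counts per time window are Poisson with mean and variance both equal to lambda. A small simulation (plain Python, standard library only; the rate of one viewer per 10-minute window is the made-up example from above) can illustrate this:

```python
import random

random.seed(1)

lam = 1.0            # average events (viewers) per 10-minute window
n_windows = 100_000

counts = []
for _ in range(n_windows):
    # Simulate one window by accumulating exponential waiting times
    # (mean 1/lam) until we pass the end of the window.
    t, events = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > 1.0:  # one window = one unit of time
            break
        events += 1
    counts.append(events)

mean_count = sum(counts) / n_windows
var_count = sum((c - mean_count) ** 2 for c in counts) / n_windows

# For a Poisson distribution, the mean and the variance are both lambda.
print(mean_count, var_count)  # both should land near lam = 1.0
```

Rerunning with a larger `lam` shows the wider spread described above, since the variance grows along with the mean.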