0:10

Hello. This lesson will introduce you to calculating probabilities and applying Bayes' theorem using Python. These are very important concepts, and there's a very long notebook that I'll introduce you to in just a second, but I've also provided links to two web pages that give a visual introduction to both basic probability concepts and conditional probability concepts. So first, let's take a look at these websites.

First, basic probability. This site has three parts: the first covers likelihood, the second expectation, and the third estimation. The easiest thing to do is to just show you.

Likelihood is about measuring the probability that an event occurs, and this site does it through simulation. You can see I can flip a coin, or flip it a bunch of times, and notice how it generates random data. Here's our theoretical expectation: for a fair coin we should get half heads and half tails, but every time we repeat the experiment, we get random results. This is a key concept: when you're sampling from nature, you often get random results rather than exactly the predicted outcome. That's one of the fundamental concepts in probability, and something you really need to work with to make sure you understand it properly.
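
To see the same randomness outside the website, here is a minimal sketch that assumes NumPy (the site itself is interactive and its code is not shown). It repeats a 10-flip experiment on a fair coin five times; the head counts differ from run to run even though the expected count is 5 every time.

```python
import numpy as np

# Fixed seed so the sketch is reproducible; any seed gives a different but valid run.
rng = np.random.default_rng(seed=42)

n_flips = 10    # flips per experiment (illustrative choice)
n_repeats = 5   # number of times the experiment is repeated

# Each row is one experiment on a fair coin: 1 = heads, 0 = tails.
flips = rng.integers(0, 2, size=(n_repeats, n_flips))
heads_per_run = flips.sum(axis=1)

print("Heads in each run of 10 flips:", heads_per_run)
print("Theoretical expectation per run:", n_flips * 0.5)
```

Changing the seed or `n_repeats` shows how much the counts scatter around 5.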

Next is the idea of expectation, where we have a specific idea of what the result should be. For instance, if we roll a die, the average should be the average of one, two, three, four, five, and six, which is 3.5. If I roll the die once, we get a single value; if I roll it again, we get another value, and you can see this line appearing on the left. It gives the average of all of our rolls so far. If I roll it a hundred times, you'll see the long-term frequency emerge, and as the number of rolls increases we should get closer to our expected theoretical average. But there will still be deviations, because each roll is a random variable, so the process itself is random.
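
The running average on the site can be reproduced with a cumulative sum. This is a sketch assuming NumPy; the checkpoints (1, 10, 100 rolls) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Theoretical expected value of a fair six-sided die: (1 + 2 + ... + 6) / 6 = 3.5
faces = np.arange(1, 7)
expected = faces.mean()

n_rolls = 100
rolls = rng.integers(1, 7, size=n_rolls)   # upper bound is exclusive, so values are 1..6

# Running average after each roll: cumulative sum divided by the number of rolls so far.
running_avg = np.cumsum(rolls) / np.arange(1, n_rolls + 1)

for k in (1, 10, 100):
    print(f"after {k:>3} rolls: average = {running_avg[k - 1]:.3f} (expected {expected})")
```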

Next is the idea of estimation. Here we can sample from our data set and measure how good the resulting estimates are, for example through their bias, their variance, or their mean squared error. You should definitely play with this site and get a feel for what it's actually showing.
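
The site's exact estimators aren't described here, so as an assumption this sketch compares two common estimators of a normal population's variance (dividing by n versus n - 1) over many repeated samples. It measures each estimator's bias, variance, and mean squared error, and relies on the identity MSE = bias² + variance.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

true_var = 4.0        # population variance of N(0, 2^2); assumed for illustration
sample_size = 5
n_samples = 20_000    # number of repeated samples

# Draw many small samples and estimate the variance of each one in two ways.
samples = rng.normal(loc=0.0, scale=2.0, size=(n_samples, sample_size))
estimators = {
    "ddof=0 (divide by n)": samples.var(axis=1, ddof=0),
    "ddof=1 (divide by n-1)": samples.var(axis=1, ddof=1),
}

results = {}
for name, est in estimators.items():
    bias = est.mean() - true_var              # average error of the estimator
    variance = est.var()                      # spread of the estimates themselves
    mse = np.mean((est - true_var) ** 2)      # equals bias**2 + variance
    results[name] = (bias, variance, mse)
    print(f"{name}: bias={bias:+.3f}, variance={variance:.3f}, MSE={mse:.3f}")
```

The divide-by-n estimator is biased low (its expected bias is -true_var / sample_size = -0.8), while the n - 1 version is unbiased but has a larger variance.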

The next site covers compound probability, and I won't step through it like I did the last one. I'll just show you the basic ideas. You can see we have different sets: this one might represent one particular event occurring, such as "it was sunny today"; this might be a different event, such as "it rained today"; and this might be a third event, "it was cloudy." Sometimes events overlap. Sometimes they are independent, and sometimes they are mutually exclusive, meaning they can't occur together. And sometimes we'll be able to look at the intersections or unions of these events and make interpretations based on them.
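
Weather probabilities aren't given in the lecture, so this sketch swaps in a single roll of a fair die, where events are plain Python sets and every outcome is equally likely. The events A, B, C, and D are hypothetical examples chosen to show a union, mutually exclusive events, and an independence check.

```python
from fractions import Fraction

# Sample space for one roll of a fair die; all outcomes equally likely.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of omega) under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is greater than 3"
C = {1}         # "roll is a one"
D = {1, 2}      # "roll is at most 2"

# Union via inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(A) + prob(B) - prob(A & B)
assert p_union == prob(A | B)

# A and C share no outcomes, so they are mutually exclusive.
print("P(A and C) =", prob(A & C))
# Independence means P(X and Y) == P(X) * P(Y).
print("A, B independent?", prob(A & B) == prob(A) * prob(B))   # False: 1/3 vs 1/4
print("A, D independent?", prob(A & D) == prob(A) * prob(D))   # True: 1/6 vs 1/6
```

Note that A and C are mutually exclusive but not independent: knowing A occurred tells you C did not.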

The next section covers combinatorics, which is about counting permutations and combinations. You can test both of these ideas here: change the number of marbles you're going to draw, and see how that changes the results. Notice that as I do this, you'll see the arrangements build up. At the first step, we have three different ways of placing the marbles. When we add our second marble, we get these additional combinations, and so on.
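
The counts behind the marble demo follow from two factorial formulas. This sketch assumes a hypothetical setup of three distinct marbles with two placed, and checks the formulas against `math.perm` and `math.comb` (available from Python 3.8).

```python
import math

n = 3   # number of distinct marbles (hypothetical)
k = 2   # number of marbles placed

# Permutations: ordered arrangements, n! / (n - k)!
perms = math.factorial(n) // math.factorial(n - k)
# Combinations: unordered selections, n! / (k! (n - k)!)
combs = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# math.perm and math.comb compute the same counts directly.
assert perms == math.perm(n, k)
assert combs == math.comb(n, k)
print(f"{perms} permutations, {combs} combinations of {k} out of {n}")
```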

The last idea is conditional probability. This is a really neat demonstration: we drop balls uniformly, and as we move A, B, and C around, we get different results. The idea is that a ball has a specific probability of passing through A, a specific probability of passing through B, and a specific probability of passing through C. What we can then calculate is a conditional probability: given that B occurred, what's the probability that C occurred? That's shown by the colors of the balls down here. If they're this light blue, they went through both green and blue; in other words, they went through B and C, and we can see that probability. You can also change the perspective to see what happens, and this then represents the conditional probability here.
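
The ball-dropping idea can be approximated with a Monte Carlo simulation. As a simplifying assumption, this sketch drops balls uniformly on the interval [0, 1) instead of reproducing the website's geometry, and treats B and C as hypothetical intervals. P(C | B) is the fraction of balls in B that are also in C; since the lesson also covers Bayes' theorem, the last step uses it to reverse the conditioning.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n_balls = 100_000

# Drop balls uniformly on [0, 1); each ball lands at position x.
x = rng.random(n_balls)

# Hypothetical events: the ball passes through an interval.
in_B = (x >= 0.2) & (x < 0.6)   # P(B) = 0.4
in_C = (x >= 0.5) & (x < 1.0)   # P(C) = 0.5; B and C overlap on [0.5, 0.6), so P(B and C) = 0.1

p_B = in_B.mean()
p_C = in_C.mean()
p_B_and_C = (in_B & in_C).mean()

# Conditional probability: of the balls that went through B, the fraction that also went through C.
p_C_given_B = p_B_and_C / p_B           # exact value 0.1 / 0.4 = 0.25

# Bayes' theorem reverses the conditioning: P(B|C) = P(C|B) * P(B) / P(C).
p_B_given_C = p_C_given_B * p_B / p_C   # exact value 0.25 * 0.4 / 0.5 = 0.2

print(f"P(C|B) approx {p_C_given_B:.3f}, P(B|C) approx {p_B_given_C:.3f}")
```

The simulated values land close to the exact ones and move closer as `n_balls` grows.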

So play with these websites and get a better feel for these fundamental concepts.

The last thing I want to show you is the introduction to probability notebook, which walks you through many of these ideas. We look at combinatorial analysis, where we compute permutations and can actually simulate things such as permutations of our different data sets, so you'll be able to see how these work. Python has built-in tools that make some of these things easier, such as calculating permutations. We can also do permutations without replacement, which is slightly different; that's demonstrated here. We can then also do combinations, and combinations without replacement.
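
The notebook's own code isn't shown in this lecture; as an assumption, a natural standard-library way to enumerate these arrangements is the `itertools` module. Note that `itertools.permutations` draws without replacement, ordered draws with replacement come from `itertools.product`, and unordered draws with replacement come from `itertools.combinations_with_replacement`. The three-item data set is hypothetical.

```python
from itertools import combinations, combinations_with_replacement, permutations, product

items = ["A", "B", "C"]   # hypothetical data set
k = 2

perm_no_repl = list(permutations(items, k))                 # ordered, no repeats
perm_repl = list(product(items, repeat=k))                  # ordered, repeats allowed
comb_no_repl = list(combinations(items, k))                 # unordered, no repeats
comb_repl = list(combinations_with_replacement(items, k))   # unordered, repeats allowed

for name, result in [("permutations", perm_no_repl),
                     ("product (with replacement)", perm_repl),
                     ("combinations", comb_no_repl),
                     ("combinations_with_replacement", comb_repl)]:
    print(f"{name}: {len(result)} -> {result}")
```

For three items taken two at a time, this yields 6, 9, 3, and 6 arrangements respectively.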

I also want to show a bit more about probability, depending on what we're doing. For flipping a coin, the probability of getting heads may be 0.5. You can change this value to a different probability, and that will change the result. What we then do is randomly choose either heads or tails based on the probability we put in, and generate N of those. This effectively simulates flipping a coin N times, with the probability set to what we expect for a fair coin. So here you go: we had heads, heads, tails, tails, et cetera. When we counted the number of heads in this particular random sample, it was 11, which means the observed proportion of heads was only 0.44: close to, but not exactly, what we expect theoretically. If we simulated more flips, that proportion would likely approach 0.5, which is what we expect.
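
A sketch of this simulation, assuming NumPy's `Generator.choice` (the notebook may use a different function). N = 25 is an assumption inferred from the numbers quoted above, since 11 heads out of 25 flips gives 11 / 25 = 0.44; a different seed will produce a different count.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

p_heads = 0.5   # change this to simulate a biased coin
N = 25          # assumption: 11 / 25 = 0.44 matches the count quoted in the lecture

# Choose "H" or "T" N times, with probabilities [p_heads, 1 - p_heads].
flips = rng.choice(["H", "T"], size=N, p=[p_heads, 1 - p_heads])

n_heads = int(np.sum(flips == "H"))
p_hat = n_heads / N   # observed proportion of heads, an estimate of p_heads
print(flips[:6], "...")
print(f"{n_heads} heads out of {N}: estimated P(heads) = {p_hat:.2f}")
```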

We do the same example with rolling a die, and get a similar result.

Next, we talk a bit about Bernoulli trials and the binomial distribution, and then we see how this plays out when we flip five coins and count how many heads we get.
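
Each coin flip is a Bernoulli trial, and the number of heads in five flips follows a binomial distribution. This sketch computes the exact probabilities with `math.comb` and, assuming NumPy, checks them against a simulation of many five-coin experiments.

```python
from math import comb

import numpy as np

n, p = 5, 0.5   # five flips of a fair coin; each flip is a Bernoulli trial with P(heads) = p

# Binomial PMF: P(X = k) = C(n, k) * p**k * (1 - p)**(n - k)
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Simulation check: repeat "flip five coins, count heads" many times.
rng = np.random.default_rng(seed=5)
heads = rng.binomial(n, p, size=100_000)
empirical = np.bincount(heads, minlength=n + 1) / heads.size

for k in range(n + 1):
    print(f"P({k} heads): theory {pmf[k]:.4f}, simulated {empirical[k]:.4f}")
```

For a fair coin, two or three heads are the most likely outcomes, each with probability 10/32 = 0.3125.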

We can also look at the long-term frequency, similar to what we saw before. Here is a bunch of flips; what's the probability of getting heads? In this particular example the observed proportion was 0.4, even though the probability was set to 0.5. If I go farther down, we can plot this and see the long-term frequency, just like on that earlier website. You can see that as we take increasingly large numbers of samples, in this case 50,000 flips, we get very close: this is 0.5025. So we're very close to the theoretical expectation, but not exactly on it.
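
How close should we expect to get? The standard error of the observed proportion is sqrt(p(1 - p) / n), which for p = 0.5 and n = 50,000 is about 0.0022, so a result like 0.5025 is roughly one standard error away from 0.5. This sketch, assuming NumPy and printing values rather than plotting, tracks the running proportion at a few checkpoints next to that standard error.

```python
import numpy as np

rng = np.random.default_rng(seed=11)
n_flips = 50_000
flips = rng.random(n_flips) < 0.5   # True = heads, P(heads) = 0.5

# Running proportion of heads after each flip (the long-term frequency).
running = np.cumsum(flips) / np.arange(1, n_flips + 1)

for n in (10, 100, 1_000, 10_000, 50_000):
    se = np.sqrt(0.5 * 0.5 / n)   # typical size of the deviation from 0.5 after n flips
    err = abs(running[n - 1] - 0.5)
    print(f"n={n:>6}: proportion={running[n - 1]:.4f}, |error|={err:.4f}, 1 SE={se:.4f}")
```

The error shrinks roughly in proportion to 1 / sqrt(n), so getting ten times closer takes about a hundred times more flips.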

The rest of this notebook covers other concepts that are important in probability theory, such as how we can take data and normalize it to estimate probabilities. Again, here we're looking at a hundred rolls: we're fairly close to the theoretical expectation, and as we increase the number of rolls, we get closer. We can estimate probabilities from density by using histograms: we just normalize the histogram. We can also create a cumulative distribution function or, for discrete data, a cumulative mass function.
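
For a die, normalizing the histogram means dividing the count for each face by the number of rolls, and the cumulative mass function is the running total of those probabilities. A sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(seed=21)
rolls = rng.integers(1, 7, size=100)   # 100 rolls of a fair die

# Normalized histogram: counts per face divided by the number of rolls.
counts = np.bincount(rolls, minlength=7)[1:]   # drop index 0, keep faces 1..6
pmf = counts / rolls.size                       # estimated P(X = face)

# Cumulative mass function: running total of the PMF, P(X <= face).
cmf = np.cumsum(pmf)

for face, (p, c) in enumerate(zip(pmf, cmf), start=1):
    print(f"face {face}: P(X={face}) ~ {p:.2f}, P(X<={face}) ~ {c:.2f} "
          f"(theory {1/6:.3f}, {face/6:.3f})")
```

For continuous data, `np.histogram(data, bins=..., density=True)` performs the equivalent normalization so that the bar areas sum to 1.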

This is nice because we can read values right off the plot. For example, we can ask: what total bill do we stay at or below 50% of the time? You can just read it right off and say, well, that's around $18. That's what the CDF does.
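
Reading a value off the CDF at 50% amounts to finding the median. The notebook's bill data isn't reproduced here, so this sketch uses hypothetical gamma-distributed bills (an assumption, not the real data set) and builds an empirical CDF with NumPy.

```python
import numpy as np

# Hypothetical restaurant bills in dollars; NOT the notebook's data set.
rng = np.random.default_rng(seed=8)
bills = rng.gamma(shape=4.0, scale=5.0, size=250)

# Empirical CDF: sort the values; the i-th smallest (1-based) has cumulative probability i/n.
sorted_bills = np.sort(bills)
cdf = np.arange(1, bills.size + 1) / bills.size

# "Reading off" the CDF at 50%: the first bill whose cumulative probability reaches 0.5.
idx = np.searchsorted(cdf, 0.5)
median_read_off = sorted_bills[idx]

print(f"About 50% of bills are at or below ${median_read_off:.2f}")
print(f"np.quantile gives ${np.quantile(bills, 0.5):.2f} (interpolated)")
```

The two numbers differ slightly because `np.quantile` interpolates between neighboring sorted values by default, while the read-off picks an actual data point.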

So, with that, I'm going to go ahead and stop. There are a few other things in here, but be sure to play with this notebook, test these different concepts out yourself, and get a very good feel for probability, since we will be using it repeatedly throughout this course and in your career as a data analyst.
