Now, you may notice that we often write down probable
estimate in the form of being proportional for certain numbers.
And this is often sufficient,
because we have some constraints on these distributions.
So the normalizer is dictated by the constraint.
So in this case, it will be useful for you to think about
what are the constraints on these two kinds of probabilities?
So once you figure out the answer to this question, and
you will know how to normalize these accounts.
And so this is a good exercise to work on if it's not obvious to you.
There is another issue in Naive Bayes which is a smoothing.
In fact the smoothing is a general problem in older estimate of language morals.
And this has to do with,
what would happen if you have observed a small amount of data?
So smoothing is an important technique to address that outsmarts this.
In our case, the training data can be small and when the data set is small when
we use maximum likely estimator we often face the problem of zero probability.
That means if an event is not observed
then the estimated probability would be zero.
In this case, if we have not seen a word in the training documents for
let's say, category i.
Then our estimator would be zero for the probability of this one in this category,
and this is generally not accurate.
So we have to do smoothing to make sure it's not zero probability.
The other reason for smoothing is that this is a way to bring prior knowledge,
and this is also generally true for a lot of situations of smoothing.
When the data set is small,
we tend to rely on some prior knowledge to solve the problem.
So in this case our [INAUDIBLE] says that no word should have zero probability.
So smoothing allows us to inject these to prior initial that no
order has a real zero probability.