Now, to summarize, syntagmatic relations can generally

be discovered by measuring the correlation between occurrences of two words.

We've introduced three concepts from information theory.

Entropy, which measures the uncertainty of a random variable X.

Conditional entropy, which measures the entropy of X given we know Y.

And mutual information of X and Y, which measures the entropy reduction of X

due to knowing Y, or the entropy reduction of Y due to knowing X.

They are the same.
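As a rough sketch of these three quantities, here is a small Python example for two binary variables (the joint distribution below is invented purely for illustration, e.g. presence or absence of two words in a text segment). It checks numerically that the two entropy reductions are indeed the same:

```python
import math

# Hypothetical joint distribution of two binary variables X and Y
# (e.g., presence/absence of two words); the numbers are illustrative only.
p_xy = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x)."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

# Marginal distributions P(X) and P(Y)
p_x = {x: sum(v for (xx, _), v in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(v for (_, yy), v in p_xy.items() if yy == y) for y in (0, 1)}

def conditional_entropy(joint, marginal_of_second):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y)."""
    return -sum(v * math.log2(v / marginal_of_second[y])
                for (x, y), v in joint.items() if v > 0)

h_x = entropy(p_x)
h_y = entropy(p_y)
h_x_given_y = conditional_entropy(p_xy, p_y)
# Swap the roles of X and Y to get H(Y|X)
p_yx = {(y, x): v for (x, y), v in p_xy.items()}
h_y_given_x = conditional_entropy(p_yx, p_x)

mi_1 = h_x - h_x_given_y  # entropy reduction of X due to knowing Y
mi_2 = h_y - h_y_given_x  # entropy reduction of Y due to knowing X
assert abs(mi_1 - mi_2) < 1e-12  # both equal the mutual information I(X;Y)
```

Knowing Y here reduces the uncertainty about X from 1 bit to about 0.72 bits, so the mutual information is about 0.28 bits, and computing it from either direction gives the same value.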

So these three concepts are actually very useful for other applications as well.

That's why we spent some time to explain this in detail.

But in particular, they are also very useful for

discovering syntagmatic relations.

In particular, mutual information gives a principled way of

discovering such relations.

It allows us to compute values on different pairs of

words that are comparable, so we can rank these pairs and

discover the strongest syntagmatic relations in a collection of documents.

Now, note that there is some relation between syntagmatic relation discovery and

paradigmatic relation discovery.

So we already discussed the possibility of using BM25 to achieve term weighting

for terms in the context, which can potentially also suggest candidates

that have syntagmatic relations with the candidate word.

But here, once we use mutual information to discover syntagmatic relations,

we can also represent the context with this mutual information as weights.

So this would give us another way to represent

the context of a word, like "cat".

And if we do the same for all the words, then we can cluster these words or

compare the similarity between these words based on their context similarity.
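As a sketch of this second use of mutual information, suppose we have already computed MI-weighted context vectors for a few words (the vectors below are invented toy numbers, not real corpus statistics); comparing them with cosine similarity then surfaces paradigmatic candidates:

```python
import math

# Hypothetical MI-weighted context vectors: for each word, the mutual
# information between it and words appearing in its context.
# All weights here are made up for illustration.
context = {
    "cat": {"eats": 0.30, "sat": 0.25, "fish": 0.20},
    "dog": {"eats": 0.28, "sat": 0.22, "barks": 0.35},
    "computer": {"program": 0.40, "data": 0.33},
}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Words whose MI-weighted contexts are similar are paradigmatic candidates:
sim_cat_dog = cosine(context["cat"], context["dog"])
sim_cat_computer = cosine(context["cat"], context["computer"])
```

With these toy vectors, "cat" and "dog" share weighted context terms and score high, while "cat" and "computer" share none and score zero, which is the intuition behind clustering words by context similarity.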

So this provides yet another way to do term weighting for

paradigmatic relation discovery.

And so, to summarize this whole part about word association mining:

we introduced two basic associations, called paradigmatic and

syntagmatic relations.

These are fairly general, they apply to any items in any language, so

the units don't have to be words, they can be phrases or entities.