So, without doing categorization,
it will be much harder to aggregate such opinions. Categorization provides a concise
way of coding text, in some sense, based on a given vocabulary.
And sometimes, in some applications, you may see text categorization
called text coding, that is, text encoded with some controlled vocabulary.
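To make the idea of coding text and then aggregating concrete, here is a minimal sketch. The controlled vocabulary, the trigger words, and the reviews are all made up for illustration, and the keyword-matching rule is just a stand-in for a real categorizer.

```python
from collections import Counter

# Hypothetical controlled vocabulary: category code -> trigger words (made up).
CONTROLLED_VOCABULARY = {
    "POSITIVE": {"great", "love", "excellent"},
    "NEGATIVE": {"poor", "hate", "broken"},
}

def code_text(text):
    """Return the code whose trigger words occur most often, or 'UNCODED' if none occur."""
    tokens = text.lower().split()
    scores = {
        code: sum(token in words for token in tokens)
        for code, words in CONTROLLED_VOCABULARY.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "UNCODED"

def aggregate(texts):
    """Count how many texts fall under each code."""
    return Counter(code_text(text) for text in texts)

# Toy opinions, invented for this example.
reviews = [
    "great battery and excellent screen",
    "the charger arrived broken",
    "i love this phone",
    "poor packaging",
]
summary = aggregate(reviews)
print(summary)
```

Once every text carries a code, aggregating opinions is just counting codes, which is much easier than comparing the raw texts directly.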
The second kind of reason is to use text
categorization to infer properties of entities.
Text categories allow us to infer the properties
of entities that are associated with text data.
So, this means we can use text categorization
to discover knowledge about the world.
In general, as long as we can associate an entity with text data,
we can always use the text data to help categorize the corresponding entity.
So, it's useful to think of
an information network that connects other entities with text data.
The obvious entities that can be directly connected are authors.
But you can also imagine that the author's affiliation, the author's age, and
other things can be connected to text data indirectly.
Once we have made the connection, then we can make a prediction about those values.
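As a rough sketch of this idea, the code below categorizes each text connected to an entity and then takes a majority vote to predict a property of the entity. The choice of a small Naive Bayes classifier with add-one smoothing, the property (an author's research area), and all of the data are assumptions made for illustration, not something specified in the lecture.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_texts):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_texts:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def infer_entity_property(entity_texts, model):
    """Categorize each text linked to an entity and take a majority vote."""
    votes = Counter(classify(text, model) for text in entity_texts)
    return votes.most_common(1)[0][0]

# Toy training data, invented for this example.
training = [
    ("gene protein sequence expression", "biology"),
    ("cell protein pathway", "biology"),
    ("query index ranking retrieval", "computing"),
    ("compiler index memory cache", "computing"),
]
model = train_naive_bayes(training)

# Texts connected to one hypothetical author.
author_texts = [
    "protein expression in the cell",
    "ranking of gene sequence data",
    "index and cache for retrieval",
]
print(infer_entity_property(author_texts, model))
```

The prediction is about the author, not about any single text: the texts are only the evidence that connects the entity to a category.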
So, this is a general way to allow us to use text mining,
through text categorization, to discover knowledge about the world.
This is very useful, especially in big text data analytics, where we are often
just using text data as an extra set of data extracted from humans
to infer certain decision factors, often together with non-textual data.
Specifically with text,
we can also think of examples of inferring properties of entities.
For example, the discovery of non-native speakers of a language.
And this can be done by categorizing the content produced by speakers.