Text clustering is most commonly treated as a fully automated task
without user
feedback. However, a variety of researchers
have explored mixed-initiative clustering
methods that allow a user to interact with and
advise the clustering algorithm. This
mixed-initiative approach is especially
attractive for text clustering tasks where the user
is trying to organize a corpus of documents
into clusters for some particular purpose
(e.g., clustering
their email into folders that reflect various activities in which they are
involved). This paper introduces a new approach
to mixed-initiative clustering that
handles several natural types of user feedback.
We first introduce a new probabilistic
generative model for text clustering (the
SpeClustering model) and show that it
outperforms the commonly used mixture of
multinomials clustering model, even when
used in fully autonomous mode with no user
input. We then describe how to incorporate
four distinct types of user feedback into the
clustering algorithm, and provide
experimental evidence showing substantial
improvements in text clustering when this
user feedback is incorporated.