Text clustering is most commonly treated as a fully automated task without user feedback. However, a variety of researchers have explored mixed-initiative clustering methods which allow a user to interact with and advise the clustering algorithm. This mixed-initiative approach is especially attractive for text clustering tasks where the user is trying to organize a corpus of documents into clusters for some particular purpose (e.g., clustering their email into folders that reflect various activities in which they are involved). This paper introduces a new approach to mixed-initiative clustering that handles several natural types of user feedback. We first introduce a new probabilistic generative model for text clustering (the SpeClustering model) and show that it outperforms the commonly used mixture of multinomials clustering model, even when used in fully autonomous mode with no user input. We then describe how to incorporate four distinct types of user feedback into the clustering algorithm, and provide experimental evidence showing substantial improvements in text clustering when this user feedback is incorporated.
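
For readers unfamiliar with the baseline named above, the following is a minimal sketch of a mixture of multinomials clusterer fitted with EM. It is not the SpeClustering model and is not taken from the paper. The function name `fit_multinomial_mixture`, the additive smoothing parameter `alpha`, the deterministic starting assignment, and the toy word counts are all assumptions made for illustration.

```python
import math


def fit_multinomial_mixture(docs, n_clusters, n_iters=50, alpha=1.0):
    """Fit a mixture of multinomials to bag-of-words count vectors with EM.

    docs: list of word-count lists, all with the same vocabulary length.
    Returns (priors, word_probs, resp), where resp[i][k] is the posterior
    probability that document i belongs to cluster k.
    """
    n_docs, vocab_size = len(docs), len(docs[0])

    # Assumed deterministic soft start: document i leans toward cluster
    # i % n_clusters. A practical run would try several random starts.
    resp = []
    for i in range(n_docs):
        weights = [9.0 if k == i % n_clusters else 1.0 for k in range(n_clusters)]
        total = sum(weights)
        resp.append([w / total for w in weights])

    for _ in range(n_iters):
        # M-step: re-estimate cluster priors and per-cluster word
        # distributions, with additive smoothing so nothing is exactly zero.
        priors, word_probs = [], []
        for k in range(n_clusters):
            mass = sum(resp[i][k] for i in range(n_docs))
            priors.append((mass + alpha) / (n_docs + alpha * n_clusters))
            counts = [
                alpha + sum(resp[i][k] * docs[i][w] for i in range(n_docs))
                for w in range(vocab_size)
            ]
            total = sum(counts)
            word_probs.append([c / total for c in counts])

        # E-step: posterior over clusters for each document, in log space.
        # The multinomial coefficient is the same for every cluster, so it
        # cancels in the normalisation and is omitted.
        for i in range(n_docs):
            log_post = [
                math.log(priors[k])
                + sum(docs[i][w] * math.log(word_probs[k][w]) for w in range(vocab_size))
                for k in range(n_clusters)
            ]
            peak = max(log_post)
            unnorm = [math.exp(lp - peak) for lp in log_post]
            norm = sum(unnorm)
            resp[i] = [u / norm for u in unnorm]

    return priors, word_probs, resp


# Toy corpus over a four-word vocabulary: documents 0 and 2 use the first
# two words, documents 1 and 3 use the last two.
docs = [[5, 4, 0, 0], [0, 0, 5, 5], [4, 6, 0, 0], [0, 1, 4, 6]]
priors, word_probs, resp = fit_multinomial_mixture(docs, n_clusters=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

In this sketch each document is assigned to the cluster with the highest posterior. The algorithm runs with no user input, which corresponds to the fully autonomous setting the abstract uses for comparison.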