In this paper we present a novel algorithm for document clustering. This approach is based on distributional clustering where subject related words, which have a narrow context, are identified to form metatags for that subject. These contextual words form the basis for creating thematic clusters of documents. In a similar fashion to other research papers on document clustering, we analyze the quality of this approach with respect to document categorization problems and show it to outperform the information theoretic method of sequential information bottleneck.

Язык оригиналаанглийский
Страницы (с-по)167-180
Число страниц14
ЖурналLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Том2997
СостояниеОпубликовано - 1 дек 2004

    Предметные области Scopus

  • Теоретические компьютерные науки
  • Компьютерные науки (все)

ID: 36369783