
Clustering a hypertext document collection is an important task in Information Retrieval. Most clustering methods are based on document content and do not take the hypertext links into account. Here we propose a novel PageRank-based clustering (PRC) algorithm that uses the hypertext structure. The PRC algorithm produces graph partitions with high modularity and coverage. A comparison of the PRC algorithm with two content-based clustering algorithms shows a good match between PRC clustering and content-based clustering.
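The abstract evaluates PRC partitions by modularity and coverage. The PRC algorithm itself is not described here, so the sketch below does not implement it. It only illustrates how these two standard partition-quality measures are commonly computed. Coverage is the fraction of edges that fall inside clusters. Newman modularity is Q = sum over clusters c of (e_c / m - (d_c / 2m)^2), where m is the number of edges, e_c the number of edges inside cluster c, and d_c the total degree of the nodes in c. The sketch assumes the hyperlink graph is treated as undirected, unweighted, and free of self-loops and duplicate edges. The function name `modularity_and_coverage` is hypothetical, and the paper may use directed or weighted variants of these measures.

```python
from collections import defaultdict


def modularity_and_coverage(edges, cluster_of):
    """Return (modularity, coverage) for a partition of an undirected graph.

    Assumptions (not from the paper): each edge appears once as a (u, v)
    pair, the graph is unweighted, and there are no self-loops.
    cluster_of maps every node to its cluster label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    intra = defaultdict(int)       # e_c: edges with both ends in cluster c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in cluster c
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Coverage: fraction of all edges that lie inside some cluster.
    coverage = sum(intra.values()) / m
    # Newman modularity: observed intra-cluster edge fraction minus the
    # fraction expected at random given the same cluster degree sums.
    modularity = sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
                     for c in degree_sum)
    return modularity, coverage


# Two triangles joined by a single bridging edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q, c = modularity_and_coverage(edges, labels)
print(round(q, 3), round(c, 3))  # 0.357 0.857
```

Putting all six nodes into a single cluster gives coverage 1 but modularity 0. Coverage alone rewards coarse partitions, which is why the two measures are usually reported together.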

Original language: English
Host publication title: ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings
Pages: 873-874
Number of pages: 2
DOI
Status: Published - 15 Dec 2008
Event: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008 - Singapore, Singapore
Duration: 20 Jul 2008 to 24 Jul 2008

Conference

Conference: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008
Country/Territory: Singapore
City: Singapore
Period: 20/07/08 to 24/07/08

    Scopus subject areas

  • Information Systems
  • Software

ID: 36368498