The paper describes the development and application of an automatic word clustering (AWC) tool for processing Russian texts of various types. The tool is required to be flexible and compatible with other linguistic resources. Building it requires a computer implementation of latent semantic analysis (LSA) combined with clustering algorithms, and Python-based software has been developed for this purpose. The main procedures performed by the AWC tool are segmentation of input texts and context analysis, co-occurrence matrix construction, and agglomerative and K-means clustering. Special attention is given to experimental results on clustering words in raw texts under varying parameters.
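The procedures named in the abstract can be illustrated with a minimal sketch. This is not the authors' software: the function names (`tokenize`, `cooccurrence_matrix`, `lsa_embed`, `kmeans`), the window size, the SVD rank, and the toy sentence are all assumptions made for illustration, and NumPy is assumed to be available. The sketch covers segmentation, co-occurrence counting, an LSA-style truncated SVD, and K-means only; agglomerative clustering is left out for brevity.

```python
import re
import numpy as np


def tokenize(text):
    """Lowercase the text and split it into Latin or Cyrillic word tokens."""
    return re.findall(r"[a-zа-яё]+", text.lower())


def cooccurrence_matrix(tokens, window=2):
    """Count how often each pair of words occurs within `window` tokens."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[word], index[tokens[j]]] += 1
    return vocab, counts


def lsa_embed(counts, rank=2):
    """Project words onto the top `rank` singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :rank] * s[:rank]


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = points[:k].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # Distance from every point to every centroid, then nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = points[labels == c].mean(axis=0)
    return labels


if __name__ == "__main__":
    # Toy input, chosen only to exercise the pipeline.
    tokens = tokenize("кот ест рыбу кот ест мясо собака ест мясо")
    vocab, counts = cooccurrence_matrix(tokens, window=1)
    labels = kmeans(lsa_embed(counts, rank=2), k=2)
    for word, label in zip(vocab, labels):
        print(word, label)
```

Seeding the centroids with the first `k` points keeps the sketch deterministic. A real run on raw text would more likely use random restarts and would vary the window size, the rank, and the number of clusters, which is the kind of parameter variation the abstract refers to.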
Original language: English
Title of host publication: Text, Speech and Dialogue
Subtitle of host publication: 10th International Conference, TSD 2007, Pilsen, Czech Republic, September 3-7, 2007, Proceedings
Publisher: Springer Nature
Pages: 85-97
ISBN (electronic): 9783540746287
ISBN (print): 9783540746270
Publication status: Published - 2007
Event: 10th International Conference - Pilsen, Czech Republic
Duration: 3 Sep 2007 to 7 Sep 2007

Publication series

Name: Lecture Notes in Computer Science
Volume: 4629

Conference

Conference: 10th International Conference
Abbreviated title: TSD 2007
Country/Territory: Czech Republic
City: Pilsen
Period: 3/09/07 to 7/09/07

ID: 4509961