Standard

Clustering Narrow-Domain Short Texts Using K-Means, Linguistic Patterns and LSI. / Popova, Svetlana; Danilova, Vera; Egorov, Artem.

In: Communications in Computer and Information Science, Vol. 436, 2014, p. 66-77.

Research output: Contribution to journal › Article › peer-review

Harvard

Popova, S, Danilova, V & Egorov, A 2014, 'Clustering Narrow-Domain Short Texts Using K-Means, Linguistic Patterns and LSI', Communications in Computer and Information Science, vol. 436, pp. 66-77. <http://www.scopus.com>

APA

Popova, S., Danilova, V., & Egorov, A. (2014). Clustering Narrow-Domain Short Texts Using K-Means, Linguistic Patterns and LSI. Communications in Computer and Information Science, 436, 66-77. http://www.scopus.com

Vancouver

Popova S, Danilova V, Egorov A. Clustering Narrow-Domain Short Texts Using K-Means, Linguistic Patterns and LSI. Communications in Computer and Information Science. 2014;436:66-77.

Author

Popova, Svetlana ; Danilova, Vera ; Egorov, Artem. / Clustering Narrow-Domain Short Texts Using K-Means, Linguistic Patterns and LSI. In: Communications in Computer and Information Science. 2014 ; Vol. 436. pp. 66-77.

BibTeX

@article{e9819fc7c8ad4d80819d424a2d10b290,
title = "Clustering Narrow-Domain Short Texts Using K-Means, Linguistic Patterns and LSI",
abstract = "In the present work we consider the problem of narrow-domain clustering of short texts, such as academic abstracts. Our main objective is to check whether it is possible to improve the quality of the k-means algorithm by expanding the feature space with a dictionary of word groups that were selected from the texts on the basis of a fixed set of patterns. We also check whether clustering quality can be increased by mapping the feature spaces to a lower-dimensional semantic space using Latent Semantic Indexing (LSI). The results suggest that these modifications are feasible in practical terms compared to using k-means in the feature space defined only by the main dictionary of the corpus.",
author = "Svetlana Popova and Vera Danilova and Artem Egorov",
year = "2014",
language = "undetermined",
volume = "436",
pages = "66--77",
journal = "Communications in Computer and Information Science",
issn = "1865-0929",
publisher = "Springer Nature",

}

RIS

TY - JOUR

T1 - Clustering Narrow-Domain Short Texts Using K-Means, Linguistic Patterns and LSI

AU - Popova, Svetlana

AU - Danilova, Vera

AU - Egorov, Artem

PY - 2014

Y1 - 2014

N2 - In the present work we consider the problem of narrow-domain clustering of short texts, such as academic abstracts. Our main objective is to check whether it is possible to improve the quality of the k-means algorithm by expanding the feature space with a dictionary of word groups that were selected from the texts on the basis of a fixed set of patterns. We also check whether clustering quality can be increased by mapping the feature spaces to a lower-dimensional semantic space using Latent Semantic Indexing (LSI). The results suggest that these modifications are feasible in practical terms compared to using k-means in the feature space defined only by the main dictionary of the corpus.

AB - In the present work we consider the problem of narrow-domain clustering of short texts, such as academic abstracts. Our main objective is to check whether it is possible to improve the quality of the k-means algorithm by expanding the feature space with a dictionary of word groups that were selected from the texts on the basis of a fixed set of patterns. We also check whether clustering quality can be increased by mapping the feature spaces to a lower-dimensional semantic space using Latent Semantic Indexing (LSI). The results suggest that these modifications are feasible in practical terms compared to using k-means in the feature space defined only by the main dictionary of the corpus.

M3 - Article

VL - 436

SP - 66

EP - 77

JO - Communications in Computer and Information Science

JF - Communications in Computer and Information Science

SN - 1865-0929

ER -

ID: 5746655
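The pipeline described in the abstract can be illustrated with a minimal Python sketch. It runs k-means over a term-count feature space built from the main dictionary. That space can optionally be extended with a dictionary of word groups and optionally mapped to a lower-dimensional semantic space with LSI (truncated SVD). This record does not give the paper's fixed set of linguistic patterns, so the `word_groups` helper is a hypothetical stand-in that simply treats adjacent token pairs as word groups. The L2 row normalization, the deterministic farthest-first seeding, and the toy corpus are likewise assumptions made for illustration, not the authors' experimental setup. The sketch requires NumPy.

```python
import numpy as np


def word_groups(tokens):
    """HYPOTHETICAL stand-in for the paper's pattern-based word groups:
    every pair of adjacent tokens becomes one word-group feature."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def build_features(docs, use_groups=True):
    """Term-count matrix over the main dictionary, optionally extended with
    the word-group dictionary; rows are L2-normalized (an assumption)."""
    feats = [d + word_groups(d) if use_groups else list(d) for d in docs]
    vocab = sorted({t for f in feats for t in f})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for r, f in enumerate(feats):
        for t in f:
            X[r, index[t]] += 1.0
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def lsi(X, rank):
    """Latent Semantic Indexing: keep the top-`rank` singular directions of
    the document-term matrix (X ~ U_r S_r V_r^T) and return U_r S_r."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * s[:rank]


def kmeans(X, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    # Seeding: start from the first document, then repeatedly take the point
    # farthest (squared Euclidean distance) from all chosen centroids.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[int(np.argmax(d))])
    centroids = np.array(centroids, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every document.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels


# Toy corpus (invented for illustration): two clearly separated topics,
# already tokenized, lowercased and stripped of stop words.
DOCS = [
    ["neural", "network", "training", "gradient"],
    ["neural", "network", "gradient", "descent"],
    ["protein", "folding", "structure", "prediction"],
    ["protein", "structure", "folding", "energy"],
]

if __name__ == "__main__":
    X = build_features(DOCS, use_groups=True)
    print(kmeans(X, k=2))               # k-means in the extended feature space
    print(kmeans(lsi(X, rank=2), k=2))  # k-means after the LSI mapping
```

Farthest-first seeding is used here only so that the toy example is reproducible without a random seed. Random or k-means++ initialization are common alternatives, and the record does not say which initialization the authors used.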