Agglomerative method for texts clustering

Research output: peer-review

Abstract

Text documents are usually represented as vectors in an n-dimensional Euclidean space. One of the main problems in the typology of texts by means of cluster analysis is determining the number of clusters. This article studies an agglomerative clustering algorithm in Euclidean space. A statistical criterion for stopping the clustering process is derived as a Markov moment. The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.
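The abstract does not spell out the algorithm, so the following is only a minimal Python sketch under stated assumptions: texts become term-frequency vectors in Euclidean space, clusters are merged by centroid distance, and merging stops according to a hypothetical rule. That rule (fit a least-squares line to the merge distances seen so far and stop when the next merge distance exceeds the prediction by more than `k` residual standard deviations) is an illustrative assumption, not the paper's criterion; it behaves like a stopping time because each decision uses only the distances observed up to that step. The names `tf_vectors`, `agglomerate`, `should_stop` and the parameters `k` and `min_len` are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(texts):
    """Represent each text as a term-frequency vector over a shared sorted vocabulary."""
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([float(counts[word]) for word in vocab])
    return vectors, vocab


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def should_stop(history, new_dist, k=3.0, min_len=3):
    """Hypothetical stopping rule (an assumption, not the paper's criterion).

    Fits a least-squares line to the merge distances observed so far and
    stops when the next merge distance exceeds the line's prediction by more
    than k residual standard deviations. The decision uses only past
    distances plus the current candidate, so it behaves like a stopping time.
    """
    m = len(history)
    if m < min_len:
        return False  # too few merges to fit a trend
    xs = range(m)
    x_mean = (m - 1) / 2
    y_mean = sum(history) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, history)]
    # Floor sigma so a perfect fit does not yield a zero threshold.
    sigma = max(math.sqrt(sum(r * r for r in residuals) / m), 1e-9)
    predicted = intercept + slope * m  # extrapolate to the next merge index
    return new_dist - predicted > k * sigma


def agglomerate(points, k=3.0):
    """Centroid-linkage agglomerative clustering with the stopping rule above.

    Returns the clusters (lists of point indices) and the accepted merge distances.
    """
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        centers = [centroid([points[p] for p in c]) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centers[i], centers[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if should_stop(history, d, k):
            break  # the jump in merge distance suggests the clusters are formed
        history.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, history
```

For example, `agglomerate([[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]])` merges the three tight pairs at distance 1.0 and then stops at the jump to 10.0, returning the clusters `[[0, 1], [2, 3], [4, 5]]`; for real texts, pass `tf_vectors(texts)[0]`. Centroid linkage and raw term counts were chosen for brevity; a different weighting (such as TF-IDF) or linkage would change the merge distances the stopping rule sees.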

Original language: English
Title of host publication: Internet Science - INSCI 2018 International Workshops, Revised Selected Papers
Editors: Polina Kolozaridi, Leonid Yuldashev, Heiko Niedermayer, Svetlana S. Bodrunova, Anna Smoliarova, Harry Halpin, Olessia Koltsova, Asbjørn Følstad
Publisher: Springer
Pages: 19-32
Number of pages: 14
ISBN (Print): 9783030177041
DOIs: https://doi.org/10.1007/978-3-030-17705-8_2
Publication status: Published - 1 Jan 2019
Event: 5th International Conference on Internet Science, INSCI 2018 - St. Petersburg
Duration: 24 Oct 2018 to 26 Oct 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11551 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Conference on Internet Science, INSCI 2018
Country: Russian Federation
City: St. Petersburg
Period: 24/10/18 to 26/10/18

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Orekhov, A. V. (2019). Agglomerative method for texts clustering. In P. Kolozaridi, L. Yuldashev, H. Niedermayer, S. S. Bodrunova, A. Smoliarova, H. Halpin, O. Koltsova, ... A. Følstad (Eds.), Internet Science - INSCI 2018 International Workshops, Revised Selected Papers (pp. 19-32). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11551 LNCS). Springer. https://doi.org/10.1007/978-3-030-17705-8_2
Orekhov, Andrey V. / Agglomerative method for texts clustering. Internet Science - INSCI 2018 International Workshops, Revised Selected Papers. editor / Polina Kolozaridi ; Leonid Yuldashev ; Heiko Niedermayer ; Svetlana S. Bodrunova ; Anna Smoliarova ; Harry Halpin ; Olessia Koltsova ; Asbjørn Følstad. Springer, 2019. pp. 19-32 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{7ac9ab17e6ad4c719b5b03c18d4af5b3,
title = "Agglomerative method for texts clustering",
abstract = "Text documents are usually represented as vectors in an n-dimensional Euclidean space. One of the main problems in the typology of texts by means of cluster analysis is determining the number of clusters. This article studies an agglomerative clustering algorithm in Euclidean space. A statistical criterion for stopping the clustering process is derived as a Markov moment. The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.",
keywords = "Cluster analysis, Clustering method, Euclidean space, Harmful content, Least squares method, Markov moment",
author = "Orekhov, {Andrey V.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-17705-8_2",
language = "English",
isbn = "9783030177041",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
pages = "19--32",
editor = "Polina Kolozaridi and Leonid Yuldashev and Heiko Niedermayer and Bodrunova, {Svetlana S.} and Anna Smoliarova and Harry Halpin and Olessia Koltsova and Asbj{\o}rn F{\o}lstad",
booktitle = "Internet Science - INSCI 2018 International Workshops, Revised Selected Papers",
address = "Germany",

}

Orekhov, AV 2019, Agglomerative method for texts clustering. in P Kolozaridi, L Yuldashev, H Niedermayer, SS Bodrunova, A Smoliarova, H Halpin, O Koltsova & A Følstad (eds), Internet Science - INSCI 2018 International Workshops, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11551 LNCS, Springer, pp. 19-32, St. Petersburg, 24/10/18. https://doi.org/10.1007/978-3-030-17705-8_2

Agglomerative method for texts clustering. / Orekhov, Andrey V.

Internet Science - INSCI 2018 International Workshops, Revised Selected Papers. ed. / Polina Kolozaridi; Leonid Yuldashev; Heiko Niedermayer; Svetlana S. Bodrunova; Anna Smoliarova; Harry Halpin; Olessia Koltsova; Asbjørn Følstad. Springer, 2019. p. 19-32 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11551 LNCS).

TY - GEN

T1 - Agglomerative method for texts clustering

AU - Orekhov, Andrey V.

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Text documents are usually represented as vectors in an n-dimensional Euclidean space. One of the main problems in the typology of texts by means of cluster analysis is determining the number of clusters. This article studies an agglomerative clustering algorithm in Euclidean space. A statistical criterion for stopping the clustering process is derived as a Markov moment. The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.

AB - Text documents are usually represented as vectors in an n-dimensional Euclidean space. One of the main problems in the typology of texts by means of cluster analysis is determining the number of clusters. This article studies an agglomerative clustering algorithm in Euclidean space. A statistical criterion for stopping the clustering process is derived as a Markov moment. The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.

KW - Cluster analysis

KW - Clustering method

KW - Euclidean space

KW - Harmful content

KW - Least squares method

KW - Markov moment

UR - http://www.scopus.com/inward/record.url?scp=85065304968&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-17705-8_2

DO - 10.1007/978-3-030-17705-8_2

M3 - Conference contribution

SN - 9783030177041

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 19

EP - 32

BT - Internet Science - INSCI 2018 International Workshops, Revised Selected Papers

A2 - Kolozaridi, Polina

A2 - Yuldashev, Leonid

A2 - Niedermayer, Heiko

A2 - Bodrunova, Svetlana S.

A2 - Smoliarova, Anna

A2 - Halpin, Harry

A2 - Koltsova, Olessia

A2 - Følstad, Asbjørn

PB - Springer

ER -

Orekhov AV. Agglomerative method for texts clustering. In Kolozaridi P, Yuldashev L, Niedermayer H, Bodrunova SS, Smoliarova A, Halpin H, Koltsova O, Følstad A, editors, Internet Science - INSCI 2018 International Workshops, Revised Selected Papers. Springer. 2019. p. 19-32. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-17705-8_2