Agglomerative Method for Texts Clustering

Research output: peer-review

Abstract

Text documents are usually represented as vectors in n-dimensional Euclidean space. One of the main problems in building a typology of texts by cluster analysis is determining the number of clusters. This article studies the agglomerative clustering algorithm in Euclidean space. A statistical criterion for completing the clustering process is derived as a Markov moment. The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.
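
The general idea in the abstract, merging the closest clusters in Euclidean space and deciding when to stop, can be illustrated with a minimal sketch. The stopping rule below is a simple stand-in chosen for illustration (halt when the next merge distance jumps sharply); it is not the Markov-moment criterion derived in the paper. The names `agglomerate`, `centroid`, and `jump_ratio` are hypothetical.

```python
from math import dist


def centroid(points):
    """Mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def agglomerate(vectors, jump_ratio=3.0):
    """Centroid-linkage agglomerative clustering with a simple stopping rule.

    Stops before a merge whose distance exceeds `jump_ratio` times the
    largest merge distance seen so far. This heuristic is an assumption
    made for illustration, not the paper's Markov-moment criterion.
    """
    clusters = [[tuple(v)] for v in vectors]
    largest_merge = 0.0
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Stop when the next merge distance jumps sharply.
        if largest_merge > 0 and d > jump_ratio * largest_merge:
            break
        largest_merge = max(largest_merge, d)
        # Merge cluster j into cluster i.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Calling `agglomerate` on document vectors (for example, term-frequency vectors) returns a list of clusters, each a list of the original vectors; the number of clusters falls out of the stopping rule rather than being fixed in advance.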

Original language: English
Title of host publication: Internet Science - INSCI 2018 International Workshops, Revised Selected Papers
Editors: SS Bodrunova, O Koltsova, A Folstad, H Halpin, P Kolozaridi, L Yuldashev, A Smoliarova, H Niedermayer
Publisher: Springer
Pages: 19-32
Number of pages: 14
ISBN (Print): 9783030177041
DOIs: https://doi.org/10.1007/978-3-030-17705-8_2
Publication status: Published - 2019
Event: 5th International Conference on Internet Science: Internet in World Regions: Digital Freedoms and Citizen Empowerment - St. Petersburg State University, Institute "Higher School of Journalism and Mass Communications", St. Petersburg
Duration: 24 Oct 2018 to 26 Oct 2018
Conference number: 5th
http://insci2018.org/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11551 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Conference on Internet Science (INSCI)
Abbreviated title: INSCI 2018
Country: Russian Federation
City: St. Petersburg
Period: 24/10/18 to 26/10/18
Internet address

Fingerprint

Text Clustering
Cluster analysis
Clustering algorithms
Euclidean space
Text Analysis
Number of Clusters
Cluster Analysis
Clustering Algorithm
n-dimensional
Retrieval
Clustering
Moment
Text

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Orekhov, A. V. (2019). Agglomerative Method for Texts Clustering. In SS. Bodrunova, O. Koltsova, A. Folstad, H. Halpin, P. Kolozaridi, L. Yuldashev, A. Smoliarova, ... H. Niedermayer (Eds.), Internet Science - INSCI 2018 International Workshops, Revised Selected Papers (pp. 19-32). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11551 LNCS). Springer. https://doi.org/10.1007/978-3-030-17705-8_2
Orekhov, Andrey V. / Agglomerative Method for Texts Clustering. Internet Science - INSCI 2018 International Workshops, Revised Selected Papers. editor / SS Bodrunova ; O Koltsova ; A Folstad ; H Halpin ; P Kolozaridi ; L Yuldashev ; A Smoliarova ; H Niedermayer. Springer, 2019. pp. 19-32 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{7ac9ab17e6ad4c719b5b03c18d4af5b3,
title = "Agglomerative Method for Texts Clustering",
abstract = "Text documents are usually represented as vectors in n-dimensional Euclidean space. One of the main problems in building a typology of texts by cluster analysis is determining the number of clusters. This article studies the agglomerative clustering algorithm in Euclidean space. A statistical criterion for completing the clustering process is derived as a Markov moment. The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.",
keywords = "Cluster analysis, Clustering method, Euclidean space, Harmful content, Least squares method, Markov moment",
author = "Orekhov, {Andrey V.}",
year = "2019",
doi = "10.1007/978-3-030-17705-8_2",
language = "English",
isbn = "9783030177041",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
pages = "19--32",
editor = "SS Bodrunova and O Koltsova and A Folstad and H Halpin and P Kolozaridi and L Yuldashev and A Smoliarova and H Niedermayer",
booktitle = "Internet Science - INSCI 2018 International Workshops, Revised Selected Papers",
address = "Germany",

}

Orekhov, AV 2019, Agglomerative Method for Texts Clustering. in SS Bodrunova, O Koltsova, A Folstad, H Halpin, P Kolozaridi, L Yuldashev, A Smoliarova & H Niedermayer (eds), Internet Science - INSCI 2018 International Workshops, Revised Selected Papers. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11551 LNCS, Springer, pp. 19-32, St. Petersburg, 24/10/18. https://doi.org/10.1007/978-3-030-17705-8_2

Agglomerative Method for Texts Clustering. / Orekhov, Andrey V.

Internet Science - INSCI 2018 International Workshops, Revised Selected Papers. ed. / SS Bodrunova; O Koltsova; A Folstad; H Halpin; P Kolozaridi; L Yuldashev; A Smoliarova; H Niedermayer. Springer, 2019. p. 19-32 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11551 LNCS).

TY - GEN

T1 - Agglomerative Method for Texts Clustering

AU - Orekhov, Andrey V.

PY - 2019

Y1 - 2019

N2 - Text documents are usually represented as vectors in n-dimensional Euclidean space. One of the main problems in building a typology of texts by cluster analysis is determining the number of clusters. This article studies the agglomerative clustering algorithm in Euclidean space. A statistical criterion for completing the clustering process is derived as a Markov moment. The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.

AB - Text documents are usually represented as vectors in n-dimensional Euclidean space. One of the main problems in building a typology of texts by cluster analysis is determining the number of clusters. This article studies the agglomerative clustering algorithm in Euclidean space. A statistical criterion for completing the clustering process is derived as a Markov moment. The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.

KW - Cluster analysis

KW - Clustering method

KW - Euclidean space

KW - Harmful content

KW - Least squares method

KW - Markov moment

UR - http://www.scopus.com/inward/record.url?scp=85065304968&partnerID=8YFLogxK

UR - http://www.mendeley.com/research/agglomerative-method-texts-clustering

U2 - 10.1007/978-3-030-17705-8_2

DO - 10.1007/978-3-030-17705-8_2

M3 - Article in conference proceedings

AN - SCOPUS:85065304968

SN - 9783030177041

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 19

EP - 32

BT - Internet Science - INSCI 2018 International Workshops, Revised Selected Papers

A2 - Bodrunova, SS

A2 - Koltsova, O

A2 - Folstad, A

A2 - Halpin, H

A2 - Kolozaridi, P

A2 - Yuldashev, L

A2 - Smoliarova, A

A2 - Niedermayer, H

PB - Springer

ER -

Orekhov AV. Agglomerative Method for Texts Clustering. In Bodrunova SS, Koltsova O, Folstad A, Halpin H, Kolozaridi P, Yuldashev L, Smoliarova A, Niedermayer H, editors, Internet Science - INSCI 2018 International Workshops, Revised Selected Papers. Springer. 2019. p. 19-32. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-17705-8_2