Text documents are usually represented as vectors in n-dimensional Euclidean space. One of the main problems in the typology of texts by cluster analysis is determining the number of clusters. This article studies the agglomerative clustering algorithm in Euclidean space. A statistical criterion for terminating the clustering process is derived as a Markov moment (stopping time). The problem of cluster stability is also considered. As an example, the retrieval of harmful content is considered.
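The abstract does not state the stopping criterion itself, so the following Python sketch only illustrates the general shape of the approach under explicit assumptions. Documents are mapped to relative term-frequency vectors (an assumed weighting with naive whitespace tokenization), clusters are merged by centroid linkage, and merging halts before the first merge whose distance exceeds a hypothetical multiple (`jump_factor`) of a least-squares linear extrapolation of the earlier merge distances. Because each decision uses only the distances observed up to that step, the stopping step is a stopping time in the Markov-moment sense, but this rule is an illustrative stand-in, not the criterion derived in the article. All function and parameter names (`tf_vectors`, `agglomerate`, `predict_next`, `min_history`, `jump_factor`) are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Map each document to a term-frequency vector over a shared vocabulary.

    Assumption: lowercase whitespace tokenization and relative term frequency.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append([counts[w] / total for w in vocab])
    return vocab, vectors


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def predict_next(values):
    """Least-squares line through (t, values[t-1]), t = 1..m, evaluated at t = m + 1."""
    m = len(values)
    t_mean = (m + 1) / 2
    v_mean = sum(values) / m
    sxx = sum((t - t_mean) ** 2 for t in range(1, m + 1))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(range(1, m + 1), values))
    slope = sxy / sxx if sxx else 0.0
    return v_mean + slope * (m + 1 - t_mean)


def agglomerate(points, min_history=3, jump_factor=2.0):
    """Centroid-linkage agglomerative clustering with a hypothetical stopping rule.

    Stops before the first merge whose distance exceeds jump_factor times the
    least-squares prediction from the previous merge distances. The rule looks
    only at the merge history so far (a stopping time), but it is an
    illustrative stand-in, not the criterion derived in the article.
    """
    clusters = [[i] for i in range(len(points))]
    history = []  # merge distances d_1, d_2, ...
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair of centroids
        for i in range(len(clusters)):
            ci = centroid([points[p] for p in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([points[p] for p in clusters[j]])
                d = math.dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if len(history) >= min_history:
            expected = max(predict_next(history), 1e-12)
            if d > jump_factor * expected:
                break  # sharp jump in merge distance: keep the current partition
        history.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, history


if __name__ == "__main__":
    groups = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(agglomerate(groups)[0])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    vocab, vecs = tf_vectors(["spam offer offer", "meeting notes", "offer spam now"])
    print(agglomerate(vecs, min_history=1, jump_factor=1.5)[0])  # [[0, 2], [1]]
```

On two well-separated groups of four points, the sketch performs six merges at distance 1 and stops before joining the groups. The pairwise search recomputes centroids on every step, which keeps the code short but makes it slow for large document collections.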

Original language: English
Title of host publication: Internet Science - INSCI 2018 International Workshops
Subtitle of host publication: Conference proceedings
Editors: S.S. Bodrunova, et al.
Publisher: Springer Nature
Pages: 19-32
ISBN (Print): 9783030177041
State: Published - 2019
Event: 5th International Conference on Internet Science, INSCI 2018: Internet in World Regions: Digital Freedoms and Citizen Empowerment - St. Petersburg State University, Institute "Higher School of Journalism and Mass Communications", St. Petersburg, Russian Federation
Duration: 24 Oct 2018 to 26 Oct 2018
Conference number: 5th
http://insci2018.org/

Publication series

Name: Lecture Notes in Computer Science
Volume: 11551
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Conference on Internet Science, INSCI 2018
Abbreviated title: INSCI 2018
Country/Territory: Russian Federation
City: St. Petersburg
Period: 24/10/18 to 26/10/18

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

    Research areas

  • Cluster analysis, Clustering method, Euclidean space, Harmful content, Least squares method, Markov moment

ID: 41713635