When processing large arrays of empirical data, cluster analysis remains one of the primary methods of preliminary typology, which makes it necessary to obtain formal rules for calculating the number of clusters. The most common method for determining the preferred number of clusters is visual analysis of dendrograms, but this approach is purely heuristic. The number of clusters and the stopping moment of the clustering algorithm depend on each other. Cluster analysis of data from n-dimensional Euclidean space using the “single linkage” method can be considered as a discrete random process. Sequences of “minimum distances” define the trajectories of this process. The “approximation-estimating test” allows us to establish the Markov moment (stopping time) at which the growth rate of such a sequence changes from linear to parabolic, which, in turn, may be a sign that the agglomerative clustering process is complete. Calculating the number of clusters is a critical problem in many cases of automatic typology of empirical data. Examples include cytometric analysis of blood in medicine, automated analysis of texts, and other settings in which the number of clusters is not known in advance.
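
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the “approximation-estimating test”, so the version below is an assumption that compares, on a sliding window of `k` consecutive minimum distances, the least-squares error of a line `a*x + b` with that of an incomplete parabola `c*x^2 + d`, and reports the first window in which the parabola fits better. The function names, the window size `k = 4`, and the tolerance `eps` are all hypothetical. The merge distances are obtained from a minimum spanning tree, relying on the fact that single-linkage merge distances equal the sorted edge lengths of that tree.

```python
import math


def single_linkage_merge_distances(points):
    """Merge distances of single-linkage clustering, in ascending order.

    Computed as the sorted edge lengths of the Euclidean minimum
    spanning tree (Prim's algorithm, O(n^2)).
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [True] + [False] * (n - 1)
    # best[v] = shortest distance from v to any point already in the tree
    best = [math.dist(points[0], p) for p in points]
    merges = []
    for _ in range(n - 1):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        merges.append(best[u])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return sorted(merges)


def fit_error(us, ys):
    """Sum of squared residuals of the least-squares fit y ~ p*u + q."""
    k = len(us)
    mean_u = sum(us) / k
    mean_y = sum(ys) / k
    s_uu = sum((u - mean_u) ** 2 for u in us)
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    p = s_uy / s_uu
    q = mean_y - p * mean_u
    return sum((y - (p * u + q)) ** 2 for u, y in zip(us, ys))


def first_parabolic_index(distances, k=4, eps=1e-12):
    """Index of the last element of the first window of length k in which
    an incomplete parabola fits better than a line (assumed criterion).

    Returns None if no such window exists.
    """
    xs = list(range(k))
    xs_squared = [x * x for x in xs]
    for start in range(len(distances) - k + 1):
        window = distances[start:start + k]
        delta = fit_error(xs, window) - fit_error(xs_squared, window)
        if delta > eps:
            return start + k - 1
    return None
```

If merging is stopped just before the merge with the returned index `m` (counting from zero), then `m` merges have been performed and `n - m` clusters remain for `n` input points; whether this matches the stopping rule of the paper is not confirmed by the abstract.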

Translated title of the contribution: Markov moment for the agglomerative method of clustering in Euclidean space
Original language: Russian
Pages (from-to): 76-92
Number of pages: 17
Journal: ВЕСТНИК САНКТ-ПЕТЕРБУРГСКОГО УНИВЕРСИТЕТА. ПРИКЛАДНАЯ МАТЕМАТИКА. ИНФОРМАТИКА. ПРОЦЕССЫ УПРАВЛЕНИЯ
Volume: 15
Issue number: 1
State: Published - 1 Jan 2019

    Scopus subject areas

  • Control and Optimization
  • Applied Mathematics
  • Computer Science(all)

    Research areas

  • Cluster analysis, Least squares method, Markov moment, NUMBER

ID: 41340292