Automatic typology of text messages, obtained by cluster analysis, can be used to search for communities on the Internet. In this paper, we address one of the significant issues in text classification via cluster analysis, namely determining the number of clusters. For clustering based on semantics, text documents are typically represented as vectors in an n-dimensional linear space. We propose determining the number of clusters by agglomerative clustering of the vectors in this space. In our work, statistical analysis is combined with neural network algorithms to obtain a more accurate semantic portrait of a text. Then, using the techniques of distributional semantics, the derived network structures are mapped into vector form. A statistical criterion for stopping the clustering process is derived, defined as a Markov moment (stopping time). Once an automatic partition into clusters is obtained, the texts closest to the centroids can be compared with actual content samples or evaluated by experts. If the vector representation of texts is adequate, all messages in a given cluster have the same meaning and the same emotional coloring. In addition, we discuss the possibility of using vector representations of texts for sentiment detection in short texts such as search engine queries or tweets.
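
The idea of letting agglomerative clustering decide the number of clusters can be illustrated with a minimal sketch. It is not the authors' method: the statistical stopping criterion from the paper is not given in the abstract, so the sketch substitutes a plain distance threshold (the hypothetical parameter `max_merge_distance`), uses centroid linkage with Euclidean distance as an assumption, and runs on made-up two-dimensional toy vectors rather than real text embeddings.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def agglomerate(vectors, max_merge_distance):
    """Merge the two closest clusters (centroid linkage) until the
    closest pair is farther apart than max_merge_distance.

    The threshold is an illustrative stand-in for a stopping rule;
    the number of clusters is whatever remains when merging stops.
    """
    clusters = [[list(v)] for v in vectors]  # start with singletons
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_merge_distance:
            break  # stopping rule fired: keep the current partition
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def nearest_to_centroid(cluster):
    """Return the member of a cluster closest to its centroid."""
    c = centroid(cluster)
    return min(cluster, key=lambda p: euclidean(p, c))


# Toy vectors (assumed data): two tight groups and one isolated point.
toy_vectors = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0],
    [9.0, 0.0],
]
clusters = agglomerate(toy_vectors, max_merge_distance=1.0)
representatives = [nearest_to_centroid(c) for c in clusters]
print(len(clusters), representatives)
```

The `representatives` list corresponds to the texts closest to the centroids, which are the items one would hand to experts or compare against actual content samples. The pairwise search is quadratic per merge, which is acceptable for a sketch but not for large corpora.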

Original language: English
Title of host publication: Internet Science. 6th International Conference, INSCI 2019
Subtitle of host publication: Proceedings
Editors: Samira El Yacoubi, Franco Bagnoli, Giovanna Pacini
Place of publication: Cham
Publisher: Springer Nature
Pages: 235-249
Number of pages: 15
ISBN (electronic): 9780030347703
ISBN (print): 9783030347697
DOI
Publication status: Published - 1 Dec 2019
Event: 6th International Conference on Internet Science (INSCI) 2019 - Perpignan, France
Duration: 2 Dec 2019 to 5 Dec 2019

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 11938
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 6th International Conference on Internet Science (INSCI) 2019
Abbreviated title: INSCI'2019
Country/Territory: France
City: Perpignan
Period: 2/12/19 to 5/12/19

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

ID: 49785323