Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Network Presentation of Texts and Clustering of Messages. / Orekhov, Andrey V.; Kharlamov, Alexander A.; Bodrunova, Svetlana S.
Internet Science. 6th International Conference, INSCI 2019 : Proceedings. ed. / Samira El Yacoubi; Franco Bagnoli; Giovanna Pacini. Cham : Springer Nature, 2019. pp. 235-249 18 (Lecture Notes in Computer Science; Vol. 11938).
TY - GEN
T1 - Network Presentation of Texts and Clustering of Messages
AU - Orekhov, Andrey V.
AU - Kharlamov, Alexander A.
AU - Bodrunova, Svetlana S.
N1 - Orekhov A.V., Kharlamov A.A., Bodrunova S.S. (2019) Network Presentation of Texts and Clustering of Messages. In: El Yacoubi S., Bagnoli F., Pacini G. (eds) Internet Science. INSCI 2019. Lecture Notes in Computer Science, vol 11938. Springer, Cham
PY - 2019/12/1
Y1 - 2019/12/1
AB - Automatic typology of text messages, obtained by cluster analysis, can be used to search for communities on the Internet. In this paper, we address one of the significant issues in text classification via cluster analysis, namely determining the number of clusters. For semantics-based clustering, text documents are typically represented as vectors in an n-dimensional linear space. To determine the number of clusters, we propose agglomerative clustering of these vectors. In our work, statistical analysis is combined with neural network algorithms to obtain a more accurate semantic portrait of a text. The derived network structures are then mapped into vector form using the techniques of distributive semantics. A statistical criterion for completing the clustering process is derived and defined as a Markov moment. Once the automatic partitioning into clusters is obtained, the texts closest to the centroids can be compared with actual content samples or evaluated by experts. If the vector representation of the texts is adequate, all messages in a given cluster have the same meaning and the same emotional coloring. In addition, we discuss the possibility of using vector representations of texts for sentiment detection in short texts such as search engine queries or tweets.
KW - Cluster analysis
KW - Distributive semantics
KW - Least squares method
KW - Markov moment
KW - Neural network algorithms
KW - Semantic network
KW - Social network analysis
UR - http://www.scopus.com/inward/record.url?scp=85076538996&partnerID=8YFLogxK
UR - http://www.mendeley.com/research/network-presentation-texts-clustering-messages
U2 - 10.1007/978-3-030-34770-3_18
DO - 10.1007/978-3-030-34770-3_18
M3 - Conference contribution
AN - SCOPUS:85076538996
SN - 9783030347697
T3 - Lecture Notes in Computer Science
SP - 235
EP - 249
BT - Internet Science. 6th International Conference, INSCI 2019
A2 - El Yacoubi, Samira
A2 - Bagnoli, Franco
A2 - Pacini, Giovanna
PB - Springer Nature
CY - Cham
T2 - 6th International Conference on Internet Science, INSCI 2019
Y2 - 2 December 2019 through 5 December 2019
ER -
ID: 49785323