Comparative analysis of associations in social network corpora based on distributional semantic models for the Russian language

Anna A. Antipenko, Olga A. Mitrofanova




The paper discusses the results of an experiment on the
automatic extraction of associative relations from corpora of
Russian texts from the Facebook and Pikabu social networks by
means of distributional semantic models. The choice of
linguistic data for analysis, namely social network texts, is
determined by the specificity of polylogic internet discourse,
which combines traits of written and colloquial speech. We
put forward the hypothesis that the associative test technique
can be reproduced in experiments with
distributional semantic models. Experiments were carried out
with the help of algorithms and tools of distributional
semantics. We extracted associations for lexemes expressing
key concepts of the Russian-specific world view. The procedure
was performed by means of the Word2Vec (CBOW and Skip-gram)
neural network architectures. We carried out linguistic
analysis of the output data and compared it with the
associations described in the Russian Associative Dictionary,
the Russian regional association database (Siberia and the Far East),
and the Russian Distributional Thesaurus. The results achieved in
the course of the experiments allow us to draw conclusions about the
dynamics of the Russian-specific language consciousness of
contemporary social network users. We worked out and
implemented a procedure for the quantitative evaluation of data
extracted from different sources. We found evidence of the
specialization of lexicographic resources and distributional
semantic models as regards paradigmatic and syntagmatic
relations. The experimental data allowed us to carry out a linguistic
analysis of the contemporary Russian-specific world view of social
network users and to reveal tendencies in its development.
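
The general idea of extracting associations from a distributional model and comparing them with a lexicographic resource can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy vectors stand in for embeddings that would in practice come from a trained Word2Vec model (CBOW or Skip-gram), the reference list is a hypothetical dictionary entry, and the Jaccard overlap is only one possible quantitative measure, not necessarily the evaluation procedure used in the paper.

```python
import math

# Toy 3-dimensional vectors standing in for trained Word2Vec embeddings.
# The values are hypothetical and NOT taken from the paper.
TOY_VECTORS = {
    "дом": [0.9, 0.1, 0.0],     # "home"
    "семья": [0.8, 0.3, 0.1],   # "family"
    "уют": [0.7, 0.2, 0.2],     # "cosiness"
    "работа": [0.1, 0.9, 0.2],  # "work"
    "деньги": [0.0, 0.8, 0.5],  # "money"
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_associates(word, vectors, k):
    """Return the k words whose vectors are closest to `word` by cosine."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


def jaccard(a, b):
    """Jaccard overlap of two association lists (assumed metric)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Associations proposed by the toy "model" for the stimulus word.
model_associates = top_associates("дом", TOY_VECTORS, k=2)

# Hypothetical reference associations, e.g. from an associative dictionary.
reference_associates = ["семья", "крыша"]

overlap = jaccard(model_associates, reference_associates)
print(model_associates, round(overlap, 3))
```

With real data, the same comparison would be repeated for each stimulus lexeme and each resource, so that the overlap scores can be compared across sources.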
Original language: Russian
Pages (from-to): 27-33
Journal: International Journal of Open Information Technologies
Issue number: 1
Publication status: Published - 2020

