The paper is devoted to improving topic modelling algorithms that extract
latent relations between words, documents and topics in processed corpora.
In most cases the topics generated by topic models contain only unigrams,
which makes the extracted topics difficult to interpret. This paper presents
a new algorithm, based on the classic LDA model, that automatically extracts
bigrams from a given text collection and then incorporates them into the
topic model. We describe our algorithm in action and discuss the results
obtained from processing Russian corpora on radio engineering and on
linguistics.
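
This record does not give the details of the algorithm, so the following is only a minimal sketch of the general idea (find frequent, strongly associated bigrams, then treat each as a single token before topic modelling). It assumes pointwise mutual information (PMI) as the association measure, which may differ from the paper's actual method. The function names `find_bigrams` and `merge_bigrams`, the thresholds, and the toy lemmatized corpus are all hypothetical.

```python
from collections import Counter
import math


def find_bigrams(docs, min_count=2, min_pmi=1.0):
    """Return adjacent word pairs whose frequency and PMI pass the thresholds.

    PMI is an assumed association measure, not taken from the paper.
    """
    unigrams = Counter(tok for doc in docs for tok in doc)
    pairs = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    n_uni = sum(unigrams.values())
    n_pairs = sum(pairs.values())
    selected = set()
    for (a, b), count in pairs.items():
        if count < min_count:
            continue
        p_pair = count / n_pairs
        p_indep = (unigrams[a] / n_uni) * (unigrams[b] / n_uni)
        if math.log2(p_pair / p_indep) >= min_pmi:
            selected.add((a, b))
    return selected


def merge_bigrams(doc, bigrams):
    """Greedily join selected pairs into single tokens, scanning left to right."""
    merged, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in bigrams:
            merged.append(doc[i] + "_" + doc[i + 1])
            i += 2
        else:
            merged.append(doc[i])
            i += 1
    return merged


# Hypothetical toy corpus of already lemmatized documents.
docs = [
    ["topic", "model", "extract", "latent", "topic"],
    ["topic", "model", "use", "unigram"],
    ["bigram", "improve", "topic", "model"],
]
bigrams = find_bigrams(docs)
merged_docs = [merge_bigrams(doc, bigrams) for doc in docs]
```

The merged token lists could then be passed to any LDA implementation, so that a pair such as `topic_model` can appear in a topic as one unit alongside ordinary unigrams.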
Translated title of the contribution: Topic Modelling of Russian Texts based on Lemmata and Lexical Constructions
Original language: Russian
Title of host publication: Computational Linguistics and Computational Ontologies. Issue 1 (Компьютерная лингвистика и вычислительные онтологии. Выпуск 1)
Subtitle of host publication: Proceedings of the XX International Joint Scientific Conference "Internet and Modern Society", IMS-2017, Saint Petersburg, 21-23 June 2017. Collection of scientific articles
Publisher: ITMO University (НИУ ИТМО)
Pages: 132-144
State: Accepted/In press - 2017
Event: 2017 International Conference on Internet and Modern Society, IMS 2017: international joint conference - ITMO University, Saint Petersburg, Russian Federation
Duration: 21 Jun 2017 to 23 Jun 2017
Conference number: XX
http://icims.ifmo.ru/
http://ims.ifmo.ru/ru/pages/28/IMS_2017.htm

Conference

Conference: 2017 International Conference on Internet and Modern Society, IMS 2017
Abbreviated title: IMS 2017
Country/Territory: Russian Federation
City: Saint Petersburg
Period: 21/06/17 to 23/06/17

ID: 9328963