Автоматическое назначение меток тем в тематических моделях русскоязычных корпусов текстов

Research output: Chapter in Book/Report/Conference proceeding › Article in an anthology › Research › peer-review

Department of Mathematical Linguistics

Алия Ришатовна Ерофеева
Ольга Александровна Митрофанова

Topic modelling is a widely used statistical technique that reveals the internal conceptual organization of text corpora. The main goal of this paper is to improve topic modelling algorithms by introducing automatic topic labelling, a procedure that chooses a label for the cluster of words making up a topic. We chose an unsupervised graph-based method and adapted it to Russian. The proposed algorithm consists of two stages: candidate generation by means of PageRank and morphological filters, followed by candidate ranking. Our topic labelling experiments on a corpus of encyclopaedic texts on linguistics have shown the advantages of labelled topic models for NLP applications.
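
The two stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The co-occurrence graph construction, the noun-only filter, the `pos_tags` lookup (a stand-in for a real morphological analyser), the `affinity` ranking criterion, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(documents):
    """Undirected co-occurrence graph; edge weight = number of documents sharing both lemmas."""
    graph = defaultdict(lambda: defaultdict(float))
    for doc in documents:
        for a, b in combinations(sorted(set(doc)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on a weighted undirected graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        rank = {
            n: (1 - damping) / len(nodes)
            + damping * sum(rank[m] * w / out_weight[m] for m, w in graph[n].items())
            for n in nodes
        }
    return rank


def label_topic(topic_words, documents, pos_tags, top_k=3):
    graph = build_graph(documents)
    scores = pagerank(graph)
    # Stage 1: candidate generation. Keep nouns only (hypothetical filter),
    # then take the top_k lemmas by PageRank score.
    nouns = [w for w in graph if pos_tags.get(w) == "NOUN"]
    candidates = sorted(nouns, key=scores.get, reverse=True)[:top_k]

    # Stage 2: candidate ranking. Assumed criterion: total co-occurrence
    # weight between the candidate and the words of the topic.
    def affinity(candidate):
        return sum(graph[candidate].get(t, 0.0) for t in topic_words)

    return sorted(candidates, key=affinity, reverse=True)


# Toy lemmatised documents and a hand-written POS lookup (both hypothetical).
documents = [
    ["фонема", "звук", "язык"],
    ["фонема", "аллофон", "звук"],
    ["фонема", "различать", "звук"],
    ["фонема", "аллофон"],
]
pos_tags = {"фонема": "NOUN", "звук": "NOUN", "язык": "NOUN",
            "аллофон": "NOUN", "различать": "VERB"}
topic_words = ["фонема", "звук", "аллофон", "различать"]

labels = label_topic(topic_words, documents, pos_tags)
print(labels)
```

In this sketch the first element of `labels` would serve as the topic label. In practice the part-of-speech lookup would come from a morphological analyser for Russian rather than a hand-written dictionary.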

Translated title of the contribution	Automatic assignment of topic labels in topic models for Russian text corpora
Original language	Russian
Title of host publication	Структурная и прикладная лингвистика
Subtitle of host publication	Межвузовский сборник. Выпуск 12. К 60-летию отделения прикладной, компьютерной и математической лингвистики СПбГУ
Editors	И.С. Николаев
Place of Publication	СПб.
Publisher	Издательство Санкт-Петербургского университета
Pages	122-147
Number of pages	26
Volume	12
State	Published - 2019

Research areas

Topic modelling, Topic labelling, Russian corpora

ID: 62338036