Topic modelling is a widely used statistical technique that reveals the internal conceptual organization of text corpora. The main goal of this paper is to improve topic modelling algorithms by introducing automatic topic labelling, a procedure that chooses a label for the cluster of words in a topic. We chose an unsupervised graph-based method and developed it further for Russian. The proposed algorithm consists of two stages: candidate generation by means of PageRank and morphological filters, followed by candidate ranking. Our topic labelling experiments on a corpus of encyclopaedic texts on linguistics have shown the advantages of labelled topic models for NLP applications.
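
The two stages named in the abstract can be illustrated with a minimal sketch. This record does not describe the graph construction, the morphological filters, or the ranking criteria, so everything below is an assumption made for illustration: the graph links topic words that co-occur in a document, the morphological filter is a plain part-of-speech lookup table, and the final ranking simply reuses the PageRank score. All function and parameter names are hypothetical, and the code is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents, topic_words):
    """Undirected weighted graph: topic words sharing a document are linked.
    (Assumed construction; the record does not specify the graph.)"""
    graph = defaultdict(lambda: defaultdict(float))
    vocab = set(topic_words)
    for doc in documents:
        present = sorted(vocab.intersection(doc))
        for a, b in combinations(present, 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def pagerank(graph, nodes, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a weighted undirected graph."""
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {}
        for node in nodes:
            incoming = 0.0
            for neighbour, weight in graph.get(node, {}).items():
                out_weight = sum(graph[neighbour].values())
                incoming += rank[neighbour] * weight / out_weight
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def label_candidates(documents, topic_words, pos_tags,
                     allowed_pos=("NOUN",), top_k=3):
    """Stage 1: score words with PageRank and keep allowed parts of speech.
    Stage 2 (assumed): rank the surviving candidates by their score."""
    graph = build_cooccurrence_graph(documents, topic_words)
    scores = pagerank(graph, list(topic_words))
    kept = [w for w in topic_words if pos_tags.get(w) in allowed_pos]
    return sorted(kept, key=lambda w: scores[w], reverse=True)[:top_k]
```

In practice the part-of-speech table would come from a morphological analyser for Russian, and the ranking stage would likely use criteria beyond the raw graph score; both are left out here because the record gives no details.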

Translated title of the contribution: Automatic Assignment of Topic Labels in Topic Models for Russian Text Corpora
Original language: Russian
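Title of host publication: Структурная и прикладная лингвистика (Structural and Applied Linguistics)
Subtitle of host publication: Межвузовский сборник. Выпуск 12. К 60-летию отделения прикладной, компьютерной и математической лингвистики СПбГУ (Interuniversity collection. Issue 12. On the 60th anniversary of the Department of Applied, Computational and Mathematical Linguistics, St Petersburg State University)
Editors: И.С. Николаев
Place of Publication: СПб. (St. Petersburg)
Publisher: Издательство Санкт-Петербургского университета (St Petersburg University Press)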
Pages: 122-147
Number of pages: 26
Volume: 12
State: Published - 2019

    Research areas

  • Topic modelling, topic labelling, Russian corpora

ID: 62338036