
This paper presents a method for topic labelling based on Explicit Semantic Analysis (ESA). The top words of a topic are passed to ESA as input, and the algorithm returns the titles of the Wikipedia articles it considers most relevant to that input. An alternative approach, which serves as a strong baseline, submits the topic words as a query to a search engine and takes the titles of the top results. In both methods, the obtained titles are then analysed automatically: a graph algorithm constructs phrases characterizing the topic from them and assigns each phrase a weight. In the proposed ESA-based method, a post-processing step then sorts the candidate labels according to empirically formulated rules. Experiments were conducted on a corpus of Russian encyclopaedic texts on linguistics. The results justify applying ESA to this task: although it performs slightly worse than the search-engine-based method in terms of label quality, it can serve as a reasonable alternative because it has two advantages that the baseline lacks.
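The ESA step can be illustrated with a minimal sketch of the commonly described ESA scheme. It assumes that each Wikipedia article is a concept represented by a TF-IDF vector over its words, and that a bag of topic words is mapped to concepts by summing the concept vectors of its words. The toy three-article corpus (in English rather than Russian), the pre-tokenised input, and the function names `build_esa_index` and `esa_concepts` are hypothetical illustrations, not the authors' implementation; the graph-based phrase construction and the post-processing rules are not shown.

```python
import math
from collections import Counter, defaultdict


def build_esa_index(articles):
    """Build a hypothetical inverted index: term -> {article title: weight}.

    `articles` maps a concept (Wikipedia article title) to its tokenised text.
    """
    n_docs = len(articles)
    doc_freq = Counter()
    for tokens in articles.values():
        doc_freq.update(set(tokens))

    index = defaultdict(dict)
    for title, tokens in articles.items():
        counts = Counter(tokens)
        # TF-IDF weight of every term in this article.
        weights = {
            term: (1 + math.log(tf)) * math.log(n_docs / doc_freq[term])
            for term, tf in counts.items()
        }
        # L2-normalise so that long articles do not dominate the ranking.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        for term, w in weights.items():
            if w > 0:  # terms occurring in every article get IDF 0 and are skipped
                index[term][title] = w / norm
    return index


def esa_concepts(index, topic_words, top_k=3):
    """Rank article titles for a bag of topic words by summed concept weights."""
    scores = Counter()
    for word in topic_words:
        for title, w in index.get(word, {}).items():
            scores[title] += w
    # Only titles sharing at least one word with the topic appear in `scores`.
    return [title for title, _ in scores.most_common(top_k)]


# Hypothetical toy corpus standing in for a Wikipedia dump.
articles = {
    "Phoneme": ["phoneme", "sound", "language", "distinctive", "feature"],
    "Morphology (linguistics)": ["morpheme", "word", "structure", "language", "affix"],
    "Syntax": ["sentence", "structure", "word", "order", "language"],
}
index = build_esa_index(articles)
print(esa_concepts(index, ["morpheme", "affix", "word"]))
# prints ['Morphology (linguistics)', 'Syntax']
```

With a real Wikipedia dump, the ranked titles returned by such a step would be the raw material from which candidate label phrases are then built and weighted.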

Original language: English
Host publication title: Artificial Intelligence and Natural Language - 7th International Conference, AINL 2018, Proceedings
Editors: Lidia Pivovarova, Andrey Filchenkov, Jan Zizka, Dmitry Ustalov
Publisher: Springer Nature
Pages: 110-116
Number of pages: 7
ISBN (print): 9783030012038
DOI
Status: Published - 2018
Event: 7th International Conference Artificial Intelligence and Natural Language, AINL 2018 - St. Petersburg, Russian Federation
Duration: 17 Oct 2018 - 19 Oct 2018

Publication series

Name: Communications in Computer and Information Science
Volume: 930
ISSN (print): 1865-0929

Conference

Conference: 7th International Conference Artificial Intelligence and Natural Language, AINL 2018
Country/Territory: Russian Federation
City: St. Petersburg
Period: 17/10/18 - 19/10/18

    Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)

ID: 37684204