Research output: Chapter in book/report/conference proceeding; conference contribution; research; peer-reviewed
This paper presents a topic labelling method based on Explicit Semantic Analysis (ESA). The top words of a topic are passed to ESA as input, and the algorithm returns the titles of the Wikipedia articles it considers most relevant to that input. An alternative approach, which serves as a strong baseline, uses the titles of the top results that a search engine returns when the topic words are submitted as a query. In both methods, the retrieved titles are then analysed automatically: a graph algorithm constructs phrases that characterize the topic and assigns weights to them. In the ESA-based method, a post-processing step then ranks the candidate labels according to empirically formulated rules. Experiments were conducted on a corpus of Russian encyclopaedic texts on linguistics. The results justify applying ESA to this task: although it produces labels of slightly lower quality than the search-engine method, it is a reasonable alternative because it offers two advantages that the baseline lacks.
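The ESA step described in the abstract (topic words in, ranked Wikipedia titles out) can be sketched roughly as follows. This is a minimal, simplified illustration, not the authors' implementation: it assumes a tiny hypothetical in-memory "Wikipedia" (the `ARTICLES` dictionary), whitespace tokenisation, and unnormalised TF-IDF weights, whereas ESA proper is built from a full Wikipedia dump with normalised concept vectors. The names `build_concept_index` and `esa_top_titles` are hypothetical. Each topic word is mapped to a vector of weights over articles (concepts), the vectors are summed, and the highest-scoring article titles are returned as raw material for label construction.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy "Wikipedia": article title -> article text.
# Real ESA is built from a full Wikipedia dump; these entries are illustrative only.
ARTICLES = {
    "Phonology": "phonology studies sound systems phonemes and syllables of languages",
    "Syntax": "syntax studies sentence structure phrases and grammatical rules",
    "Morphology (linguistics)": "morphology studies word structure morphemes affixes and inflection",
}


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenisation (no lemmatisation).
    return text.lower().split()


def build_concept_index(articles):
    """Build an ESA-style inverted index: word -> {article title: TF-IDF weight}."""
    n_docs = len(articles)
    tf = {title: Counter(tokenize(text)) for title, text in articles.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # document frequency: count each word once per article
    index = defaultdict(dict)
    for title, counts in tf.items():
        for word, count in counts.items():
            idf = math.log(n_docs / df[word])
            if idf > 0:  # words found in every article carry no concept signal
                index[word][title] = (1 + math.log(count)) * idf
    return index


def esa_top_titles(topic_words, index, k=2):
    """Sum the concept vectors of the topic words and return the k highest-scoring titles."""
    scores = Counter()
    for word in topic_words:
        for title, weight in index.get(word.lower(), {}).items():
            scores[title] += weight
    return [title for title, _ in scores.most_common(k)]


if __name__ == "__main__":
    index = build_concept_index(ARTICLES)
    topic = ["phonemes", "syllables", "sound", "structure"]
    print(esa_top_titles(topic, index, k=1))  # ['Phonology']
```

The search-engine baseline would differ only in where the titles come from. The graph-based phrase construction and the rule-based re-ranking that follow are not reproduced here, since the abstract does not specify them.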
Field | Value |
---|---|
Original language | English |
Title of host publication | Artificial Intelligence and Natural Language - 7th International Conference, AINL 2018, Proceedings |
Editors | Lidia Pivovarova, Andrey Filchenkov, Jan Zizka, Dmitry Ustalov |
Publisher | Springer Nature |
Pages | 110-116 |
Number of pages | 7 |
ISBN (Print) | 9783030012038 |
DOIs | |
State | Published - 2018 |
Event | 7th International Conference Artificial Intelligence and Natural Language, AINL 2018, St. Petersburg, Russian Federation (17 Oct 2018 → 19 Oct 2018) |
Field | Value |
---|---|
Name | Communications in Computer and Information Science |
Volume | 930 |
ISSN (Print) | 1865-0929 |
Field | Value |
---|---|
Conference | 7th International Conference Artificial Intelligence and Natural Language, AINL 2018 |
Country/Territory | Russian Federation |
City | St. Petersburg |
Period | 17/10/18 → 19/10/18 |