Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
We present PolSentiLex, a freely available Russian-language sentiment lexicon designed to detect sentiment in user-generated content related to social and political issues. The lexicon was generated from a database of posts and comments of the top 2,000 LiveJournal bloggers posted during one year (1.5 million posts and 20 million comments). Following a topic modeling approach, we extracted 85,898 documents that were used to retrieve domain-specific terms. This term list was then merged with several external sources. Together, they formed a lexicon (16,399 units) that was marked up using a crowdsourcing strategy. A sample of Russian native speakers (n = 105) was asked to assess words' sentiment given the context of their use (randomly paired) as well as the prevailing sentiment of the respective texts. In total, we received 59,208 complete annotations for both texts and words. We experimented with several versions of the marked-up lexicon, and the final version was tested for quality against the only other freely available Russian-language lexicon and against three machine learning algorithms. All experiments were run on two different collections. They show that lexicon-based approaches outperform machine learning by 11%, and that our lexicon outperforms the alternative one by 11% on the first collection and by 7% on the negative scale of the second collection, while showing similar quality on the positive scale and being three times smaller. Our lexicon also outperforms or is similar to the best existing sentiment analysis results for other types of Russian-language texts.
Original language | English |
---|---|
Title of host publication | Artificial Intelligence and Natural Language - 9th Conference, AINL 2020, Proceedings |
Editors | Andrey Filchenkov, Janne Kauttonen, Lidia Pivovarova |
Publisher | Springer Nature |
Pages | 1-16 |
Number of pages | 16 |
ISBN (Print) | 9783030590819 |
DOI | |
Publication status | Published - 2020 |
Event | 9th Conference on Artificial Intelligence and Natural Language, AINL 2020 - Helsinki, Finland. Duration: 7 Oct 2020 → 9 Oct 2020 |
Name | Communications in Computer and Information Science |
---|---|
Volume | 1292 CCIS |
ISSN (Print) | 1865-0929 |
ISSN (Electronic) | 1865-0937 |
Conference | 9th Conference on Artificial Intelligence and Natural Language, AINL 2020 |
---|---|
Abbreviated title | AINL 2020 |
Country/Territory | Finland |
City | Helsinki |
Period | 7/10/20 → 9/10/20 |
ID: 71870169