Standard

PolSentiLex : Sentiment Detection in Socio-Political Discussions on Russian Social Media. / Koltsova, Olessia; Alexeeva, Svetlana; Pashakhin, Sergei; Koltsov, Sergei.

Artificial Intelligence and Natural Language - 9th Conference, AINL 2020, Proceedings. ed. / Andrey Filchenkov; Janne Kauttonen; Lidia Pivovarova. Springer Nature, 2020. p. 1-16 (Communications in Computer and Information Science; Vol. 1292 CCIS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Harvard

Koltsova, O, Alexeeva, S, Pashakhin, S & Koltsov, S 2020, PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media. in A Filchenkov, J Kauttonen & L Pivovarova (eds), Artificial Intelligence and Natural Language - 9th Conference, AINL 2020, Proceedings. Communications in Computer and Information Science, vol. 1292 CCIS, Springer Nature, pp. 1-16, 9th Conference on Artificial Intelligence and Natural Language, AINL 2020, Helsinki, Finland, 7 October 2020. https://doi.org/10.1007/978-3-030-59082-6_1

APA

Koltsova, O., Alexeeva, S., Pashakhin, S., & Koltsov, S. (2020). PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media. In A. Filchenkov, J. Kauttonen, & L. Pivovarova (Eds.), Artificial Intelligence and Natural Language - 9th Conference, AINL 2020, Proceedings (pp. 1-16). (Communications in Computer and Information Science; Vol. 1292 CCIS). Springer Nature. https://doi.org/10.1007/978-3-030-59082-6_1

Vancouver

Koltsova O, Alexeeva S, Pashakhin S, Koltsov S. PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media. In Filchenkov A, Kauttonen J, Pivovarova L, editors, Artificial Intelligence and Natural Language - 9th Conference, AINL 2020, Proceedings. Springer Nature. 2020. p. 1-16. (Communications in Computer and Information Science). https://doi.org/10.1007/978-3-030-59082-6_1

Author

Koltsova, Olessia ; Alexeeva, Svetlana ; Pashakhin, Sergei ; Koltsov, Sergei. / PolSentiLex : Sentiment Detection in Socio-Political Discussions on Russian Social Media. Artificial Intelligence and Natural Language - 9th Conference, AINL 2020, Proceedings. editor / Andrey Filchenkov ; Janne Kauttonen ; Lidia Pivovarova. Springer Nature, 2020. pp. 1-16 (Communications in Computer and Information Science).

BibTeX

@inproceedings{d8044b515ff9448c9b651b825ea1d8e4,

title = "PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media",

abstract = "We present a freely available Russian language sentiment lexicon PolSentiLex designed to detect sentiment in user-generated content related to social and political issues. The lexicon was generated from a database of posts and comments of the top 2,000 LiveJournal bloggers posted during one year (1.5 million posts and 20 million comments). Following a topic modeling approach, we extracted 85,898 documents that were used to retrieve domain-specific terms. This term list was then merged with several external sources. Together, they formed a lexicon (16,399 units) marked-up using a crowdsourcing strategy. A sample of Russian native speakers (n = 105) was asked to assess words{\textquoteright} sentiment given the context of their use (randomly paired) as well as the prevailing sentiment of the respective texts. In total, we received 59,208 complete annotations for both texts and words. Several versions of the marked-up lexicon were experimented with, and the final version was tested for quality against the only other freely available Russian language lexicon and against three machine learning algorithms. All experiments were run on two different collections. They have shown that, in terms of, lexicon-based approaches outperform machine learning by 11%, and our lexicon outperforms the alternative one by 11% on the first collection, and by 7% on the negative scale of the second collection while showing similar quality on the positive scale and being three times smaller. Our lexicon also outperforms or is similar to the best existing sentiment analysis results for other types of Russian-language texts.",

keywords = "Lexicon-based approach, Russian language, Sentiment analysis, Social media, Socio-political domain",

author = "Olessia Koltsova and Svetlana Alexeeva and Sergei Pashakhin and Sergei Koltsov",

note = "Publisher Copyright: {\textcopyright} 2020, Springer Nature Switzerland AG. Copyright 2020 Elsevier B.V., All rights reserved.; 9th Conference on Artificial Intelligence and Natural Language, AINL 2020 ; Conference date: 07-10-2020 Through 09-10-2020",

year = "2020",

doi = "10.1007/978-3-030-59082-6_1",

language = "English",

isbn = "9783030590819",

series = "Communications in Computer and Information Science",

publisher = "Springer Nature",

pages = "1--16",

editor = "Andrey Filchenkov and Janne Kauttonen and Lidia Pivovarova",

booktitle = "Artificial Intelligence and Natural Language - 9th Conference, AINL 2020, Proceedings",

address = "Germany",

}

RIS

TY - GEN

T1 - PolSentiLex

T2 - 9th Conference on Artificial Intelligence and Natural Language, AINL 2020

AU - Koltsova, Olessia

AU - Alexeeva, Svetlana

AU - Pashakhin, Sergei

AU - Koltsov, Sergei

PY - 2020

Y1 - 2020

N2 - We present a freely available Russian language sentiment lexicon PolSentiLex designed to detect sentiment in user-generated content related to social and political issues. The lexicon was generated from a database of posts and comments of the top 2,000 LiveJournal bloggers posted during one year (1.5 million posts and 20 million comments). Following a topic modeling approach, we extracted 85,898 documents that were used to retrieve domain-specific terms. This term list was then merged with several external sources. Together, they formed a lexicon (16,399 units) marked-up using a crowdsourcing strategy. A sample of Russian native speakers (n = 105) was asked to assess words’ sentiment given the context of their use (randomly paired) as well as the prevailing sentiment of the respective texts. In total, we received 59,208 complete annotations for both texts and words. Several versions of the marked-up lexicon were experimented with, and the final version was tested for quality against the only other freely available Russian language lexicon and against three machine learning algorithms. All experiments were run on two different collections. They have shown that, in terms of, lexicon-based approaches outperform machine learning by 11%, and our lexicon outperforms the alternative one by 11% on the first collection, and by 7% on the negative scale of the second collection while showing similar quality on the positive scale and being three times smaller. Our lexicon also outperforms or is similar to the best existing sentiment analysis results for other types of Russian-language texts.

AB - We present a freely available Russian language sentiment lexicon PolSentiLex designed to detect sentiment in user-generated content related to social and political issues. The lexicon was generated from a database of posts and comments of the top 2,000 LiveJournal bloggers posted during one year (1.5 million posts and 20 million comments). Following a topic modeling approach, we extracted 85,898 documents that were used to retrieve domain-specific terms. This term list was then merged with several external sources. Together, they formed a lexicon (16,399 units) marked-up using a crowdsourcing strategy. A sample of Russian native speakers (n = 105) was asked to assess words’ sentiment given the context of their use (randomly paired) as well as the prevailing sentiment of the respective texts. In total, we received 59,208 complete annotations for both texts and words. Several versions of the marked-up lexicon were experimented with, and the final version was tested for quality against the only other freely available Russian language lexicon and against three machine learning algorithms. All experiments were run on two different collections. They have shown that, in terms of, lexicon-based approaches outperform machine learning by 11%, and our lexicon outperforms the alternative one by 11% on the first collection, and by 7% on the negative scale of the second collection while showing similar quality on the positive scale and being three times smaller. Our lexicon also outperforms or is similar to the best existing sentiment analysis results for other types of Russian-language texts.

KW - Lexicon-based approach

KW - Russian language

KW - Sentiment analysis

KW - Social media

KW - Socio-political domain

UR - http://www.scopus.com/inward/record.url?scp=85092918682&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/2275a918-55a7-34d4-9edd-b028acaa3eda/

U2 - 10.1007/978-3-030-59082-6_1

DO - 10.1007/978-3-030-59082-6_1

M3 - Conference contribution

AN - SCOPUS:85092918682

SN - 9783030590819

T3 - Communications in Computer and Information Science

SP - 1

EP - 16

BT - Artificial Intelligence and Natural Language - 9th Conference, AINL 2020, Proceedings

A2 - Filchenkov, Andrey

A2 - Kauttonen, Janne

A2 - Pivovarova, Lidia

PB - Springer Nature

Y2 - 7 October 2020 through 9 October 2020

ER -
