Keyness Analysis and Its Representation in Russian Academic Papers on Computational Linguistics: Evaluation of Algorithms

Links

https://nlp.fi.muni.cz/raslan/2022/paper9.pdf

Extraction of relevant lexis has gained significance as the amount of information keeps growing, with news, social network posts, reviews, academic papers and other texts piling up. Automated algorithms are needed to analyze these texts and make their content easier to understand. The paper examines keyword extraction methods applied to abstracts of Russian scientific papers on computational linguistics. Unsupervised algorithms based on statistical, graph-based and machine learning principles are considered. The extracted keywords are evaluated first against the keywords assigned by the authors themselves and then by expert assessment. Log-likelihood produced the best results when compared with author keywords, while a KeyBERT implementation with vectorizers outperformed the other algorithms according to expert assessment.
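The log-likelihood (G2) keyness score mentioned above compares how often a word occurs in a target text with how often it occurs in a reference corpus. For a word seen a times in a target of c tokens and b times in a reference of d tokens, the expected counts are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and G2 = 2(a ln(a/E1) + b ln(b/E2)). The sketch below is a minimal illustration of that formula, not the authors' code: the function names, the whitespace tokenization, the toy data and the choice to keep only over-represented words are all assumptions. The record does not describe the paper's reference corpus or preprocessing, and Russian text would normally be lemmatized before counting.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in a target of c tokens and b times in a
    reference of d tokens. Hypothetical helper; c and d must be non-zero."""
    e1 = c * (a + b) / (c + d)  # expected count in the target
    e2 = d * (a + b) / (c + d)  # expected count in the reference
    g2 = 0.0
    # By convention 0 * ln(0) = 0, so zero observed counts add nothing.
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def keywords_by_ll(target: list[str], reference: list[str],
                   top_n: int = 10) -> list[tuple[str, float]]:
    """Rank target words by G2, keeping only words that are relatively more
    frequent in the target than in the reference (positive keyness)."""
    tf, rf = Counter(target), Counter(reference)
    c, d = len(target), len(reference)
    scored = []
    for word, a in tf.items():
        b = rf.get(word, 0)
        if a / c > b / d:  # over-represented in the target
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

With the toy target "keyword extraction keyword algorithm text" and reference "the text is a text about the algorithm" (both split on whitespace), the ranking is keyword, extraction, algorithm. "text" is dropped because it is relatively more frequent in the reference.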
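Comparing extracted keywords with author-assigned ones is commonly scored with precision, recall and F1. This record does not state which matching rule or metric the paper used, so the sketch below assumes exact matching after lowercasing and trimming whitespace. The `evaluate` function is a hypothetical name. For Russian, exact string matching is strict because inflected forms differ, and lemmatizing both lists first would likely be needed.

```python
def evaluate(predicted: list[str], gold: list[str]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of predicted keywords against
    author keywords, after lowercasing and stripping whitespace (assumed rule)."""
    def norm(items: list[str]) -> set[str]:
        # Normalize case and surrounding spaces; ignore empty strings.
        return {item.strip().lower() for item in items if item.strip()}

    pred, ref = norm(predicted), norm(gold)
    hits = len(pred & ref)  # keywords found in both lists
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, predicting "keyword extraction", "log-likelihood" and "corpus" against the author list "Keyword extraction", "KeyBERT", "Log-likelihood" and "abstracts" gives two hits: precision 2/3, recall 1/2 and F1 4/7.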

Original language	English
Title of host publication	Proceedings of the Sixteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2022
Publisher	Tribun EU
Pages	25–33
ISBN (Print)	9788026317524
State	Published - 2022
Event	16th Workshop on Recent Advances in Slavonic Natural Languages Processing - Karlova Studánka, Czech Republic
Duration	9 Dec 2022 → 11 Dec 2022
Internet address	https://raslan2022.nlp-consulting.net/index.html

Publication series

Name	Recent Advances in Slavonic Natural Language Processing
Publisher	NLP Consulting
ISSN (Print)	2336-4289

Conference

Conference	16th Workshop on Recent Advances in Slavonic Natural Languages Processing
Abbreviated title	RASLAN 2022
Country/Territory	Czech Republic
City	Karlova Studánka
Period	9/12/22 → 11/12/22
Internet address	https://raslan2022.nlp-consulting.net/index.html

Research areas

Keyword extraction, academic papers, abstracts, computational linguistics, log-likelihood, TextRank, RAKE, YAKE, KeyBERT
