Extraction of relevant lexis has gained significance as the volume of text keeps growing, with news, social network posts, reviews, academic papers, and other material piling up. Automated algorithms are needed to analyze texts and make their content easier to understand. The paper examines methods for keyword extraction from abstracts of Russian scientific texts on computational linguistics. Unsupervised algorithms based on statistics, graphs, and machine learning principles are considered. The results are evaluated first against the keywords assigned by the authors themselves and then by expert opinion. Log-likelihood produced the best results when compared with author keywords, while the KeyBERT implementation with vectorizers outperformed the other algorithms according to expert assessment.
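
To make the statistical approach named in the abstract concrete, here is a minimal sketch of log-likelihood keyness scoring in the common two-corpus form (observed versus expected frequencies in a target and a reference corpus). This record does not give the paper's exact formulation, preprocessing, or reference corpus, so the function names `log_likelihood` and `extract_keywords`, the `top_n` parameter, and the whitespace-level token lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood for one word.

    a, b: frequency of the word in the target / reference corpus
    c, d: total token counts of the target / reference corpus (both > 0)
    """
    total = a + b
    expected_target = c * total / (c + d)
    expected_reference = d * total / (c + d)
    score = 0.0
    # A term with zero observed frequency contributes nothing (0 * ln 0 := 0).
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def extract_keywords(target_tokens, reference_tokens, top_n=5):
    """Rank words over-represented in the target (hypothetical helper)."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in target.items():
        b = reference[word]
        # Keep only words relatively more frequent in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

In this sketch the target would be a single abstract (or the collection of abstracts) and the reference a general-language corpus; words whose relative frequency is not higher in the target are discarded, since high scores can also signal under-use.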
Original language: English
Title of host publication: Proceedings of the Sixteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2022
Publisher: Tribun EU
Pages: 25–33
ISBN (Print): 9788026317524
State: Published - 2022
Event: 16th Workshop on Recent Advances in Slavonic Natural Language Processing - Karlova Studánka, Czech Republic
Duration: 9 Dec 2022 to 11 Dec 2022
https://raslan2022.nlp-consulting.net/index.html

Publication series

Name: Recent Advances in Slavonic Natural Language Processing
Publisher: NLP Consulting
ISSN (Print): 2336-4289

Conference

Conference: 16th Workshop on Recent Advances in Slavonic Natural Language Processing
Abbreviated title: RASLAN 2022
Country/Territory: Czech Republic
City: Karlova Studánka
Period: 9/12/22 to 11/12/22

Research areas

  • Keyword extraction, Academic papers, abstracts, computational linguistics, Log-likelihood, TextRank, RAKE, YAKE, KeyBERT

ID: 105206630