Extracting relevant vocabulary has become increasingly important as the amount of information keeps growing, with news, social network posts, reviews, academic papers and other texts piling up. Automated algorithms are needed to analyze these texts and make their content easier to understand. This paper examines keyword extraction methods applied to abstracts of Russian scientific papers on computational linguistics. Unsupervised algorithms based on statistics, graphs and machine learning principles are considered. The results are evaluated first against the keywords assigned by the authors themselves and then by expert assessment. Log-likelihood produced the best results when compared with author keywords, while the KeyBERT implementation with vectorizers outperformed the other algorithms according to expert assessment.
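The abstract reports that log-likelihood scoring matched author keywords best. As a rough illustration of how such a statistic ranks candidate terms, the sketch below computes Dunning's log-likelihood (G²) keyness of each token in a target text relative to a reference corpus, then scores the top candidates against a gold keyword list with precision, recall and F1. The reference corpus, tokenization, "overused terms only" filter, exact-match evaluation and the function names `log_likelihood`, `extract_keywords` and `precision_recall_f1` are all assumptions for illustration, not the paper's documented setup.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2 for a term seen `a` times in a target corpus of `c` tokens
    and `b` times in a reference corpus of `d` tokens (assumes c, d > 0)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # 0 * ln(0) is treated as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def extract_keywords(target_tokens, reference_tokens, top_n=10):
    """Hypothetical helper: rank target-text tokens by G2 keyness."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())  # assumed non-empty reference corpus
    scored = []
    for term, a in target.items():
        b = reference.get(term, 0)
        # Assumption: keep only terms relatively more frequent in the target.
        if a / c > b / d:
            scored.append((term, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


def precision_recall_f1(predicted, gold):
    """Hypothetical evaluation: case-insensitive exact match with author keywords."""
    pred = {p.lower() for p in predicted}
    ref = {g.lower() for g in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    target = ["parser", "parser", "corpus", "the"]
    reference = ["the", "the", "the", "corpus", "text", "text"]
    keywords = [term for term, _ in extract_keywords(target, reference)]
    print(keywords)
    print(precision_recall_f1(keywords, ["parser", "syntax"]))
```

In this toy run, "parser" ranks above "corpus" because it never occurs in the reference corpus, while "the" is filtered out as relatively more frequent in the reference. A real setup would lemmatize Russian text and remove stop words before scoring.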
Original language: English
Host publication title: Proceedings of the Sixteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2022
Publisher: Tribun EU
Pages: 25–33
ISBN (print): 9788026317524
Status: Published - 2022
Event: 16th Workshop on Recent Advances in Slavonic Natural Languages Processing - Karlova Studánka, Czech Republic
Duration: 9 Dec 2022 – 11 Dec 2022
https://raslan2022.nlp-consulting.net/index.html

Publication series

Name: Recent Advances in Slavonic Natural Language Processing
Publisher: NLP Consulting
ISSN (print): 2336-4289

Conference

Conference: 16th Workshop on Recent Advances in Slavonic Natural Languages Processing
Abbreviated title: RASLAN 2022
Country/Territory: Czech Republic
City: Karlova Studánka
Period: 9/12/22 – 11/12/22

ID: 105206630