Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review
Keyness Analysis and Its Representation in Russian Academic Papers on Computational Linguistics: Evaluation of Algorithms. / Khokhlova, Maria; Koryshev, Mikhail.
Proceedings of the Sixteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2022. Tribun EU, 2022. p. 25–33 (Recent Advances in Slavonic Natural Language Processing).
TY - CHAP
T1 - Keyness Analysis and Its Representation in Russian Academic Papers on Computational Linguistics: Evaluation of Algorithms
AU - Khokhlova, Maria
AU - Koryshev, Mikhail
PY - 2022
Y1 - 2022
N2 - Extraction of relevant lexis has gained significance as the amount of information is continuously growing with news, posts on social networks, reviews, academic papers, etc. piling up. Automated algorithms are needed to analyze texts to facilitate understanding of their content. The paper scrutinizes methods for keyword extraction in abstracts of Russian scientific texts on computational linguistics. Unsupervised algorithms based on statistics, graphs and machine learning principles are considered. The results are evaluated against the keywords assigned by authors themselves, followed by expert opinion. Log-likelihood produced the best results in comparison with author keywords, while KeyBERT implementation with vectorizers outperformed other algorithms according to expert assessment.
AB - Extraction of relevant lexis has gained significance as the amount of information is continuously growing with news, posts on social networks, reviews, academic papers, etc. piling up. Automated algorithms are needed to analyze texts to facilitate understanding of their content. The paper scrutinizes methods for keyword extraction in abstracts of Russian scientific texts on computational linguistics. Unsupervised algorithms based on statistics, graphs and machine learning principles are considered. The results are evaluated against the keywords assigned by authors themselves, followed by expert opinion. Log-likelihood produced the best results in comparison with author keywords, while KeyBERT implementation with vectorizers outperformed other algorithms according to expert assessment.
KW - Keyword extraction
KW - Academic papers
KW - Abstracts
KW - Computational linguistics
KW - Log-likelihood
KW - TextRank
KW - RAKE
KW - YAKE
KW - KeyBERT
UR - https://dblp.org/db/conf/raslan/raslan2022.html
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85171465447&origin=inward&txGid=a59a60d431102e0964fac2763cd058cd
M3 - Chapter
SN - 9788026317524
T3 - Recent Advances in Slavonic Natural Language Processing
SP - 25
EP - 33
BT - Proceedings of the Sixteenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2022
PB - Tribun EU
T2 - 16th Workshop on Recent Advances in Slavonic Natural Languages Processing
Y2 - 9 December 2022 through 11 December 2022
ER -
ID: 105206630