Research output: Chapter in Book/Report/Conference proceeding › Article in an anthology › Research › peer-review
Automatic keyword extraction from German journalistic discourse using statistical methods. / Хохлова, Мария Владимировна; Корышев, Михаил Витальевич.
Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2023). Springer Nature, 2024. p. 67-74 (Springer Geography; Vol. Part F3643).
TY - CHAP
T1 - Automatic keyword extraction from German journalistic discourse using statistical methods
AU - Хохлова, Мария Владимировна
AU - Корышев, Михаил Витальевич
N1 - Conference code: 26
PY - 2024
Y1 - 2024
N2 - Most studies that deal with keyword extraction focus on English texts and do not pay much attention to the role of significant lexemes and their intersection with topics. This chapter presents the results of automatic keyword extraction from German journalistic articles (about 500 thousand tokens) using three statistical methods: log-likelihood and the RAKE and YAKE algorithms. The authors identified the most frequently used keywords, which can shed light on the topics that attract journalists’ utmost attention. The technique allows tracing transformations in topic selection over time and analysing similarities between articles. The topics traced from the selected keywords include matches with the topics identified by experts. The results reveal the heterogeneous nature of texts published in different years (not only in their structure but also in content), suggesting that the thematic focus of the articles shifts over time.
AB - Most studies that deal with keyword extraction focus on English texts and do not pay much attention to the role of significant lexemes and their intersection with topics. This chapter presents the results of automatic keyword extraction from German journalistic articles (about 500 thousand tokens) using three statistical methods: log-likelihood and the RAKE and YAKE algorithms. The authors identified the most frequently used keywords, which can shed light on the topics that attract journalists’ utmost attention. The technique allows tracing transformations in topic selection over time and analysing similarities between articles. The topics traced from the selected keywords include matches with the topics identified by experts. The results reveal the heterogeneous nature of texts published in different years (not only in their structure but also in content), suggesting that the thematic focus of the articles shifts over time.
KW - German language
KW - Keyword extraction
KW - Log-likelihood
KW - RAKE
KW - YAKE
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85210552439&partnerID=MN8TOARS
UR - https://www.mendeley.com/catalogue/f07b7331-ac4b-3603-b2e7-9a8f5862041c/
U2 - 10.1007/978-3-031-67762-5_6
DO - 10.1007/978-3-031-67762-5_6
M3 - Article in an anthology
T3 - Springer Geography
SP - 67
EP - 74
BT - Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2023)
PB - Springer Nature
T2 - International Conference “Internet and Modern Society” (IMS-2023)
Y2 - 26 June 2023 through 28 June 2023
ER -
ID: 115095339