Most studies of keyword extraction focus on English texts and pay little attention to the role of significant lexemes and how they relate to topics. This chapter presents the results of automatic keyword extraction from a corpus of German journalistic articles (about 500 thousand tokens) using three statistical methods: the log-likelihood measure and the RAKE and YAKE algorithms. The authors identified the most frequent keywords, which shed light on the topics that attract the most attention from journalists. The technique makes it possible to trace changes in topic selection over time and to analyse similarities between articles. The topics traced through the selected keywords overlap with the topics identified by experts. The results reveal the heterogeneous nature of texts published in different years (in content as well as in structure), suggesting that the thematic focus of the articles shifts over time.
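To illustrate the first of the three methods, here is a minimal sketch of log-likelihood (G2) keyness, which compares a word's frequency in a target corpus with its frequency in a reference corpus. It assumes pre-tokenized, lowercased tokens and some reference corpus. The chapter's actual preprocessing, reference corpus and cut-off values are not given here, and the helper names `log_likelihood` and `keywords` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word occurring a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens (hypothetical helper)."""
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    # Terms with an observed frequency of 0 contribute 0 (0 * ln 0 := 0).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target_tokens, reference_tokens, top_n=10):
    """Rank words overused in the target corpus by G2 (hypothetical helper)."""
    t = Counter(target_tokens)
    r = Counter(reference_tokens)
    c, d = sum(t.values()), sum(r.values())
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        # Keep only words relatively more frequent in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    target = "krise energie krise preise energie krise".split()
    reference = "wetter sport preise politik wetter sport".split()
    print(keywords(target, reference))
```

In this toy example "krise" ranks above "energie". "preise" is excluded because its relative frequency is the same in both corpora. Real keyness studies usually also apply a significance threshold and a minimum frequency, and those choices are left open here.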
Original language: English
Title of host publication: Digital Geography: Proceedings of the International Conference on Internet and Modern Society (IMS 2023)
Publisher: Springer Nature
Pages: 67-74
Number of pages: 8
DOIs
State: Published - 2024
Event: International Conference “Internet and Modern Society” (IMS-2023) - ITMO, Saint Petersburg, Russian Federation
Duration: 26 Jun 2023 - 28 Jun 2023
Conference number: 26
https://ims.itmo.ru/

Publication series

Name: Springer Geography
Volume: Part F3643

Conference

Conference: International Conference “Internet and Modern Society” (IMS-2023)
Abbreviated title: IMS-2023
Country/Territory: Russian Federation
City: Saint Petersburg
Period: 26/06/23 - 28/06/23
Internet address: https://ims.itmo.ru/

Research areas

• German language, Keyword extraction, Log-likelihood, RAKE, YAKE

ID: 115095339