Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Summarization Algorithms for News: A Study of the Coronavirus Theme and Its Impact on the News Extracting Algorithm. / Gadasina, Lyudmila; Veklenko, Vladislav; Luukka, Pasi.
Computational Data and Social Networks - 10th International Conference, CSoNet 2021, Proceedings. ed. / David Mohaisen; Ruoming Jin. Springer Nature, 2021. p. 351-360 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13116 LNCS).
TY - GEN
T1 - Summarization Algorithms for News: A Study of the Coronavirus Theme and Its Impact on the News Extracting Algorithm
T2 - 10th International Conference on Computational Data and Social Networks, CSoNet 2021
AU - Gadasina, Lyudmila
AU - Veklenko, Vladislav
AU - Luukka, Pasi
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
AB - Extractive summarization algorithms help identify significant information in news by extracting meaningful sentences from the original text. The information background at the time of a news release often significantly affects its content, and such background can distort the results of a text summarization algorithm. The study used the theme “coronavirus” (COVID-19) as an example, since it was one of the main topics in news feeds at the time of the study. Experiments were carried out on sports news articles about football. This news area was selected because it is not related to medical topics. The TextRank algorithm was applied to the sports news in two ways. First, the key information was extracted from the source text of the news. Then, a list of COVID-related words was created, and the key information was extracted again with the words from this list excluded. The results showed that mentions of a popular theme such as COVID, which is unrelated to sports, can have a negative impact on the text summarization algorithm. We suggest that, to obtain accurate results from the algorithm, it is necessary to first compile a dictionary of terms related to the coronavirus theme and then exclude them when identifying the main content of news texts.
KW - Coronavirus
KW - Extracting
KW - News
KW - Summarization algorithm
KW - Text
UR - http://www.scopus.com/inward/record.url?scp=85121872878&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-91434-9_30
DO - 10.1007/978-3-030-91434-9_30
M3 - Conference contribution
AN - SCOPUS:85121872878
SN - 9783030914332
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 351
EP - 360
BT - Computational Data and Social Networks - 10th International Conference, CSoNet 2021, Proceedings
A2 - Mohaisen, David
A2 - Jin, Ruoming
PB - Springer Nature
Y2 - 15 November 2021 through 17 November 2021
ER -
ID: 91076367
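
The two-pass procedure described in the abstract (TextRank on the source text, then TextRank with COVID-related words excluded) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the term list `COVID_TERMS`, the tokenizer, and the function names are assumptions made for the example, and the sentence similarity is a simple word-overlap measure in the style commonly used with TextRank.

```python
import math
import re

# Hypothetical term list; the dictionary compiled in the paper is not
# given in the abstract.
COVID_TERMS = frozenset(
    {"covid", "covid-19", "coronavirus", "pandemic", "quarantine", "lockdown"}
)


def tokenize(sentence, excluded=frozenset()):
    """Lowercase a sentence, split it into words, and drop excluded terms."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())
    return [w for w in words if w not in excluded]


def similarity(a, b):
    """Word overlap normalized by the log lengths of both token lists."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, excluded=frozenset(), damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over the similarity graph."""
    tokens = [tokenize(s, excluded) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1, excluded=frozenset()):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences, excluded)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    news = [
        "The coronavirus pandemic delayed the start of the season.",
        "The striker scored two goals in the opening match.",
        "The pandemic also forced the club to close the stadium.",
        "The club won the match after the striker scored late.",
    ]
    print(summarize(news, k=2))                        # pass 1: source text
    print(summarize(news, k=2, excluded=COVID_TERMS))  # pass 2: terms excluded
```

Comparing the two printed summaries for the same article gives a rough sense of how much the excluded terms influence which sentences are ranked highest.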