Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
A new corpus of the Russian social network news feed paraphrases: Corpus construction and linguistic feature analysis. / Pronoza, Ekaterina; Yagunova, Elena; Pronoza, Anton.
Advances in Computational Intelligence - 16th Mexican International Conference on Artificial Intelligence, MICAI 2017, Proceedings. ed. / Miguel González-Mendoza; Félix Castro; Sabino Miranda-Jiménez. Springer Nature, 2018. p. 133-145 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10633 LNAI).
TY - GEN
T1 - A new corpus of the Russian social network news feed paraphrases
T2 - 16th Mexican International Conference on Artificial Intelligence, MICAI 2017
AU - Pronoza, Ekaterina
AU - Yagunova, Elena
AU - Pronoza, Anton
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2018. Copyright: Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2018
Y1 - 2018
N2 - In this paper we present a new Russian paraphrase corpus derived from the news feed of a social network and conduct its primary analysis. Most media agencies post their news reports on their pages in social networks, and the headlines of these messages are often the same as those of the corresponding news articles on the agencies' official websites. However, sometimes the two headlines differ, and in such cases the headline from the social network can be considered a compression or a paraphrase of the original headline. In other words, social network news feeds are a rich source of textual entailment and, as shown in this paper, of various linguistic phenomena, e.g., irony, presupposition and attention-attracting markers. We collect such pairs of headlines and construct the Russian social network news feed paraphrase corpus from them. We test a paraphrase detection model trained on another existing Russian paraphrase corpus, ParaPhraser.ru, which was collected from official news headlines only, against the constructed dataset, and explore the dataset's linguistic and pragmatic features.
AB - In this paper we present a new Russian paraphrase corpus derived from the news feed of a social network and conduct its primary analysis. Most media agencies post their news reports on their pages in social networks, and the headlines of these messages are often the same as those of the corresponding news articles on the agencies' official websites. However, sometimes the two headlines differ, and in such cases the headline from the social network can be considered a compression or a paraphrase of the original headline. In other words, social network news feeds are a rich source of textual entailment and, as shown in this paper, of various linguistic phenomena, e.g., irony, presupposition and attention-attracting markers. We collect such pairs of headlines and construct the Russian social network news feed paraphrase corpus from them. We test a paraphrase detection model trained on another existing Russian paraphrase corpus, ParaPhraser.ru, which was collected from official news headlines only, against the constructed dataset, and explore the dataset's linguistic and pragmatic features.
KW - Linguistic phenomena
KW - Loose paraphrase
KW - News headlines
KW - Paraphrase corpus
KW - Social network news feed
KW - Text compression
KW - Textual entailment
UR - http://www.scopus.com/inward/record.url?scp=85059937316&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-02840-4_11
DO - 10.1007/978-3-030-02840-4_11
M3 - Conference contribution
AN - SCOPUS:85059937316
SN - 9783030028398
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 133
EP - 145
BT - Advances in Computational Intelligence - 16th Mexican International Conference on Artificial Intelligence, MICAI 2017, Proceedings
A2 - González-Mendoza, Miguel
A2 - Castro, Félix
A2 - Miranda-Jiménez, Sabino
PB - Springer Nature
Y2 - 23 October 2017 through 28 October 2017
ER -
ID: 73343642