Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Construction of a Russian Paraphrase Corpus: Unsupervised Paraphrase Extraction. / Pronoza, Ekaterina; Yagunova, Elena; Pronoza, Anton.
INFORMATION RETRIEVAL, (RUSSIR 2015). ed. / P Braslavski; Markov; P Pardalos; Y Volkovich; DI Ignatov; S Koltsov; O Koltsova. Springer Nature, 2016. p. 146-157 (Communications in Computer and Information Science; Vol. 573).
TY - GEN
T1 - Construction of a Russian Paraphrase Corpus
AU - Pronoza, Ekaterina
AU - Yagunova, Elena
AU - Pronoza, Anton
PY - 2016
Y1 - 2016
N2 - This paper presents a crowdsourcing project on the creation of a publicly available corpus of sentential paraphrases for Russian. Collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time; paraphrase candidates are extracted from the headlines using an unsupervised matrix similarity metric. We provide a user-friendly online interface for crowdsourced annotation, which is available at paraphraser.ru. There are 5181 annotated sentence pairs at the moment, with 4758 of them included in the corpus. The annotation process is ongoing, and the current version of the corpus is freely available at http://paraphraser.ru.
AB - This paper presents a crowdsourcing project on the creation of a publicly available corpus of sentential paraphrases for Russian. Collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time; paraphrase candidates are extracted from the headlines using an unsupervised matrix similarity metric. We provide a user-friendly online interface for crowdsourced annotation, which is available at paraphraser.ru. There are 5181 annotated sentence pairs at the moment, with 4758 of them included in the corpus. The annotation process is ongoing, and the current version of the corpus is freely available at http://paraphraser.ru.
KW - Russian paraphrase corpus
KW - Lexical similarity metric
KW - Unsupervised paraphrase extraction
KW - Crowdsourcing
U2 - 10.1007/978-3-319-41718-9_8
DO - 10.1007/978-3-319-41718-9_8
M3 - Article in conference proceedings
SN - 978-3-319-41717-2
T3 - Communications in Computer and Information Science
SP - 146
EP - 157
BT - INFORMATION RETRIEVAL, (RUSSIR 2015)
A2 - Braslavski, P
A2 - Markov
A2 - Pardalos, P
A2 - Volkovich, Y
A2 - Ignatov, DI
A2 - Koltsov, S
A2 - Koltsova, O
PB - Springer Nature
Y2 - 24 August 2015 through 28 August 2015
ER -
ID: 89669620