Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Scientific › Peer-reviewed
A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models. / Pronoza, Ekaterina; Yagunova, Elena.
Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers. ed. / Alexander Gelbukh. Vol. 9623 LNCS. Springer Nature, 2018. pp. 573-587 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9623 LNCS).
TY - GEN
T1 - A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models
AU - Pronoza, Ekaterina
AU - Yagunova, Elena
N1 - Funding Information: Acknowledgments. We would like to thank Lilia Volkova for her invaluable help. The authors also acknowledge Saint-Petersburg State University for the research grant 30.38.305.2014.
PY - 2018
Y1 - 2018
AB - Our main objectives are to construct a paraphrase corpus for Russian and to develop paraphrase identification and classification models based on this corpus. The corpus consists of pairs of news headlines from different media agencies, which are extracted and analyzed in real time. Paraphrase candidates are extracted using an unsupervised matrix similarity metric: if the metric value satisfies a certain threshold, the corresponding pair of sentences is included in the corpus. These sentence pairs are then annotated via crowdsourcing. We provide a user-friendly online interface for crowdsourced annotation, available at http://paraphraser.ru. The corpus currently contains 7480 annotated sentence pairs, and more are to come. The annotators are not told the types or features of these sentence pairs. We adopt a three-class classification of paraphrases and distinguish precise paraphrases (conveying the same meaning), loose paraphrases (conveying similar meaning) and non-paraphrases (conveying different meaning).
KW - Lexical features
KW - Low-level features
KW - Matrix similarity metric
KW - Paraphrase identification
KW - Semantic features
UR - http://www.scopus.com/inward/record.url?scp=85044415223&partnerID=8YFLogxK
UR - http://www.mendeley.com/research/new-russian-paraphrase-corpus-paraphrase-identification-classification-based-different-prediction-mo
U2 - 10.1007/978-3-319-75477-2_41
DO - 10.1007/978-3-319-75477-2_41
M3 - Conference contribution
SN - 9783319754765
VL - 9623 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 573
EP - 587
BT - Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers
A2 - Gelbukh, Alexander
PB - Springer Nature
T2 - 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016
Y2 - 2 April 2016 through 8 April 2016
ER -
ID: 7633707
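The candidate-extraction step described in the abstract (a headline pair is kept when an unsupervised matrix similarity metric satisfies a threshold) might be sketched as follows. This is an illustrative sketch, not the authors' implementation. The record does not give the metric's formula, so several details here are assumptions: the form a·W·bᵀ / (|a|·|b|) over binary bag-of-words vectors, the whitespace tokenizer, the exact-match word similarity, the `>=` comparison, the 0.5 threshold, and all helper names (`tokenize`, `exact_match`, `matrix_similarity`, `extract_candidates`).

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical type for a word-to-word similarity function (the entries of W).
WordSim = Callable[[str, str], float]


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def exact_match(w1: str, w2: str) -> float:
    # Hypothetical word similarity (assumption): 1.0 for identical tokens, else 0.0.
    # With this choice W is the identity matrix and the score reduces to the
    # cosine similarity of the two binary bag-of-words vectors.
    return 1.0 if w1 == w2 else 0.0


def matrix_similarity(
    a_tokens: Sequence[str],
    b_tokens: Sequence[str],
    word_sim: WordSim = exact_match,
) -> float:
    """Return a·W·bᵀ / (|a|·|b|) for binary bag-of-words vectors a and b (assumed form)."""
    a_set, b_set = set(a_tokens), set(b_tokens)
    if not a_set or not b_set:
        return 0.0
    # Only vocabulary entries with a_i = 1 and b_j = 1 contribute to a·W·bᵀ.
    numerator = sum(word_sim(wa, wb) for wa in a_set for wb in b_set)
    # The Euclidean norm of a binary vector is sqrt(number of distinct words).
    return numerator / (math.sqrt(len(a_set)) * math.sqrt(len(b_set)))


def extract_candidates(
    headlines_a: List[str],
    headlines_b: List[str],
    threshold: float,
    word_sim: WordSim = exact_match,
) -> List[Tuple[str, str, float]]:
    """Keep every cross-source headline pair whose score reaches the threshold."""
    candidates = []
    for ha in headlines_a:
        for hb in headlines_b:
            score = matrix_similarity(tokenize(ha), tokenize(hb), word_sim)
            if score >= threshold:
                candidates.append((ha, hb, score))
    return candidates


if __name__ == "__main__":
    # Toy headlines from two hypothetical agencies; threshold 0.5 is illustrative.
    pairs = extract_candidates(
        ["Курс рубля вырос", "Открылся новый музей"],
        ["Курс рубля снова вырос"],
        threshold=0.5,
    )
    for a, b, score in pairs:
        print(f"{score:.3f}  {a}  |  {b}")
```

In this toy run only the two exchange-rate headlines share words (three of them), so their pair scores 3/√12 ≈ 0.866 and is kept, while the museum headline scores 0 against it. A thesaurus- or embedding-based `word_sim` would let the metric also reward pairs that use different but related words, which plain word overlap cannot do.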