A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models

Standard

A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models. / Pronoza, Ekaterina ; Yagunova, Elena.

Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers. ed. / Alexander Gelbukh. Vol. 9623 LNCS Springer Nature, 2018. p. 573-587 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9623 LNCS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Harvard

Pronoza, E & Yagunova, E 2018, A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models. in A Gelbukh (ed.), Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers. vol. 9623 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9623 LNCS, Springer Nature, pp. 573-587, 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016, Konya, Turkey, 2/04/16. https://doi.org/10.1007/978-3-319-75477-2_41

APA

Pronoza, E., & Yagunova, E. (2018). A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models. In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers (Vol. 9623 LNCS, pp. 573-587). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9623 LNCS). Springer Nature. https://doi.org/10.1007/978-3-319-75477-2_41

Vancouver

Pronoza E, Yagunova E. A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models. In Gelbukh A, editor, Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers. Vol. 9623 LNCS. Springer Nature. 2018. p. 573-587. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-75477-2_41

Author

Pronoza, Ekaterina ; Yagunova, Elena. / A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models. Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers. editor / Alexander Gelbukh. Vol. 9623 LNCS Springer Nature, 2018. pp. 573-587 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX

@inproceedings{7f05ec1802bc4ec78acaa56ff420e5ce,

title = "A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models",

abstract = "Our main objectives are constructing a paraphrase corpus for Russian and developing of the paraphrase identification and classification models based on this corpus. The corpus consists of pairs of news headlines from different media agencies which are extracted and analyzed in real time. Paraphrase candidates are extracted using an unsupervised matrix similarity metric: if the metric value satisfies a certain threshold, the corresponding pair of sentences is included in the corpus. These pairs of sentences are further annotated via crowdsourcing. We provide a user-friendly online interface for crowdsourced annotation which is available at http://paraphraser.ru. There are 7480 annotated sentence pairs in the corpus at the moment, and there are still more to come. The types and the features of these sentence pairs are not introduced to the annotators. We adopt a 3-classes classification of paraphrases and distinguish precise paraphrases (conveying the same meaning), loose paraphrases (conveying similar meaning) and non-paraphrases (conveying different meaning).",

keywords = "Lexical features, Low-level features, Matrix similarity metric, Paraphrase identification, Semantic features",

author = "Ekaterina Pronoza and Elena Yagunova",

note = "Funding Information: Acknowledgments. We would like to thank Lilia Volkova for her invaluable help. The authors also acknowledge Saint-Petersburg State University for the research grant 30.38.305.2014.; 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016 ; Conference date: 02-04-2016 Through 08-04-2016",

year = "2018",

doi = "10.1007/978-3-319-75477-2_41",

language = "English",

isbn = "9783319754765",

volume = "9623 LNCS",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Nature",

pages = "573--587",

editor = "Alexander Gelbukh",

booktitle = "Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers",

address = "Germany",

}

RIS

TY - GEN

T1 - A New Russian Paraphrase Corpus. Paraphrase Identification and Classification Based on Different Prediction Models

AU - Pronoza, Ekaterina

AU - Yagunova, Elena

N1 - Funding Information: Acknowledgments. We would like to thank Lilia Volkova for her invaluable help. The authors also acknowledge Saint-Petersburg State University for the research grant 30.38.305.2014.

PY - 2018

Y1 - 2018

N2 - Our main objectives are constructing a paraphrase corpus for Russian and developing of the paraphrase identification and classification models based on this corpus. The corpus consists of pairs of news headlines from different media agencies which are extracted and analyzed in real time. Paraphrase candidates are extracted using an unsupervised matrix similarity metric: if the metric value satisfies a certain threshold, the corresponding pair of sentences is included in the corpus. These pairs of sentences are further annotated via crowdsourcing. We provide a user-friendly online interface for crowdsourced annotation which is available at http://paraphraser.ru. There are 7480 annotated sentence pairs in the corpus at the moment, and there are still more to come. The types and the features of these sentence pairs are not introduced to the annotators. We adopt a 3-classes classification of paraphrases and distinguish precise paraphrases (conveying the same meaning), loose paraphrases (conveying similar meaning) and non-paraphrases (conveying different meaning).

AB - Our main objectives are constructing a paraphrase corpus for Russian and developing of the paraphrase identification and classification models based on this corpus. The corpus consists of pairs of news headlines from different media agencies which are extracted and analyzed in real time. Paraphrase candidates are extracted using an unsupervised matrix similarity metric: if the metric value satisfies a certain threshold, the corresponding pair of sentences is included in the corpus. These pairs of sentences are further annotated via crowdsourcing. We provide a user-friendly online interface for crowdsourced annotation which is available at http://paraphraser.ru. There are 7480 annotated sentence pairs in the corpus at the moment, and there are still more to come. The types and the features of these sentence pairs are not introduced to the annotators. We adopt a 3-classes classification of paraphrases and distinguish precise paraphrases (conveying the same meaning), loose paraphrases (conveying similar meaning) and non-paraphrases (conveying different meaning).

KW - Lexical features

KW - Low-level features

KW - Matrix similarity metric

KW - Paraphrase identification

KW - Semantic features

UR - http://www.scopus.com/inward/record.url?scp=85044415223&partnerID=8YFLogxK

UR - http://www.mendeley.com/research/new-russian-paraphrase-corpus-paraphrase-identification-classification-based-different-prediction-mo

U2 - 10.1007/978-3-319-75477-2_41

DO - 10.1007/978-3-319-75477-2_41

M3 - Conference contribution

SN - 9783319754765

VL - 9623 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 573

EP - 587

BT - Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers

A2 - Gelbukh, Alexander

PB - Springer Nature

T2 - 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016

Y2 - 2 April 2016 through 8 April 2016

ER -
