Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Our main objectives are to construct a paraphrase corpus for Russian and to develop paraphrase identification and classification models based on this corpus. The corpus consists of pairs of news headlines from different media agencies, which are extracted and analyzed in real time. Paraphrase candidates are extracted using an unsupervised matrix similarity metric: if the metric value satisfies a certain threshold, the corresponding pair of sentences is included in the corpus. These sentence pairs are then annotated via crowdsourcing. We provide a user-friendly online interface for crowdsourced annotation, available at http://paraphraser.ru. The corpus currently contains 7480 annotated sentence pairs, and more are being added. The types and features of these sentence pairs are not shown to the annotators. We adopt a three-class classification of paraphrases, distinguishing precise paraphrases (conveying the same meaning), loose paraphrases (conveying similar meaning) and non-paraphrases (conveying different meaning).
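The threshold-based candidate extraction described above can be illustrated with a minimal sketch. The abstract does not specify the matrix similarity metric or the threshold value, so this example substitutes a plain Jaccard token overlap as a stand-in; the function names `jaccard` and `extract_candidates`, the threshold of 0.5, and the English sample headlines are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity. A stand-in, NOT the paper's matrix metric."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def extract_candidates(headlines, threshold=0.5):
    """Return (headline_a, headline_b, score) for pairs that meet the threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    pairs = []
    for a, b in combinations(headlines, 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical headlines; the real corpus uses Russian news headlines.
headlines = [
    "central bank raises key interest rate",
    "central bank raises interest rate again",
    "football team wins national cup",
]
candidates = extract_candidates(headlines, threshold=0.5)
```

In this sketch only the first two headlines form a candidate pair; in the workflow the abstract describes, such pairs would then go to crowdsourced annotation as precise paraphrases, loose paraphrases or non-paraphrases.
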
Original language | English |
---|---|
Title of host publication | Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers |
Editors | Alexander Gelbukh |
Publisher | Springer Nature |
Pages | 573-587 |
Number of pages | 15 |
Volume | 9623 LNCS |
ISBN (Print) | 9783319754765 |
State | Published - 2018 |
Event | 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016 - Konya, Turkey; Duration: 2 Apr 2016 → 8 Apr 2016 |
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 9623 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference | 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016 |
---|---|
Country/Territory | Turkey |
City | Konya |
Period | 2/04/16 → 8/04/16 |
ID: 7633707