As part of our project ParaPhraser on the identification and classification of Russian paraphrases, we have collected a corpus of more than 8000 sentence pairs annotated as precise paraphrases, loose paraphrases, or non-paraphrases. The corpus is annotated via crowdsourcing by naïve native Russian speakers; from an expert's point of view, however, our complex paraphrase detection model can be more successful at predicting the paraphrase class than a naïve native speaker. Our paraphrase corpus is collected from news headlines and can therefore be considered a summarized news stream describing the most important events. By building a graph of paraphrases, we can detect such events. In this paper we construct two such graphs: one based on the current human annotation and one based on the complex model's predictions. We compare and analyze the structure of the two graphs and show that the model graph has larger connected components, which give a more complete picture of the important events than the human annotation graph. The predictive model thus appears to be better than human annotators at capturing full information about the important events in the news collection.
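
The graph comparison described in the abstract can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: headlines are treated as nodes, a pair labeled as a precise or loose paraphrase becomes an edge, and each connected component is read as one candidate event. The function name `paraphrase_components`, the label strings, and the toy pairs are hypothetical and are not taken from the ParaPhraser corpus or code.

```python
from collections import defaultdict


def paraphrase_components(pairs, paraphrase_labels=("precise", "loose")):
    """Return connected components of a paraphrase graph, largest first.

    `pairs` is an iterable of (headline_a, headline_b, label) triples.
    Treating both "precise" and "loose" pairs as edges is an assumption.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, label in pairs:
        root_a, root_b = find(a), find(b)  # registers both headlines as nodes
        if label in paraphrase_labels and root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy data: the same four headlines labeled in two ways.
human = [("h1", "h2", "precise"), ("h2", "h3", "non-paraphrase"), ("h3", "h4", "loose")]
model = [("h1", "h2", "precise"), ("h2", "h3", "loose"), ("h3", "h4", "loose")]

print(len(paraphrase_components(human)[0]))  # largest human-graph component: 2
print(len(paraphrase_components(model)[0]))  # largest model-graph component: 4
```

In this toy case a single extra edge in the model labeling merges two small components into one larger one, which is the kind of structural difference the abstract reports between the model graph and the human annotation graph.
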
Original language | English |
---|---|
Title of host publication | Advances in Soft Computing - 15th Mexican International Conference on Artificial Intelligence, MICAI 2016, Proceedings |
Publisher | Springer Nature |
Pages | 41-52 |
Number of pages | 12 |
Volume | 10061 LNAI |
ISBN (Print) | 9783319624334 |
State | Published - 2017 |
Externally published | Yes |
Event | 15th Mexican International Conference on Artificial Intelligence, MICAI 2016 - Cancun, Mexico; Duration: 22 Oct 2016 → 27 Oct 2016 |

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10061 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

Conference | 15th Mexican International Conference on Artificial Intelligence, MICAI 2016 |
---|---|
Country/Territory | Mexico |
City | Cancun |
Period | 22/10/16 → 27/10/16 |

ID: 7633637