Sentence paraphrase graphs: Classification based on predictive models or annotators’ decisions? / Pronoza, Ekaterina; Yagunova, Elena; Kochetkova, Nataliya.
Advances in Soft Computing - 15th Mexican International Conference on Artificial Intelligence, MICAI 2016, Proceedings. Vol. 10061 LNAI. Springer Nature, 2017. p. 41-52 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10061 LNAI).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
TY - GEN
T1 - Sentence paraphrase graphs: Classification based on predictive models or annotators’ decisions?
T2 - 15th Mexican International Conference on Artificial Intelligence, MICAI 2016
AU - Pronoza, Ekaterina
AU - Yagunova, Elena
AU - Kochetkova, Nataliya
PY - 2017
Y1 - 2017
N2 - As part of our ParaPhraser project on the identification and classification of Russian paraphrases, we have collected a corpus of more than 8000 sentence pairs annotated as precise paraphrases, loose paraphrases or non-paraphrases. The corpus is annotated via crowdsourcing by naïve native Russian speakers, but from an expert’s point of view, our complex paraphrase detection model can be more successful at predicting the paraphrase class than a naïve native speaker. Our paraphrase corpus is collected from news headlines and can therefore be considered a summarized news stream describing the most important events. By building a graph of paraphrases, we can detect such events. In this paper we construct two such graphs: one based on the current human annotation and one based on the complex model’s predictions. The structure of the graphs is compared and analyzed, and it is shown that the model graph has larger connected components, which give a more complete picture of the important events than the human annotation graph. The predictive model appears to be better than human annotators at capturing full information about the important events in the news collection.
KW - Central nodes
KW - Connected components
KW - News stream
KW - Paraphrase graph
KW - Predictive model
KW - Sentential paraphrase
UR - http://www.scopus.com/inward/record.url?scp=85028456531&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-62434-1_4
DO - 10.1007/978-3-319-62434-1_4
M3 - Conference contribution
SN - 9783319624334
VL - 10061 LNAI
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 41
EP - 52
BT - Advances in Soft Computing - 15th Mexican International Conference on Artificial Intelligence, MICAI 2016, Proceedings
PB - Springer Nature
Y2 - 22 October 2016 through 27 October 2016
ER -