As part of our ParaPhraser project on the identification and classification of Russian paraphrases, we have collected a corpus of more than 8000 sentence pairs annotated as precise paraphrases, loose paraphrases or non-paraphrases. The corpus is annotated via crowdsourcing by naive native Russian speakers; from an expert's point of view, however, our complex paraphrase detection model can predict the paraphrase class more successfully than a naive native speaker. The corpus is collected from news headlines and can therefore be considered a summarized news stream describing the most important events. By building a graph of paraphrases, we can detect such events. In this paper we construct two such graphs: one based on the current human annotation and one based on the predictions of the complex model. We compare and analyze the structure of the two graphs and show that the model graph has larger connected components, which give a more complete picture of the important events than the human annotation graph. The predictive model thus appears to be better than human annotators at capturing full information about the important events in the news collection.
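
The graph construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the placeholder headlines, the numeric label encoding (1 = precise, 0 = loose, -1 = non-paraphrase) and the helper names build_graph, connected_components and central_node are all assumptions made for illustration. Headlines are nodes, an edge links two headlines labelled as precise or loose paraphrases, and each connected component is read as one candidate event.

```python
from collections import defaultdict, deque

# Hypothetical toy data: (headline_a, headline_b, label).
# The label encoding is an assumption: 1 = precise, 0 = loose, -1 = non-paraphrase.
pairs = [
    ("headline A", "headline B", 1),
    ("headline B", "headline C", 0),
    ("headline D", "headline E", 1),
    ("headline E", "headline F", -1),
]

def build_graph(pairs, paraphrase_labels=(0, 1)):
    """Build an undirected graph whose edges are the paraphrase pairs."""
    graph = defaultdict(set)
    for a, b, label in pairs:
        # Register both headlines as nodes even when they are not linked.
        graph[a]
        graph[b]
        if label in paraphrase_labels:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

def connected_components(graph):
    """Return the components (sorted node lists), largest first, via BFS."""
    seen = set()
    found = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        found.append(sorted(component))
    return sorted(found, key=len, reverse=True)

def central_node(graph, component):
    """Pick the node with the most neighbours (degree centrality)."""
    return max(component, key=lambda node: len(graph[node]))

graph = build_graph(pairs)
components = connected_components(graph)
for component in components:
    print(len(component), central_node(graph, component), component)
```

Running the same functions once on the human labels and once on the model's predicted labels would give two graphs whose component sizes can be compared. Degree is used here only as the simplest notion of a central node; the paper may rely on a different centrality measure.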

Original language: English
Title of host publication: Advances in Soft Computing - 15th Mexican International Conference on Artificial Intelligence, MICAI 2016, Proceedings
Publisher: Springer Nature
Pages: 41-52
Number of pages: 12
Volume: 10061 LNAI
ISBN (Print): 9783319624334
State: Published - 2017
Externally published: Yes
Event: 15th Mexican International Conference on Artificial Intelligence, MICAI 2016 - Cancun, Mexico
Duration: 22 Oct 2016 to 27 Oct 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10061 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th Mexican International Conference on Artificial Intelligence, MICAI 2016
Country/Territory: Mexico
City: Cancun
Period: 22/10/16 to 27/10/16

Research areas

  • Central nodes, Connected components, News stream, Paraphrase graph, Predictive model, Sentential paraphrase

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

ID: 7633637