In this paper we present a new Russian paraphrase corpus derived from the news feed of a social network and conduct an initial analysis of it. Most media agencies post their news reports on their social network pages, and the headlines of these posts are often identical to those of the corresponding articles on the agencies' official websites. Sometimes, however, the two headlines differ, and in such cases the social network headline can be considered a compression or a paraphrase of the original headline. In other words, such a news feed is a rich source of textual entailment and, as shown in this paper, of various linguistic phenomena, e.g., irony, presupposition, and attention-attracting markers. We collect such pairs of headlines and construct the Russian social network news feed paraphrase corpus from them. We test a paraphrase detection model trained on another existing Russian paraphrase corpus, ParaPhraser.ru, which was collected from official news headlines only, against the constructed dataset, and we explore the linguistic and pragmatic features of the new corpus.

Original language: English
Title of host publication: Advances in Computational Intelligence - 16th Mexican International Conference on Artificial Intelligence, MICAI 2017, Proceedings
Editors: Miguel González-Mendoza, Félix Castro, Sabino Miranda-Jiménez
Publisher: Springer Nature
Pages: 133-145
Number of pages: 13
ISBN (Print): 9783030028398
State: Published - 2018
Event: 16th Mexican International Conference on Artificial Intelligence, MICAI 2017 - Ensenada, Mexico
Duration: 23 Oct 2017 to 28 Oct 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10633 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th Mexican International Conference on Artificial Intelligence, MICAI 2017
Country/Territory: Mexico
City: Ensenada
Period: 23/10/17 to 28/10/17

    Research areas

  • Linguistic phenomena, Loose paraphrase, News headlines, Paraphrase corpus, Social network news feed, Text compression, Textual entailment

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

ID: 73343642