A new corpus of the Russian social network news feed paraphrases: Corpus construction and linguistic feature analysis

Ekaterina Pronoza, Elena Yagunova, Anton Pronoza

Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Scientific › Peer-reviewed

1 citation (Scopus)

Abstract

In this paper we present a new Russian paraphrase corpus derived from the news feed of a social network and conduct its primary analysis. Most media agencies post their news reports on their pages in social networks, and the headlines of these posts are often the same as those of the corresponding news articles on the agencies' official websites. Sometimes, however, the two headlines in a pair differ, and in such cases the headline from the social network can be considered a compression or a paraphrase of the original headline. In other words, news feeds from social networks are a rich source of textual entailment and, as this paper shows, of various linguistic phenomena, e.g., irony, presupposition and attention-attracting markers. We collect such pairs of headlines and construct the Russian social network news feed paraphrase corpus from them. We test a paraphrase detection model trained on another existing Russian paraphrase corpus, ParaPhraser.ru, which was collected from official news headlines only, on the constructed dataset, and explore the dataset's linguistic and pragmatic features.
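The pair-selection idea described in the abstract (keep a social-network headline only when it differs from the official one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `HeadlinePair` structure, the normalization rule (lowercasing, replacing punctuation with spaces, collapsing whitespace) and the token-count test for "compression" are hypothetical choices made here for clarity.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class HeadlinePair:
    social: str    # headline as posted in the social network news feed
    official: str  # headline of the same article on the agency's website


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase, turn punctuation into spaces,
    collapse runs of whitespace. \\w matches Cyrillic letters for str patterns."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def candidate_pairs(pairs: List[HeadlinePair]) -> List[HeadlinePair]:
    """Keep only pairs whose headlines still differ after normalization;
    identical pairs carry no paraphrase signal."""
    return [p for p in pairs if normalize(p.social) != normalize(p.official)]


def is_compression(pair: HeadlinePair) -> bool:
    """Assumed heuristic: the social headline is a compression if it has
    fewer tokens than the official headline."""
    return len(normalize(pair.social).split()) < len(normalize(pair.official).split())


if __name__ == "__main__":
    # Invented example headlines, for illustration only.
    demo = [
        HeadlinePair("В Петербурге открылся новый мост", "В Петербурге открылся новый мост."),
        HeadlinePair("В Москве выпал снег", "В Москве выпал первый снег"),
    ]
    for pair in candidate_pairs(demo):
        print(pair.social, "|", pair.official, "| compression:", is_compression(pair))
```

Here the first demo pair differs only in a final period, so it is discarded; the second survives and is flagged as a compression. Distinguishing compressions from genuine paraphrases or unrelated rewrites would in practice need a trained classifier, such as the ParaPhraser.ru-based model the paper evaluates.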

Original language: English
Title of host publication: Advances in Computational Intelligence - 16th Mexican International Conference on Artificial Intelligence, MICAI 2017, Proceedings
Editors: Miguel González-Mendoza, Félix Castro, Sabino Miranda-Jiménez
Publisher: Springer Nature
Pages: 133-145
Number of pages: 13
ISBN (Print): 9783030028398
Status: Published - 2018
Event: 16th Mexican International Conference on Artificial Intelligence, MICAI 2017 - Ensenada, Mexico
Duration: 23 Oct 2017 → 28 Oct 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10633 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th Mexican International Conference on Artificial Intelligence, MICAI 2017
Country: Mexico
City: Ensenada
Period: 23/10/17 → 28/10/17

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

