Our main objectives are constructing a paraphrase corpus for Russian and developing paraphrase identification and classification models based on this corpus. The corpus consists of pairs of news headlines from different media agencies, which are extracted and analyzed in real time. Paraphrase candidates are extracted using an unsupervised matrix similarity metric: if the metric value meets a certain threshold, the corresponding pair of sentences is included in the corpus. These sentence pairs are then annotated via crowdsourcing. We provide a user-friendly online interface for crowdsourced annotation, available at http://paraphraser.ru. The corpus currently contains 7480 annotated sentence pairs, and more are being added. The types and features of these sentence pairs are not disclosed to the annotators. We adopt a three-class classification of paraphrases and distinguish precise paraphrases (conveying the same meaning), loose paraphrases (conveying similar meaning), and non-paraphrases (conveying different meaning).

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Nature
Pages: 573-587
Number of pages: 15
Volume: 9623 LNCS
ISBN (print): 9783319754765
Status: Published - 2018
Event: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016 - Konya, Turkey
Duration: 2 Apr 2016 to 8 Apr 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9623 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016
Country/Territory: Turkey
City: Konya
Period: 2/04/16 to 8/04/16

    Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

ID: 7633707