This paper presents a crowdsourcing project to create a publicly available corpus of sentential paraphrases for Russian. Collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time; paraphrase candidates are extracted from the headlines using an unsupervised matrix similarity metric. We provide a user-friendly online interface for crowdsourced annotation, available at paraphraser.ru. There are currently 5181 annotated sentence pairs, 4758 of which are included in the corpus. Annotation is ongoing, and the current version of the corpus is freely available at http://paraphraser.ru.

Original language: English
Title of host publication: INFORMATION RETRIEVAL, (RUSSIR 2015)
Editors: P Braslavski, Markov, P Pardalos, Y Volkovich, DI Ignatov, S Koltsov, O Koltsova
Publisher: Springer Nature
Pages: 146-157
Number of pages: 12
ISBN (print): 978-3-319-41717-2
Publication status: Published - 2016
Event: 9th Russian Summer School in Information Retrieval (RuSSIR) - St Petersburg
Duration: 24 Aug 2015 to 28 Aug 2015

Publication series

Name: Communications in Computer and Information Science
Publisher: SPRINGER INTERNATIONAL PUBLISHING AG
Volume: 573
ISSN (print): 1865-0929

Conference

Conference: 9th Russian Summer School in Information Retrieval (RuSSIR)
City: St Petersburg
Period: 24/08/15 to 28/08/15

ID: 89669620