Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
This paper presents a crowdsourcing project to create a publicly available corpus of sentential paraphrases for Russian. Collected from news headlines, such a corpus could be applied to information extraction and text summarization. We collect news headlines from different agencies in real time; paraphrase candidates are extracted from the headlines using an unsupervised matrix similarity metric. We provide a user-friendly online interface for crowdsourced annotation, which is available at paraphraser.ru. There are 5181 annotated sentence pairs at the moment, with 4758 of them included in the corpus. The annotation process is ongoing, and the current version of the corpus is freely available at http://paraphraser.ru.
Original language | English |
---|---|
Title of host publication | INFORMATION RETRIEVAL (RUSSIR 2015) |
Editors | P Braslavski, Markov, P Pardalos, Y Volkovich, DI Ignatov, S Koltsov, O Koltsova |
Publisher | Springer Nature |
Pages | 146-157 |
Number of pages | 12 |
ISBN (Print) | 978-3-319-41717-2 |
State | Published - 2016 |
Event | 9th Russian Summer School in Information Retrieval (RuSSIR), St Petersburg, 24 Aug 2015 → 28 Aug 2015 |
Name | Communications in Computer and Information Science |
---|---|
Publisher | SPRINGER INTERNATIONAL PUBLISHING AG |
Volume | 573 |
ISSN (Print) | 1865-0929 |
Conference | 9th Russian Summer School in Information Retrieval (RuSSIR) |
---|---|
City | St Petersburg |
Period | 24/08/15 → 28/08/15 |
ID: 89669620