Text collections for evaluation of Russian morphological taggers

Standard

Text collections for evaluation of Russian morphological taggers. / Lyashevskaya, Olga; Bocharov, Victor; Sorokin, Alexey; Shavrina, Tatiana; Granovsky, Dmitry; Alexeeva, Svetlana.

In: Jazykovedny Casopis, Vol. 68, No. 2, 12.2017, p. 258-267.

Research output: Contribution to journal › Article › peer-review

Author

Lyashevskaya, Olga ; Bocharov, Victor ; Sorokin, Alexey ; Shavrina, Tatiana ; Granovsky, Dmitry ; Alexeeva, Svetlana. / Text collections for evaluation of Russian morphological taggers. In: Jazykovedny Casopis. 2017 ; Vol. 68, No. 2. pp. 258-267.

BibTeX

@article{d50cadd0a46a4109ae885863e59f8cea,

title = "Text collections for evaluation of Russian morphological taggers",

abstract = "The paper describes the preparation and development of the text collections within the framework of the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with manually verified high-quality tagging into a single format (Universal Dependencies CoNLL-U). The sources of the data were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data and the GICR corpus with resolved homonymy, all exhibiting different tagsets, rules for lemmatization, pipeline architecture, technical solutions and error systematicity. The collections include both normative texts (the news and modern literature) and more informal discourse (social media and spoken data); the texts are available under the CC BY-NC-SA 3.0 license.",

keywords = "Morphological parsing, Morphological tagging, Russian corpora, Shared task, Text collection, Universal dependencies",

author = "Olga Lyashevskaya and Victor Bocharov and Alexey Sorokin and Tatiana Shavrina and Dmitry Granovsky and Svetlana Alexeeva",

year = "2017",

month = dec,

doi = "10.1515/jazcas-2017-0035",

language = "English",

volume = "68",

pages = "258--267",

journal = "Jazykovedny Casopis",

issn = "0021-5597",

publisher = "De Gruyter",

number = "2",

}

RIS

TY - JOUR

T1 - Text collections for evaluation of Russian morphological taggers

AU - Lyashevskaya, Olga

AU - Bocharov, Victor

AU - Sorokin, Alexey

AU - Shavrina, Tatiana

AU - Granovsky, Dmitry

AU - Alexeeva, Svetlana

PY - 2017/12

Y1 - 2017/12

N2 - The paper describes the preparation and development of the text collections within the framework of the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with manually verified high-quality tagging into a single format (Universal Dependencies CoNLL-U). The sources of the data were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data and the GICR corpus with resolved homonymy, all exhibiting different tagsets, rules for lemmatization, pipeline architecture, technical solutions and error systematicity. The collections include both normative texts (the news and modern literature) and more informal discourse (social media and spoken data); the texts are available under the CC BY-NC-SA 3.0 license.

AB - The paper describes the preparation and development of the text collections within the framework of the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with manually verified high-quality tagging into a single format (Universal Dependencies CoNLL-U). The sources of the data were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data and the GICR corpus with resolved homonymy, all exhibiting different tagsets, rules for lemmatization, pipeline architecture, technical solutions and error systematicity. The collections include both normative texts (the news and modern literature) and more informal discourse (social media and spoken data); the texts are available under the CC BY-NC-SA 3.0 license.

KW - Morphological parsing

KW - Morphological tagging

KW - Russian corpora

KW - Shared task

KW - Text collection

KW - Universal dependencies

UR - http://www.scopus.com/inward/record.url?scp=85048125524&partnerID=8YFLogxK

U2 - 10.1515/jazcas-2017-0035

DO - 10.1515/jazcas-2017-0035

M3 - Article

AN - SCOPUS:85048125524

VL - 68

SP - 258

EP - 267

JO - Jazykovedny Casopis

JF - Jazykovedny Casopis

SN - 0021-5597

IS - 2

ER -
