Research output: Contribution to journal › Article › peer-review
Text collections for evaluation of Russian morphological taggers. / Lyashevskaya, Olga; Bocharov, Victor; Sorokin, Alexey; Shavrina, Tatiana; Granovsky, Dmitry; Alexeeva, Svetlana.
In: Jazykovedny Casopis, Vol. 68, No. 2, 12.2017, p. 258-267.
TY - JOUR
T1 - Text collections for evaluation of Russian morphological taggers
AU - Lyashevskaya, Olga
AU - Bocharov, Victor
AU - Sorokin, Alexey
AU - Shavrina, Tatiana
AU - Granovsky, Dmitry
AU - Alexeeva, Svetlana
PY - 2017/12
Y1 - 2017/12
N2 - The paper describes the preparation and development of the text collections within the framework of the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with manually verified high-quality tagging to a single format (Universal Dependencies CoNLL-U). The sources of the data were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data and the GICR corpus with resolved homonymy, all exhibiting different tagsets, rules for lemmatization, pipeline architecture, technical solutions and error systematicity. The collections include both normative texts (news and modern literature) and more informal discourse (social media and spoken data). The texts are available under the CC BY-NC-SA 3.0 license.
AB - The paper describes the preparation and development of the text collections within the framework of the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to standardize all available Russian corpora with manually verified high-quality tagging to a single format (Universal Dependencies CoNLL-U). The sources of the data were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, OpenCorpora.org data and the GICR corpus with resolved homonymy, all exhibiting different tagsets, rules for lemmatization, pipeline architecture, technical solutions and error systematicity. The collections include both normative texts (news and modern literature) and more informal discourse (social media and spoken data). The texts are available under the CC BY-NC-SA 3.0 license.
KW - Morphological parsing
KW - Morphological tagging
KW - Russian corpora
KW - Shared task
KW - Text collection
KW - Universal Dependencies
UR - http://www.scopus.com/inward/record.url?scp=85048125524&partnerID=8YFLogxK
U2 - 10.1515/jazcas-2017-0035
DO - 10.1515/jazcas-2017-0035
M3 - Article
AN - SCOPUS:85048125524
VL - 68
SP - 258
EP - 267
JO - Jazykovedny Casopis
JF - Jazykovedny Casopis
SN - 0021-5597
IS - 2
ER -
ID: 61233855