Text collections for evaluation of Russian morphological taggers

Research output: Scientific publications in periodicals › Article › Peer-reviewed

Department of Information Systems in Arts and Humanities

DOI

https://doi.org/10.1515/jazcas-2017-0035
Final published version

Olga Lyashevskaya
Victor Bocharov
Alexey Sorokin
Tatiana Shavrina
Dmitry Granovsky
Svetlana Alexeeva

The paper describes the preparation and development of the text collections for the MorphoRuEval-2017 shared task, an evaluation campaign designed to stimulate the development of automatic morphological processing technologies for Russian. The main challenge for the organizers was to convert all available Russian corpora with manually verified, high-quality tagging into a single format (Universal Dependencies CoNLL-U). The data sources were the disambiguated subcorpus of the Russian National Corpus, SynTagRus, the OpenCorpora.org data and the GICR corpus with resolved homonymy; these sources differ in their tagsets, lemmatization rules, pipeline architecture, technical solutions and the systematicity of their errors. The collections include both normative texts (news and modern literature) and more informal discourse (social media and spoken data). The texts are available under the CC BY-NC-SA 3.0 license.
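
All source corpora were brought into the Universal Dependencies CoNLL-U format. As a rough illustration of what that shared target looks like to an evaluation script, here is a minimal Python sketch that reads CoNLL-U text into sentences of tokens carrying the form, lemma, UPOS tag and morphological features. It assumes only the standard ten tab-separated UD columns; the sample sentence, its tags and the helper names (`Token`, `parse_feats`, `parse_conllu`) are hypothetical and are not taken from the MorphoRuEval-2017 data or tooling.

```python
from dataclasses import dataclass

# The ten CoNLL-U columns, in the order defined by Universal Dependencies.
FIELDS = ("id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc")


@dataclass
class Token:
    form: str
    lemma: str
    upos: str
    feats: dict


def parse_feats(raw: str) -> dict:
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_conllu(text: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences of tokens, skipping comments,
    multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '1.1')."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):      # sentence-level comments such as '# text = ...'
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}: {line!r}")
        row = dict(zip(FIELDS, cols))
        if "-" in row["id"] or "." in row["id"]:
            continue
        current.append(Token(row["form"], row["lemma"], row["upos"], parse_feats(row["feats"])))
    if current:                       # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


# Hypothetical sample: ID, FORM, LEMMA, UPOS, XPOS, FEATS; the last four columns are left as '_'.
ROWS = [
    ("1", "Мама", "мама", "NOUN", "_", "Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing"),
    ("2", "мыла", "мыть", "VERB", "_", "Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act"),
    ("3", "раму", "рама", "NOUN", "_", "Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing"),
    ("4", ".", ".", "PUNCT", "_", "_"),
]
SAMPLE = "# text = Мама мыла раму.\n" + "\n".join("\t".join(r + ("_",) * 4) for r in ROWS) + "\n"

if __name__ == "__main__":
    for tok in parse_conllu(SAMPLE)[0]:
        print(tok.form, tok.lemma, tok.upos, tok.feats.get("Case"))
```

Keeping features as a dictionary rather than a raw string makes it straightforward to compare a tagger's output with the gold annotation one grammatical category at a time, which is the kind of comparison a common format is meant to enable.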

Original language	English
Pages (from-to)	258-267
Number of pages	10
Journal	Jazykovedny Casopis
Volume	68
Issue number	2
State	Published - Dec 2017

Scopus subject areas

Language and Linguistics
Linguistics and Language
