Morphological tagging of Russian texts of the XIXth century

Victor Zakharov
Sergei Volkov

This paper evaluates the morphological tagging of Russian texts of the XIXth century and determines why some words remained unknown to the tagger, i.e. were left without lemmas and grammatical features. The investigation showed that the main causes of unknown words were: 1) incompleteness of the tagger dictionary, particularly for the lexical stock of the XIXth century; 2) failure to tag word-formation derivatives; 3) problems with some inflection models of Old Russian; 4) insufficient graphemic analysis; 5) the inability of taggers to process multiword expressions. The results provide a baseline for improving the premorphological processing of Russian texts and for developing more sophisticated approaches to morphological analysis.

Original language	English
Pages (from-to)	235-242
Number of pages	8
Journal	Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume	3206
Status	Published - 1 Dec 2004
Event	7th International Conference TSD 2004: Text, Speech and Dialogue - Brno, Czech Republic. Duration: 8 Sep 2004 → 11 Sep 2004

Scopus subject areas

Theoretical Computer Science
Computer Science (all)

ID: 30268644