This paper is the continuation of a work submitted to
the International Conference Corpus Linguistics 2021 [1]. On
that occasion, a hybrid rule-based and stochastic part-of-speech
(POS) tagger was introduced for Sranan Tongo, a Creole
language from South America with around half a million
speakers. Because Sranan Tongo does not have a written corpus
and text annotation is an expensive and time-consuming task, the
first step was to train a POS tagger using only 550 sentences
hand-annotated with part-of-speech tags.
In this new contribution, the development of the POS tagger
for Sranan Tongo goes a step further with the addition of more
training data. To this end, the tagger was used to annotate
2,406 sentences. The tagging results were hand-corrected and
used to retrain the model. A comparison is shown between
the performance of the POS tagger on three texts before and
after the inclusion of the new training data.
Original language: English
Pages (from-to): 99-103
Journal: International Journal of Open Information Technologies
Volume: 9
Issue number: 12
Status: Published - 2021

ID: 89180990