Towards a part-of-speech tagger for Sranan Tongo

Research output: Contribution to journal › Article › peer-review

Links

http://injoit.org/index.php/j1/article/view/1235/1171
Final published version

This paper continues work submitted to the International
Conference Corpus Linguistics 2021 [1]. That work introduced a
hybrid rule-based and stochastic part-of-speech (POS) tagger for
Sranan Tongo, a Creole language from South America with around
half a million speakers. Since Sranan Tongo does not have a
written corpus and text annotation is an expensive and
time-consuming task, the first step was to train a POS tagger
using only 550 sentences hand-annotated with part-of-speech tags.
This new contribution takes the development of the POS tagger
for Sranan Tongo a step further by adding more training data.
To this end, the tagger was used to annotate 2,406 sentences.
The tagging results were hand-corrected and used to retrain the
model. The performance of the POS tagger on three texts is
compared before and after the inclusion of the new training data.

Original language	English
Pages (from-to)	99-103
Journal	International Journal of Open Information Technologies
Volume	9
Issue number	12
Status	Published - 2021

ID: 89180990
