Research output: Contribution to journal › Article › peer-review
Towards a part-of-speech tagger for Sranan Tongo. / Cortegoso Vissio, Nicolás ; Zakharov, Viktor .
In: International Journal of Open Information Technologies, Vol. 9, No. 12, 2021, p. 99-103.
TY - JOUR
T1 - Towards a part-of-speech tagger for Sranan Tongo
AU - Cortegoso Vissio, Nicolás
AU - Zakharov, Viktor
N1 - Cortegoso Vissio N., Zakharov V. Towards a part-of-speech tagger for Sranan Tongo // International Journal of Open Information Technologies. Vol 9, No 12 (2021). P. 99-103.
PY - 2021
Y1 - 2021
N2 - This paper is the continuation of a work submitted to the International Conference Corpus Linguistics 2021 [1]. On that occasion, a hybrid rule-based stochastic part-of-speech (POS) tagger was introduced for Sranan Tongo, a Creole language from South America with around half a million speakers. Since Sranan Tongo does not have a written corpus and text annotation is an expensive and time-consuming task, it was proposed to take a first step in training a POS tagger using only 550 sentences hand-annotated with part-of-speech tags. In this new contribution, the development of the POS tagger for Sranan Tongo goes a step further with the addition of more training data. For this purpose, the tagger was used to annotate 2,406 sentences. The tagging results were hand-corrected and employed to retrain the model. A comparison is shown between the performance of the POS tagger on three texts before and after the inclusion of the new training data.
AB - This paper is the continuation of a work submitted to the International Conference Corpus Linguistics 2021 [1]. On that occasion, a hybrid rule-based stochastic part-of-speech (POS) tagger was introduced for Sranan Tongo, a Creole language from South America with around half a million speakers. Since Sranan Tongo does not have a written corpus and text annotation is an expensive and time-consuming task, it was proposed to take a first step in training a POS tagger using only 550 sentences hand-annotated with part-of-speech tags. In this new contribution, the development of the POS tagger for Sranan Tongo goes a step further with the addition of more training data. For this purpose, the tagger was used to annotate 2,406 sentences. The tagging results were hand-corrected and employed to retrain the model. A comparison is shown between the performance of the POS tagger on three texts before and after the inclusion of the new training data.
M3 - Article
VL - 9
SP - 99
EP - 103
JO - International Journal of Open Information Technologies
JF - International Journal of Open Information Technologies
SN - 2307-8162
IS - 12
ER -
ID: 89180990