Towards a part-of-speech tagger for Sranan Tongo

Standard

Towards a part-of-speech tagger for Sranan Tongo. / Cortegoso Vissio, Nicolás ; Zakharov, Viktor.

In: International Journal of Open Information Technologies, Vol. 9, No. 12, 2021, p. 99-103.

Research output: Contribution to journal › Article › peer-review

Harvard

Cortegoso Vissio, N & Zakharov, V 2021, 'Towards a part-of-speech tagger for Sranan Tongo', International Journal of Open Information Technologies, vol. 9, no. 12, pp. 99-103. <http://injoit.org/index.php/j1/article/view/1235/1171>

APA

Cortegoso Vissio, N., & Zakharov, V. (2021). Towards a part-of-speech tagger for Sranan Tongo. International Journal of Open Information Technologies, 9(12), 99-103. http://injoit.org/index.php/j1/article/view/1235/1171

Vancouver

Cortegoso Vissio N, Zakharov V. Towards a part-of-speech tagger for Sranan Tongo. International Journal of Open Information Technologies. 2021;9(12):99-103.

Author

Cortegoso Vissio, Nicolás ; Zakharov, Viktor. / Towards a part-of-speech tagger for Sranan Tongo. In: International Journal of Open Information Technologies. 2021 ; Vol. 9, No. 12. pp. 99-103.

BibTeX

@article{161b5246dd16431c9669c90d5e11c258,

title = "Towards a part-of-speech tagger for Sranan Tongo",

abstract = "This paper continues work submitted to the International Conference Corpus Linguistics 2021 [1]. That work introduced a hybrid rule-based and stochastic part-of-speech (POS) tagger for Sranan Tongo, a creole language from South America with around half a million speakers. Because Sranan Tongo has no written corpus and text annotation is expensive and time-consuming, the first step was to train the POS tagger on only 550 sentences hand-annotated with part-of-speech tags. This new contribution takes the development of the tagger a step further by adding more training data. To this end, the tagger was used to annotate 2,406 sentences, and the output was hand-corrected and used to retrain the model. The performance of the POS tagger on three texts is compared before and after the new training data was added.",

author = "{Cortegoso Vissio}, Nicol{\'a}s and Viktor Zakharov",

note = "Cortegoso Vissio N., Zakharov V. Towards a part-of-speech tagger for Sranan Tongo // International Journal of Open Information Technologies. Vol 9, No 12 (2021). P. 99-103.",

year = "2021",

language = "English",

volume = "9",

pages = "99--103",

journal = "International Journal of Open Information Technologies",

issn = "2307-8162",

publisher = "Moscow University Press",

number = "12",

}

RIS

TY - JOUR

T1 - Towards a part-of-speech tagger for Sranan Tongo

AU - Cortegoso Vissio, Nicolás

AU - Zakharov, Viktor

N1 - Cortegoso Vissio N., Zakharov V. Towards a part-of-speech tagger for Sranan Tongo // International Journal of Open Information Technologies. Vol 9, No 12 (2021). P. 99-103.

PY - 2021

Y1 - 2021

N2 - This paper continues work submitted to the International Conference Corpus Linguistics 2021 [1]. That work introduced a hybrid rule-based and stochastic part-of-speech (POS) tagger for Sranan Tongo, a creole language from South America with around half a million speakers. Because Sranan Tongo has no written corpus and text annotation is expensive and time-consuming, the first step was to train the POS tagger on only 550 sentences hand-annotated with part-of-speech tags. This new contribution takes the development of the tagger a step further by adding more training data. To this end, the tagger was used to annotate 2,406 sentences, and the output was hand-corrected and used to retrain the model. The performance of the POS tagger on three texts is compared before and after the new training data was added.

AB - This paper continues work submitted to the International Conference Corpus Linguistics 2021 [1]. That work introduced a hybrid rule-based and stochastic part-of-speech (POS) tagger for Sranan Tongo, a creole language from South America with around half a million speakers. Because Sranan Tongo has no written corpus and text annotation is expensive and time-consuming, the first step was to train the POS tagger on only 550 sentences hand-annotated with part-of-speech tags. This new contribution takes the development of the tagger a step further by adding more training data. To this end, the tagger was used to annotate 2,406 sentences, and the output was hand-corrected and used to retrain the model. The performance of the POS tagger on three texts is compared before and after the new training data was added.

M3 - Article

VL - 9

SP - 99

EP - 103

JO - International Journal of Open Information Technologies

JF - International Journal of Open Information Technologies

SN - 2307-8162

IS - 12

ER -

ID: 89180990