Corpus-driven Bambara spelling dictionary

DOI

https://doi.org/10.28995/2075-7182-2020-19-1180-1187
Final published version

Валентин Феодосьевич Выдрин
Jean Jacques Méric

A model for the development of a corpus-driven spelling dictionary for the Bambara language is described. First, a list of about 4000 lexemes characterized by spelling variability is extracted from an electronic Bambara-French dictionary. At the next stage, a script is applied to determine the number of occurrences of each spelling variant in the Bambara Reference Corpus, separately for the entire Corpus (more than 11 million words) and for its disambiguated subcorpus (about 1.5 million words). Statistics on the diversity of sources and authors are also obtained automatically. The statistical data are then sorted manually into two lists of lexemes: those whose standard spelling can be established statistically, and those requiring evaluation by expert linguists. Some difficult cases are discussed in the paper. At the final stage, a representative expert commission will discuss all those lexemes for which statistical data alone do not suffice to define a standard spelling variant, before taking a final decision on each. The resulting Bambara spelling dictionary will be published electronically and on paper.
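The counting stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' script: the document layout (the keys "tokens", "source", "author", "disambiguated"), the VariantStats class, and the count_variants function are all assumptions made for illustration, and matching is done on exact token strings, with each spelling variant assumed to belong to a single lexeme.

```python
from dataclasses import dataclass, field


@dataclass
class VariantStats:
    """Counts for one spelling variant (all field names are illustrative)."""
    total: int = 0                              # occurrences in the entire corpus
    disambiguated: int = 0                      # occurrences in the disambiguated subcorpus
    sources: set = field(default_factory=set)   # distinct sources using this variant
    authors: set = field(default_factory=set)   # distinct authors using this variant


def count_variants(documents, variants_by_lexeme):
    """Tally every listed spelling variant across a collection of documents.

    documents: iterable of dicts with the hypothetical keys
        "tokens" (list of str), "source", "author", "disambiguated" (bool).
    variants_by_lexeme: dict mapping a lexeme id to its list of spelling variants.
    Returns {lexeme id: {variant: VariantStats}}.
    """
    # Reverse index from a variant to its lexeme. Simplifying assumption:
    # no variant string is shared by two lexemes.
    lexeme_of = {v: lex for lex, vs in variants_by_lexeme.items() for v in vs}
    stats = {
        lex: {v: VariantStats() for v in vs}
        for lex, vs in variants_by_lexeme.items()
    }
    for doc in documents:
        for token in doc["tokens"]:
            lex = lexeme_of.get(token)
            if lex is None:
                continue  # token is not one of the variable spellings
            entry = stats[lex][token]
            entry.total += 1
            if doc["disambiguated"]:
                entry.disambiguated += 1
            entry.sources.add(doc["source"])
            entry.authors.add(doc["author"])
    return stats
```

The resulting table gives, for each lexeme, per-variant frequencies in both corpus scopes together with the number of distinct sources and authors (the sizes of the two sets). In the workflow described above, such figures are then sorted manually rather than by an automatic threshold.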

Translated title of the contribution	КОРПУСНОЙ ОРФОГРАФИЧЕСКИЙ СЛОВАРЬ ЯЗЫКА БАМАНА
Original language	English
Title of host publication	Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference “Dialogue”
Place of Publication	Moscow
Publisher	Russian State University for the Humanities (Российский государственный гуманитарный университет)
Pages	1180–1187
Number of pages	8
Volume	19
ISBN (Electronic)	978-5-7281-2948-6
DOIs	https://doi.org/10.28995/2075-7182-2020-19-1180-1187
State	Published - 2020

Scopus subject areas

Arts and Humanities (all)

Research areas

Bambara language, spelling dictionary, spelling norm

ID: 70666244