Corpus-driven Bambara spelling dictionary

Standard

Corpus-driven Bambara spelling dictionary. / Выдрин, Валентин Феодосьевич; Méric, Jean Jacques.

Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue”. Vol. 19. Moscow: Российский государственный гуманитарный университет, 2020. p. 1180–1187.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Выдрин, ВФ & Méric, JJ 2020, Corpus-driven Bambara spelling dictionary. in Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue”. vol. 19, Российский государственный гуманитарный университет, Moscow, pp. 1180–1187. https://doi.org/10.28995/2075-7182-2020-19-1180-1187

APA

Выдрин, В. Ф., & Méric, J. J. (2020). Corpus-driven Bambara spelling dictionary. In Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue” (Vol. 19, pp. 1180–1187). Российский государственный гуманитарный университет. https://doi.org/10.28995/2075-7182-2020-19-1180-1187

Vancouver

Выдрин ВФ, Méric JJ. Corpus-driven Bambara spelling dictionary. In Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue”. Vol. 19. Moscow: Российский государственный гуманитарный университет. 2020. p. 1180–1187. https://doi.org/10.28995/2075-7182-2020-19-1180-1187

Author

Выдрин, Валентин Феодосьевич; Méric, Jean Jacques. / Corpus-driven Bambara spelling dictionary. Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue”. Vol. 19. Moscow: Российский государственный гуманитарный университет, 2020. pp. 1180–1187.

BibTeX

@inproceedings{9973123d67e642f09e333afb5efe9578,

title = "Corpus-driven Bambara spelling dictionary",

abstract = "A model for the development of a corpus-driven spelling dictionary for the Bambara language is described. First, a list of about 4000 lexemes characterized by spelling variability is extracted from an electronic Bambara-French dictionary. At the next stage, a script is applied to determine the number of occurrences of each spelling variant in the Bambara Reference Corpus, separately for the entire Corpus (more than 11 million words) and for its disambiguated subcorpus (about 1.5 million words). Statistics on the diversity of sources and authors are also obtained automatically. The statistical data are then sorted manually into two lists of lexemes: those whose standard spelling can be established statistically, and those requiring evaluation by expert linguists. Some difficult cases are discussed in the paper. At the final stage, a representative expert commission will discuss all those lexemes for which statistical data alone do not suffice to define a standard spelling variant, before taking a final decision on each. The resulting Bambara spelling dictionary will be published electronically and on paper.",

keywords = "Bambara language, spelling dictionary, spelling norm",

author = "Выдрин, {Валентин Феодосьевич} and M{\'e}ric, {Jean Jacques}",

year = "2020",

doi = "10.28995/2075-7182-2020-19-1180-1187",

language = "English",

volume = "19",

pages = "1180--1187",

booktitle = "Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue”",

publisher = "Российский государственный гуманитарный университет",

address = "Moscow",

}

RIS

TY - CONF

T1 - Corpus-driven Bambara spelling dictionary

AU - Выдрин, Валентин Феодосьевич

AU - Méric, Jean Jacques

PY - 2020

Y1 - 2020

N2 - A model for the development of a corpus-driven spelling dictionary for the Bambara language is described. First, a list of about 4000 lexemes characterized by spelling variability is extracted from an electronic Bambara-French dictionary. At the next stage, a script is applied to determine the number of occurrences of each spelling variant in the Bambara Reference Corpus, separately for the entire Corpus (more than 11 million words) and for its disambiguated subcorpus (about 1.5 million words). Statistics on the diversity of sources and authors are also obtained automatically. The statistical data are then sorted manually into two lists of lexemes: those whose standard spelling can be established statistically, and those requiring evaluation by expert linguists. Some difficult cases are discussed in the paper. At the final stage, a representative expert commission will discuss all those lexemes for which statistical data alone do not suffice to define a standard spelling variant, before taking a final decision on each. The resulting Bambara spelling dictionary will be published electronically and on paper.

KW - Bambara language

KW - spelling dictionary

KW - spelling norm

U2 - 10.28995/2075-7182-2020-19-1180-1187

DO - 10.28995/2075-7182-2020-19-1180-1187

M3 - Conference contribution

VL - 19

SP - 1180

EP - 1187

BT - Computational Linguistics and Intellectual Technologies Papers from the Annual International Conference “Dialogue”

PB - Российский государственный гуманитарный университет

CY - Moscow

ER -

ID: 70666244