Research output: Contribution to journal › Conference article › peer-review
SkELL corpora as a part of the language portal Sõnaveeb: Problems and perspectives. / Koppel, Kristina; Kallas, Jelena; Khokhlova, Maria; Suchomel, Vít; Baisa, Vít; Michelfeit, Jan.
In: Proceedings of Electronic Lexicography in the 21st Century Conference, Vol. 2019-October, 01.10.2019, p. 763-782.
TY - JOUR
T1 - SkELL corpora as a part of the language portal Sõnaveeb
T2 - 6th Biennial Conference on Electronic Lexicography in the 21st Century: Smart Lexicography, eLex 2019
AU - Koppel, Kristina
AU - Kallas, Jelena
AU - Khokhlova, Maria
AU - Suchomel, Vít
AU - Baisa, Vít
AU - Michelfeit, Jan
PY - 2019/10/1
Y1 - 2019/10/1
N2 - The paper provides an analysis of the quality and presentation of authentic corpus sentences from Sketch Engine for Language Learning (SkELL) corpora (Baisa & Suchomel 2014), based on the example of Sõnaveeb (Wordweb), a new language portal being developed in the Institute of the Estonian Language. Currently Sõnaveeb contains a total of 150,000 Estonian headwords; about 70,000 of them have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only source of usage examples that are available on the portal. We describe the parameters of Good Dictionary Examples (GDEX) (Kilgarriff et al., 2008) configurations for Estonian and for Russian used for the compilation of etSkELL 2018 and ruSkELL 1.6 corpora, give an overview of an evaluation of the GDEX configuration for Estonian, and outline the requirements for the user-friendly presentation of SkELL corpora as a part of the language portal.
AB - The paper provides an analysis of the quality and presentation of authentic corpus sentences from Sketch Engine for Language Learning (SkELL) corpora (Baisa & Suchomel 2014), based on the example of Sõnaveeb (Wordweb), a new language portal being developed in the Institute of the Estonian Language. Currently Sõnaveeb contains a total of 150,000 Estonian headwords; about 70,000 of them have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only source of usage examples that are available on the portal. We describe the parameters of Good Dictionary Examples (GDEX) (Kilgarriff et al., 2008) configurations for Estonian and for Russian used for the compilation of etSkELL 2018 and ruSkELL 1.6 corpora, give an overview of an evaluation of the GDEX configuration for Estonian, and outline the requirements for the user-friendly presentation of SkELL corpora as a part of the language portal.
KW - Estonian
KW - GDEX
KW - Learner corpus
KW - Russian
KW - SkELL
UR - http://www.scopus.com/inward/record.url?scp=85075352676&partnerID=8YFLogxK
UR - https://elex.link/elex2019/wp-content/uploads/2019/09/eLex_2019_43.pdf
M3 - Conference article
AN - SCOPUS:85075352676
VL - 2019-October
SP - 763
EP - 782
JO - Electronic Lexicography in the 21st century. Proceedings of eLex 2019 Conference
JF - Electronic Lexicography in the 21st century. Proceedings of eLex 2019 Conference
SN - 2533-5626
Y2 - 1 October 2019 through 3 October 2019
ER -
ID: 49357124