SkELL corpora as a part of the language portal Sõnaveeb: Problems and perspectives

Kristina Koppel, Jelena Kallas, Maria Khokhlova, Vít Suchomel, Vít Baisa, Jan Michelfeit

Research output

Abstract

The paper provides an analysis of the quality and presentation of authentic corpus sentences from Sketch Engine for Language Learning (SkELL) corpora (Baisa & Suchomel 2014), based on the example of Sõnaveeb (Wordweb), a new language portal being developed in the Institute of the Estonian Language. Currently Sõnaveeb contains a total of 150,000 Estonian headwords; about 70,000 of them have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only source of usage examples that are available on the portal. We describe the parameters of Good Dictionary Examples (GDEX) (Kilgarriff et al., 2008) configurations for Estonian and for Russian used for the compilation of etSkELL 2018 and ruSkELL 1.6 corpora, give an overview of an evaluation of the GDEX configuration for Estonian, and outline the requirements for the user-friendly presentation of SkELL corpora as a part of the language portal.

Original language: English
Pages (from-to): 763-782
Journal: Proceedings of Electronic Lexicography in the 21st Century Conference
Volume: 2019-October
Publication status: Published - 1 Oct 2019
Event: 6th Biennial Conference on Electronic Lexicography in the 21st Century: Smart Lexicography, eLex 2019 - Sintra
Duration: 1 Oct 2019 - 3 Oct 2019

Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Koppel, Kristina ; Kallas, Jelena ; Khokhlova, Maria ; Suchomel, Vít ; Baisa, Vít ; Michelfeit, Jan. / SkELL corpora as a part of the language portal Sõnaveeb: Problems and perspectives. In: Proceedings of Electronic Lexicography in the 21st Century Conference. 2019 ; Vol. 2019-October. pp. 763-782.
@article{5cb3881f99a94a398f01ab3cde82dd40,
title = "{SkELL} corpora as a part of the language portal {S{\~o}naveeb}: Problems and perspectives",
abstract = "The paper provides an analysis of the quality and presentation of authentic corpus sentences from Sketch Engine for Language Learning (SkELL) corpora (Baisa \& Suchomel 2014), based on the example of S{\~o}naveeb (Wordweb), a new language portal being developed in the Institute of the Estonian Language. Currently S{\~o}naveeb contains a total of 150,000 Estonian headwords; about 70,000 of them have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only source of usage examples that are available on the portal. We describe the parameters of Good Dictionary Examples (GDEX) (Kilgarriff et al., 2008) configurations for Estonian and for Russian used for the compilation of etSkELL 2018 and ruSkELL 1.6 corpora, give an overview of an evaluation of the GDEX configuration for Estonian, and outline the requirements for the user-friendly presentation of SkELL corpora as a part of the language portal.",
keywords = "Estonian, GDEX, Learner corpus, Russian, SkELL",
author = "Kristina Koppel and Jelena Kallas and Maria Khokhlova and V{\'i}t Suchomel and V{\'i}t Baisa and Jan Michelfeit",
year = "2019",
month = "10",
day = "1",
language = "English",
volume = "2019-October",
pages = "763--782",
journal = "Proceedings of Electronic Lexicography in the 21st Century Conference",
issn = "2533-5626",

}

SkELL corpora as a part of the language portal Sõnaveeb: Problems and perspectives. / Koppel, Kristina; Kallas, Jelena; Khokhlova, Maria; Suchomel, Vít; Baisa, Vít; Michelfeit, Jan.

In: Proceedings of Electronic Lexicography in the 21st Century Conference, Vol. 2019-October, 01.10.2019, p. 763-782.

TY - JOUR

T1 - SkELL corpora as a part of the language portal Sõnaveeb

T2 - Problems and perspectives

AU - Koppel, Kristina

AU - Kallas, Jelena

AU - Khokhlova, Maria

AU - Suchomel, Vít

AU - Baisa, Vít

AU - Michelfeit, Jan

PY - 2019/10/1

Y1 - 2019/10/1

N2 - The paper provides an analysis of the quality and presentation of authentic corpus sentences from Sketch Engine for Language Learning (SkELL) corpora (Baisa & Suchomel 2014), based on the example of Sõnaveeb (Wordweb), a new language portal being developed in the Institute of the Estonian Language. Currently Sõnaveeb contains a total of 150,000 Estonian headwords; about 70,000 of them have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only source of usage examples that are available on the portal. We describe the parameters of Good Dictionary Examples (GDEX) (Kilgarriff et al., 2008) configurations for Estonian and for Russian used for the compilation of etSkELL 2018 and ruSkELL 1.6 corpora, give an overview of an evaluation of the GDEX configuration for Estonian, and outline the requirements for the user-friendly presentation of SkELL corpora as a part of the language portal.

KW - Estonian

KW - GDEX

KW - Learner corpus

KW - Russian

KW - SkELL

UR - http://www.scopus.com/inward/record.url?scp=85075352676&partnerID=8YFLogxK

UR - https://elex.link/elex2019/wp-content/uploads/2019/09/eLex_2019_43.pdf

M3 - Conference article

AN - SCOPUS:85075352676

VL - 2019-October

SP - 763

EP - 782

JO - Proceedings of Electronic Lexicography in the 21st Century Conference

JF - Proceedings of Electronic Lexicography in the 21st Century Conference

SN - 2533-5626

ER -