Building a gold standard for a Russian collocations database

Standard

Building a gold standard for a Russian collocations database. / Khokhlova, Maria.

18th Euralex International Congress, 2018. ed. / Vojko Gorjanc; Simon Krek; Jaka Cibej; Iztok Kosem. Ljubljana : European Association for Lexicography, 2018. p. 863-869.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Harvard

Khokhlova, M 2018, Building a gold standard for a Russian collocations database. in V Gorjanc, S Krek, J Cibej & I Kosem (eds), 18th Euralex International Congress, 2018. European Association for Lexicography, Ljubljana, pp. 863-869, 18th Euralex International Congress, 2018, Ljubljana, Slovenia, 17/07/18. https://doi.org/10.4312/9789610600961

APA

Khokhlova, M. (2018). Building a gold standard for a Russian collocations database. In V. Gorjanc, S. Krek, J. Cibej, & I. Kosem (Eds.), 18th Euralex International Congress, 2018 (pp. 863-869). European Association for Lexicography. https://doi.org/10.4312/9789610600961

Vancouver

Khokhlova M. Building a gold standard for a Russian collocations database. In Gorjanc V, Krek S, Cibej J, Kosem I, editors, 18th Euralex International Congress, 2018. Ljubljana: European Association for Lexicography. 2018. p. 863-869. https://doi.org/10.4312/9789610600961

Author

Khokhlova, Maria. / Building a gold standard for a Russian collocations database. 18th Euralex International Congress, 2018. editor / Vojko Gorjanc ; Simon Krek ; Jaka Cibej ; Iztok Kosem. Ljubljana : European Association for Lexicography, 2018. pp. 863-869

BibTeX

@inproceedings{fde614095e4249788959e58a977e3c86,

title = "Building a gold standard for a Russian collocations database",

abstract = "In the last decade, linguists have become increasingly interested in corpus material, which allows for a fresh approach to the phenomena that have already been extensively described in academic works. The dual nature of the co-occurrence phenomenon itself lies, on the one hand, in its linguistic component and, on the other, in the probabilistic (combinatorial) characteristics. The former has been described in numerous papers and explicitly defined in dictionaries, while the latter can be identified by a statistical approach. The present paper focuses on the process of building a gold standard that will include data from Russian dictionaries and corpora. The standard is being prepared for a Russian Collocations Database that already includes information on words' collocability and was extracted from text corpora by statistical measures and linguistic filters. The gold standard will also be used for the evaluation of the extracted collocations and for marking them as “true” collocations with references to the dictionaries.",

keywords = "Collocations, Corpora, Database, Dictionaries, Russian language",

author = "Maria Khokhlova",

year = "2018",

month = jan,

day = "1",

doi = "10.4312/9789610600961",

language = "English",

isbn = "9789610600978",

pages = "863--869",

editor = "Vojko Gorjanc and Simon Krek and Jaka Cibej and Iztok Kosem",

booktitle = "18th Euralex International Congress, 2018",

publisher = "European Association for Lexicography",

address = "Ljubljana",

note = "18th Euralex International Congress, 2018 ; Conference date: 17-07-2018 Through 21-07-2018",

}

RIS

TY - CONF

T1 - Building a gold standard for a Russian collocations database

AU - Khokhlova, Maria

PY - 2018/1/1

Y1 - 2018/1/1

N2 - In the last decade, linguists have become increasingly interested in corpus material, which allows for a fresh approach to the phenomena that have already been extensively described in academic works. The dual nature of the co-occurrence phenomenon itself lies, on the one hand, in its linguistic component and, on the other, in the probabilistic (combinatorial) characteristics. The former has been described in numerous papers and explicitly defined in dictionaries, while the latter can be identified by a statistical approach. The present paper focuses on the process of building a gold standard that will include data from Russian dictionaries and corpora. The standard is being prepared for a Russian Collocations Database that already includes information on words' collocability and was extracted from text corpora by statistical measures and linguistic filters. The gold standard will also be used for the evaluation of the extracted collocations and for marking them as “true” collocations with references to the dictionaries.

AB - In the last decade, linguists have become increasingly interested in corpus material, which allows for a fresh approach to the phenomena that have already been extensively described in academic works. The dual nature of the co-occurrence phenomenon itself lies, on the one hand, in its linguistic component and, on the other, in the probabilistic (combinatorial) characteristics. The former has been described in numerous papers and explicitly defined in dictionaries, while the latter can be identified by a statistical approach. The present paper focuses on the process of building a gold standard that will include data from Russian dictionaries and corpora. The standard is being prepared for a Russian Collocations Database that already includes information on words' collocability and was extracted from text corpora by statistical measures and linguistic filters. The gold standard will also be used for the evaluation of the extracted collocations and for marking them as “true” collocations with references to the dictionaries.

KW - Collocations

KW - Corpora

KW - Database

KW - Dictionaries

KW - Russian language

UR - http://www.scopus.com/inward/record.url?scp=85059369182&partnerID=8YFLogxK

U2 - 10.4312/9789610600961

DO - 10.4312/9789610600961

M3 - Conference contribution

AN - SCOPUS:85059369182

SN - 9789610600978

SP - 863

EP - 869

BT - 18th Euralex International Congress, 2018

A2 - Gorjanc, Vojko

A2 - Krek, Simon

A2 - Cibej, Jaka

A2 - Kosem, Iztok

PB - European Association for Lexicography

CY - Ljubljana

T2 - 18th Euralex International Congress, 2018

Y2 - 17 July 2018 through 21 July 2018

ER -
