Building a gold standard for a Russian collocations database

DOI

https://doi.org/10.4312/9789610600961
Final published version

Maria Khokhlova

In the last decade, linguists have become increasingly interested in corpus material, which allows for a fresh approach to phenomena that have already been extensively described in academic works. The co-occurrence phenomenon has a dual nature: on the one hand, a linguistic component and, on the other, probabilistic (combinatorial) characteristics. The former has been described in numerous papers and explicitly defined in dictionaries, while the latter can be identified by a statistical approach. The present paper focuses on the process of building a gold standard that will include data from Russian dictionaries and corpora. The standard is being prepared for a Russian Collocations Database that already includes information on words' collocability extracted from text corpora by means of statistical measures and linguistic filters. The gold standard will also be used to evaluate the extracted collocations and to mark them as “true” collocations with references to the dictionaries.

Original language	English
Title of host publication	18th Euralex International Congress, 2018
Editors	Vojko Gorjanc, Simon Krek, Jaka Čibej, Iztok Kosem
Place of Publication	Ljubljana
Publisher	European Association for Lexicography
Pages	863-869
Number of pages	7
ISBN (Electronic)	9789610600961
ISBN (Print)	9789610600978
DOIs	https://doi.org/10.4312/9789610600961
State	Published - 1 Jan 2018
Event	18th Euralex International Congress, 2018 - Ljubljana, Slovenia. Duration: 17 Jul 2018 → 21 Jul 2018

Conference

Conference	18th Euralex International Congress, 2018
Country/Territory	Slovenia
City	Ljubljana
Period	17/07/18 → 21/07/18

Research areas

Collocations, Corpora, Database, Dictionaries, Russian language

Scopus subject areas

Language and Linguistics
Linguistics and Language
