
Collocation extraction has gained much attention in natural language processing, and its results are important in various areas of applied linguistics. This research compares more than a dozen association measures on a subset of the Russian Web corpus, focusing on automatically extracted Adj-Noun collocations. The aim of the experiments is two-fold: first, to examine the differences between the statistical measures, and second, to find the most effective one for the Russian data. The former involves calculating Spearman’s rank correlation coefficient; the latter involves evaluating the extracted lists against a Russian dictionary, i.e. matching automatically extracted collocations against manually collected ones. The results are not straightforward; however, one can distinguish groups of measures that are relatively interchangeable. In addition, the extracted bigrams can be considered collocations by experts and thus may enrich dictionaries.
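The two-step evaluation described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the measures it compares, so PMI and t-score stand in as two common association measures; the bigrams, frequencies, corpus size and gold list are invented toy data, and the helper names (`pmi`, `t_score`, `ranks`, `spearman`, `precision_at_k`) are hypothetical, not the authors' code.

```python
import math

# Toy data (invented): bigram -> (f_xy, f_x, f_y); N is a hypothetical corpus size.
N = 1_000_000
COUNTS = {
    ("strong", "tea"): (30, 1000, 500),
    ("big", "house"): (200, 20000, 8000),
    ("red", "tape"): (15, 3000, 100),
    ("new", "year"): (500, 30000, 10000),
}
# Hypothetical stand-in for the dictionary used as a gold standard.
GOLD = {("strong", "tea"), ("red", "tape")}

def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information: log2 of observed over expected frequency."""
    return math.log2(f_xy * n / (f_x * f_y))

def t_score(f_xy, f_x, f_y, n):
    """t-score: (observed - expected) / sqrt(observed)."""
    return (f_xy - f_x * f_y / n) / math.sqrt(f_xy)

def ranks(scores):
    """Rank 1 = highest score; tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(a, b):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

def precision_at_k(ranked, gold, k):
    """Share of the top-k candidates that also appear in the gold list."""
    return sum(1 for bigram in ranked[:k] if bigram in gold) / k

bigrams = list(COUNTS)
pmi_scores = [pmi(*COUNTS[b], N) for b in bigrams]
t_scores = [t_score(*COUNTS[b], N) for b in bigrams]

# Step 1: how similarly do the two measures rank the same candidates?
rho = spearman(pmi_scores, t_scores)

# Step 2: how many top-ranked candidates does each measure share with the gold list?
by_pmi = [b for _, b in sorted(zip(pmi_scores, bigrams), reverse=True)]
by_t = [b for _, b in sorted(zip(t_scores, bigrams), reverse=True)]
p_pmi = precision_at_k(by_pmi, GOLD, 2)
p_t = precision_at_k(by_t, GOLD, 2)

print(f"rho={rho:.2f}  P@2(PMI)={p_pmi:.2f}  P@2(t-score)={p_t:.2f}")
```

On this toy data the two measures agree only moderately (rho is about 0.4) and differ in how many of their top candidates match the gold list. This is the kind of contrast the correlation step and the dictionary-based step are meant to expose, though it says nothing about the paper's actual findings.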

Original language: English
Title of host publication: Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Languages Processing
Editors: Pavel Rychly, Adam Rambousek, Ales Horak
Publisher: Tribun EU
Pages: 21-27
Number of pages: 7
Volume: 2018-December
ISBN (Electronic): 9788026315179
State: Published - 1 Dec 2018
Event: 12th Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2018 - Karlova Studanka, Czech Republic
Duration: 7 Dec 2018 → 9 Dec 2018

Publication series

Name: Recent Advances in Slavonic Natural Language Processing
ISSN (Print): 2336-4289

Conference

Conference: 12th Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2018
Country/Territory: Czech Republic
City: Karlova Studanka
Period: 7/12/18 → 9/12/18

Research areas

  • Collocability, Collocations, Corpora, Gold standard, Statistical measures, Statistics

Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Information Systems
  • Software

ID: 36878080