Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Similarity between the association measures: A case study of noun phrases. / Khokhlova, Maria.
Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Languages Processing. ed. / Pavel Rychly; Adam Rambousek; Ales Horak. Vol. 2018-December. Tribun EU, 2018. p. 21-27 (Recent Advances in Slavonic Natural Language Processing).
TY - GEN
T1 - Similarity between the association measures
T2 - 12th Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2018
AU - Khokhlova, Maria
PY - 2018/12/1
Y1 - 2018/12/1
AB - Collocation extraction has gained much attention in natural language processing, and its results are important in various areas of applied linguistics. The research compares more than a dozen association measures on a subset of the Russian Web corpus, focusing on automatically extracted adjective-noun (Adj-Noun) collocations. The experiments have a twofold aim: first, to examine the differences between the statistical measures, and second, to find the most effective measure for Russian data. The former involves calculating Spearman’s rank correlation coefficient between the rankings produced by different measures; the latter involves evaluating the extracted lists against a Russian dictionary, i.e. matching the automatically extracted collocations against manually collected ones. The results are not straightforward: the measures fall into groups whose members are relatively interchangeable. In addition, experts can judge the extracted bigrams to be collocations, so these bigrams may be used to enrich dictionaries.
KW - Collocability
KW - Collocations
KW - Corpora
KW - Gold standard
KW - Statistical measures
KW - Statistics
UR - http://www.scopus.com/inward/record.url?scp=85062198775&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85062198775
VL - 2018-December
T3 - Recent Advances in Slavonic Natural Language Processing
SP - 21
EP - 27
BT - Proceedings of the Tenth Workshop on Recent Advances in Slavonic Natural Languages Processing
A2 - Rychly, Pavel
A2 - Rambousek, Adam
A2 - Horak, Ales
PB - Tribun EU
Y2 - 7 December 2018 through 9 December 2018
ER -
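The abstract describes two steps: score candidate Adj-Noun bigrams with several association measures and compare the resulting rankings with Spearman's rank correlation, then evaluate each ranking against a dictionary. The Python sketch below illustrates both steps on made-up counts. It is not the author's code: the four measures (PMI, t-score, Dice, log-likelihood) are standard textbook formulas chosen for illustration, since the paper's exact set of measures is not listed here, and `N`, `COUNTS`, `GOLD` and the precision-at-k evaluation are hypothetical stand-ins for the Russian Web corpus subset and the Russian dictionary.

```python
import math
from itertools import combinations

# Hypothetical toy counts (NOT the paper's data): bigram -> (f(u,v), f(u), f(v)).
N = 100_000  # assumed total number of bigrams in the sample
COUNTS = {
    ("железная", "дорога"): (80, 100, 300),
    ("новый", "проект"): (150, 2000, 1200),
    ("большой", "дом"): (20, 1500, 800),
    ("красная", "площадь"): (40, 200, 90),
    ("хороший", "день"): (12, 900, 1000),
}
# Hypothetical stand-in for the set of Adj-Noun collocations listed in a dictionary.
GOLD = {("железная", "дорога"), ("красная", "площадь")}


def measures(f_uv, f_u, f_v, n):
    """Four standard association scores from a 2x2 contingency table (assumes f_uv > 0)."""
    # Observed cells: (u,v), (u,not v), (not u,v), (not u,not v).
    observed = (f_uv, f_u - f_uv, f_v - f_uv, n - f_u - f_v + f_uv)
    rows, cols = (f_u, n - f_u), (f_v, n - f_v)
    # Expected counts under independence: E_ij = R_i * C_j / N, in the same cell order.
    expected = tuple(rows[i] * cols[j] / n for i in (0, 1) for j in (0, 1))
    e11 = expected[0]
    return {
        "PMI": math.log2(f_uv / e11),
        "t-score": (f_uv - e11) / math.sqrt(f_uv),
        "Dice": 2 * f_uv / (f_u + f_v),
        # Log-likelihood ratio G^2; cells with O_ij = 0 contribute nothing.
        "log-likelihood": 2 * sum(o * math.log(o / e)
                                  for o, e in zip(observed, expected) if o > 0),
    }


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho computed as Pearson's correlation of the (average) ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def precision_at_k(scores, gold, k):
    """Share of the k highest-scoring bigrams that appear in the gold standard."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(1 for bigram in top if bigram in gold) / k


# Score every bigram with every measure: scores[measure][bigram] -> value.
scores = {}
for bigram, counts in COUNTS.items():
    for name, value in measures(*counts, N).items():
        scores.setdefault(name, {})[bigram] = value

bigrams = list(COUNTS)
for a, b in combinations(scores, 2):
    rho = spearman([scores[a][x] for x in bigrams], [scores[b][x] for x in bigrams])
    print(f"{a} vs {b}: rho = {rho:.2f}")
for name, by_bigram in scores.items():
    print(f"{name}: P@2 = {precision_at_k(by_bigram, GOLD, 2):.2f}")
```

With these toy counts, PMI, Dice and log-likelihood rank the five bigrams identically (rho = 1.00), while t-score correlates at 0.70 with each of them and puts the frequent but compositional новый проект first, so its precision at 2 drops to 0.50. This only illustrates, in miniature, how measures can form groups of relatively interchangeable members; it says nothing about the paper's actual results.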