Standard

Automatic collocation extraction: Association measures evaluation and integration / Zakharov, V. P.

In: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, Vol. 1, No. 16, 2017, pp. 387-398.

Research output: Scientific publications in periodicals; journal article based on conference proceedings; peer-reviewed

Harvard

Zakharov, VP 2017, 'Automatic collocation extraction: Association measures evaluation and integration', Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, vol. 1, no. 16, pp. 387-398.

APA

Zakharov, V. P. (2017). Automatic collocation extraction: Association measures evaluation and integration. Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, 1(16), 387-398.

Vancouver

Zakharov VP. Automatic collocation extraction: Association measures evaluation and integration. Komp'juternaja Lingvistika i Intellektual'nye Tehnologii. 2017;1(16):387-398.

Author

Zakharov, V. P. / Automatic collocation extraction: Association measures evaluation and integration. In: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii. 2017; Vol. 1, No. 16. pp. 387-398.

BibTeX

@article{c6504ebb25224676b81f74ce0f5cb57b,
title = "Automatic collocation extraction: Association measures evaluation and integration",
abstract = "The paper deals with collocation extraction from corpus data. A collocation is understood as a special type of set phrase. Many modern authors and most corpus linguists understand collocations as statistically determined set phrases. This approach is the starting point of the paper, which aims to evaluate various statistical methods of automatic collocation extraction. There are several ways to calculate the degree of coherence between the parts of a collocation. A number of formulae have been created to integrate the different factors that determine the association between collocation components; such formulae are usually called association measures. The paper describes experiments whose objective was to study collocation extraction based on statistical association measures. We extracted collocations for the word (water) and some others using the Collocations tool of the NoSketch Engine system with 7 association measures. It is important to stress that the experiments were conducted on representative corpora, and a large number of the resulting collocations were examined. The precision data allow us to establish, to some degree, that when collocation extraction is not used for special purposes, measures such as MI.log-f, log-Dice, and minimum sensitivity should be used. No measure is ideal, which is why various ways of integrating them are desirable and useful. We propose a number of parameters that allow collocates to be ranked in an integrated list, namely an average rank, a normalised rank and an optimised rank.",
keywords = "Association measures, Average rank, Collocation extraction, Evaluation, Normalised rank, Optimised medium rank",
author = "Zakharov, {V. P.}",
year = "2017",
language = "English",
volume = "1",
pages = "387--398",
journal = "Komp'juternaja Lingvistika i Intellektual'nye Tehnologii",
issn = "2221-7932",
publisher = "Russian State University for the Humanities",
number = "16",
note = "2017 International Conference on Internet and Modern Society (IMS 2017); Conference date: 21-06-2017 through 23-06-2017",
url = "http://icims.ifmo.ru/, http://ims.ifmo.ru/ru/pages/28/IMS_2017.htm",

}

RIS

TY - JOUR

T1 - Automatic collocation extraction: Association measures evaluation and integration

T2 - 2017 International Conference on Internet and Modern Society, IMS 2017

AU - Zakharov, V. P.

N1 - Conference code: XX

PY - 2017

Y1 - 2017

N2 - The paper deals with collocation extraction from corpus data. A collocation is understood as a special type of set phrase. Many modern authors and most corpus linguists understand collocations as statistically determined set phrases. This approach is the starting point of the paper, which aims to evaluate various statistical methods of automatic collocation extraction. There are several ways to calculate the degree of coherence between the parts of a collocation. A number of formulae have been created to integrate the different factors that determine the association between collocation components; such formulae are usually called association measures. The paper describes experiments whose objective was to study collocation extraction based on statistical association measures. We extracted collocations for the word (water) and some others using the Collocations tool of the NoSketch Engine system with 7 association measures. It is important to stress that the experiments were conducted on representative corpora, and a large number of the resulting collocations were examined. The precision data allow us to establish, to some degree, that when collocation extraction is not used for special purposes, measures such as MI.log-f, log-Dice, and minimum sensitivity should be used. No measure is ideal, which is why various ways of integrating them are desirable and useful. We propose a number of parameters that allow collocates to be ranked in an integrated list, namely an average rank, a normalised rank and an optimised rank.

AB - The paper deals with collocation extraction from corpus data. A collocation is understood as a special type of set phrase. Many modern authors and most corpus linguists understand collocations as statistically determined set phrases. This approach is the starting point of the paper, which aims to evaluate various statistical methods of automatic collocation extraction. There are several ways to calculate the degree of coherence between the parts of a collocation. A number of formulae have been created to integrate the different factors that determine the association between collocation components; such formulae are usually called association measures. The paper describes experiments whose objective was to study collocation extraction based on statistical association measures. We extracted collocations for the word (water) and some others using the Collocations tool of the NoSketch Engine system with 7 association measures. It is important to stress that the experiments were conducted on representative corpora, and a large number of the resulting collocations were examined. The precision data allow us to establish, to some degree, that when collocation extraction is not used for special purposes, measures such as MI.log-f, log-Dice, and minimum sensitivity should be used. No measure is ideal, which is why various ways of integrating them are desirable and useful. We propose a number of parameters that allow collocates to be ranked in an integrated list, namely an average rank, a normalised rank and an optimised rank.

KW - Association measures

KW - Average rank

KW - Collocation extraction

KW - Evaluation

KW - Normalised rank

KW - Optimised medium rank

UR - http://www.scopus.com/inward/record.url?scp=85021840994&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85021840994

VL - 1

SP - 387

EP - 398

JO - Komp'juternaja Lingvistika i Intellektual'nye Tehnologii

JF - Komp'juternaja Lingvistika i Intellektual'nye Tehnologii

SN - 2221-7932

IS - 16

Y2 - 21 June 2017 through 23 June 2017

ER -

ID: 92112217
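The abstract recommends log-Dice and minimum sensitivity, among other measures. It also proposes combining several measures through an average rank. The Python sketch below is illustrative only and is not the author's implementation. It assumes the commonly used definitions:

- logDice = 14 + log2(2 * f_xy / (f_x + f_y))
- MS = min(f_xy / f_x, f_xy / f_y)

Here f_xy is the co-occurrence frequency of the node and a collocate, and f_x and f_y are their individual frequencies. The "average rank" is taken to be the arithmetic mean of a collocate's ranks across measures. The paper's normalised and optimised ranks are not defined in the abstract and are not reproduced. All function names and counts are hypothetical.

```python
import math
from statistics import mean


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)); its maximum is 14."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def min_sensitivity(f_xy: int, f_x: int, f_y: int) -> float:
    """Minimum sensitivity = min(f_xy / f_x, f_xy / f_y)."""
    return min(f_xy / f_x, f_xy / f_y)


def to_ranks(scores: dict[str, float]) -> dict[str, int]:
    """Rank collocates by descending score (1 = strongest association)."""
    ordered = sorted(scores, key=lambda c: scores[c], reverse=True)
    return {c: i + 1 for i, c in enumerate(ordered)}


def average_rank(rankings: list[dict[str, int]]) -> list[tuple[str, float]]:
    """Hypothetical integration: mean rank of each collocate present in every
    ranking, sorted ascending (ties broken alphabetically)."""
    common = set.intersection(*(set(r) for r in rankings))
    avg = {c: mean(r[c] for r in rankings) for c in common}
    return sorted(avg.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical counts for the node "water": F_X is the node frequency,
# and each collocate maps to (f_xy, f_y).
F_X = 1000
COUNTS = {"drinking": (100, 500), "cold": (120, 1000), "the": (800, 100_000)}

dice_ranks = to_ranks({c: log_dice(fxy, F_X, fy) for c, (fxy, fy) in COUNTS.items()})
ms_ranks = to_ranks({c: min_sensitivity(fxy, F_X, fy) for c, (fxy, fy) in COUNTS.items()})

if __name__ == "__main__":
    for collocate, avg in average_rank([dice_ranks, ms_ranks]):
        print(f"{collocate}: {avg}")
```

With these invented counts, the two measures disagree. Log-Dice places "drinking" above "cold", and minimum sensitivity reverses the order. As a result, the two collocates tie on average rank. Disagreement of this kind between measures is what motivates integrating them into a single list.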