The paper deals with collocation extraction from corpus data. A large number of formulas have been devised to integrate the different factors that determine the association between collocation components. The paper describes experiments whose objective was to study collocation extraction based on statistical association measures, focusing on bigram collocations. The precision data obtained suggest that, when collocation extraction is not intended for a special purpose, measures such as MI.log_f, log-Dice, and minimum sensitivity should be used. At the same time, various ways of combining these measures are desirable and useful. To exploit the advantages of the individual measures, we propose creating a combined list of collocations extracted by different measures, together with a number of parameters for ranking the collocates in that combined list in a reasonable way.
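
The three measures named in the abstract can be illustrated with a minimal sketch. It assumes their commonly cited definitions (MI.log_f as the mutual information score multiplied by the natural log of the pair frequency plus one, log-Dice as 14 plus the base-2 log of the Dice coefficient, minimum sensitivity as the smaller of the two conditional proportions), which may differ in detail from the formulations used in the paper. The function names and all frequency counts below are hypothetical and serve only as an illustration.

```python
import math


def mi_log_f(f_xy, f_x, f_y, n):
    """MI score weighted by the natural log of (pair frequency + 1).

    Assumed definition: log2(f_xy * n / (f_x * f_y)) * ln(f_xy + 1).
    """
    mi = math.log2(f_xy * n / (f_x * f_y))
    return mi * math.log(f_xy + 1)


def log_dice(f_xy, f_x, f_y):
    """log-Dice: 14 + log2 of the Dice coefficient 2*f_xy / (f_x + f_y)."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def min_sensitivity(f_xy, f_x, f_y):
    """Minimum sensitivity: the smaller of f_xy / f_x and f_xy / f_y."""
    return min(f_xy / f_x, f_xy / f_y)


# Hypothetical counts: (pair frequency, first-word frequency, second-word frequency)
CORPUS_SIZE = 1_000_000
bigrams = {
    ("strong", "tea"): (30, 2000, 500),
    ("of", "the"): (5000, 30000, 60000),
}

for pair, (f_xy, f_x, f_y) in bigrams.items():
    print(
        pair,
        round(mi_log_f(f_xy, f_x, f_y, CORPUS_SIZE), 2),
        round(log_dice(f_xy, f_x, f_y), 2),
        round(min_sensitivity(f_xy, f_x, f_y), 4),
    )
```

Sorting the candidate bigrams by each score in turn gives one ranked list per measure; such per-measure lists would be the input to a combined list of the kind the abstract proposes.
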
Original language: English
Title of host publication: Proceedings of the International Conference IMS-2017 (St. Petersburg; Russian Federation, 21-24 June 2017)
Pages: 125-134
Number of pages: 10
State: Published - 2017
Event: 2017 International Conference on Internet and Modern Society, IMS 2017: international joint conference - ITMO University, St. Petersburg, Russian Federation
Duration: 21 Jun 2017 - 23 Jun 2017
Conference number: XX
http://icims.ifmo.ru/
http://ims.ifmo.ru/ru/pages/28/IMS_2017.htm

Publication series

Name: ACM INTERNATIONAL CONFERENCE PROCEEDINGS SERIES

Conference

Conference: 2017 International Conference on Internet and Modern Society, IMS 2017
Abbreviated title: IMS 2017
Country/Territory: Russian Federation
City: St. Petersburg
Period: 21/06/17 - 23/06/17

ID: 34962509