The paper deals with collocation extraction from corpus data. A large number of formulae have been created to integrate the different factors that determine the association between the components of a collocation. We describe experiments whose objective was to study a method of collocation extraction based on statistical association measures. The work focuses on bigram collocations. The data obtained on the precision of the measures suggest that some measures are more precise than others. No measure is ideal, which is why various ways of integrating them are desirable and useful. We propose a number of parameters that allow collocates to be ranked in a combined list, namely an average rank, a normalized rank and an optimized rank.
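The idea of merging several measures into one combined list can be illustrated with the simplest of the three parameters, the average rank. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `ranks` and `average_rank`, the toy scores, and the rule for collocates missing from a list are all hypothetical. The abstract does not define the normalized and optimized ranks, so they are not sketched here.

```python
from statistics import mean


def ranks(scores):
    """Map each collocate to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: position for position, item in enumerate(ordered, start=1)}


def average_rank(score_tables):
    """Combine several measures by averaging each collocate's rank.

    A collocate missing from a measure's list is given that list's
    length + 1 (an assumption of this sketch, not taken from the paper).
    """
    rank_tables = [ranks(table) for table in score_tables.values()]
    items = set().union(*score_tables.values())
    combined = {
        item: mean(rt.get(item, len(rt) + 1) for rt in rank_tables)
        for item in items
    }
    # Lower average rank is better; ties are broken alphabetically.
    return sorted(combined.items(), key=lambda pair: (pair[1], pair[0]))


# Toy association scores, invented purely for illustration.
toy_scores = {
    "MI": {"strong tea": 9.1, "powerful computer": 7.4, "of the": 0.3},
    "t-score": {"of the": 15.2, "strong tea": 4.8, "powerful computer": 3.9},
}

combined = average_rank(toy_scores)
for collocate, avg in combined:
    print(f"{collocate}\t{avg}")
```

In this toy case the two measures disagree on the top candidate, and averaging the ranks places "strong tea" first because it ranks high under both.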
Original language: English
Title of host publication: Text, Speech, and Dialogue
Subtitle of host publication: 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017, Proceedings
Place of publication: Cham
Publisher: Springer Nature
Pages: 255-262
Number of pages: 8
ISBN (electronic): 978-3-319-64206-2
ISBN (print): 978-3-319-64205-5
Publication status: Published - 2017
Event: Text, Speech, and Dialogue: 20th International Conference - Prague, Czech Republic
Duration: 27 Aug 2017 to 31 Aug 2017

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Nature
Volume: 10415
ISSN (print): 0302-9743

Conference

Conference: Text, Speech, and Dialogue
Abbreviated title: TSD 2017
Country/Territory: Czech Republic
City: Prague
Period: 27/08/17 to 31/08/17

    Research areas

  • Collocation extraction, Association measures, Evaluation, Ranking, Average rank, Normalized rank, Optimized rank

    Scopus subject areas

  • Social Sciences (all)

ID: 71300483