This paper addresses collocation extraction from corpus data. Numerous formulae have been devised to integrate the different factors that determine the association between the components of a collocation. We describe experiments whose objective was to study collocation extraction based on statistical association measures. The work focuses on bigram collocations. The precision data obtained indicate, to some degree, that some measures are more precise than others. No measure is ideal, which is why various ways of integrating them are desirable and useful. We propose a number of parameters for ranking collocates in a combined list, namely an average rank, a normalized rank, and an optimized rank.
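As a rough illustration of the average rank idea only, the following sketch assumes that each association measure assigns a score to every candidate collocate of a node word, converts the scores into per-measure ranks, and averages those ranks into one combined list. The function names, the toy scores, and the choice of three measures are hypothetical and are not taken from the paper. The normalized and optimized ranks are not defined in this abstract, so they are not reproduced here.

```python
from collections import defaultdict


def ranks_from_scores(scores):
    """Map each collocate to its 1-based rank; a higher score gets a better rank.

    Ties are not treated specially: tied items keep their input order.
    """
    ordered = sorted(scores, key=lambda c: scores[c], reverse=True)
    return {c: i for i, c in enumerate(ordered, start=1)}


def average_rank(score_tables):
    """Average each collocate's rank over all measures.

    Assumes every measure scores the same set of collocates.
    Returns (collocate, mean rank) pairs, best first.
    """
    totals = defaultdict(float)
    for scores in score_tables.values():
        for collocate, rank in ranks_from_scores(scores).items():
            totals[collocate] += rank
    n = len(score_tables)
    return sorted(((c, t / n) for c, t in totals.items()),
                  key=lambda pair: (pair[1], pair[0]))


# Invented scores for collocates of a hypothetical node word "tea".
toy_scores = {
    "MI": {"strong": 9.1, "powerful": 2.0, "green": 7.5},
    "t-score": {"strong": 4.2, "powerful": 0.3, "green": 5.0},
    "log-likelihood": {"strong": 120.0, "powerful": 1.5, "green": 95.0},
}

combined = average_rank(toy_scores)
for collocate, mean_rank in combined:
    print(f"{collocate}\t{mean_rank:.2f}")
```

Averaging ranks rather than raw scores sidesteps the fact that different measures produce values on incomparable scales, which is one plausible reason for combining lists at the rank level.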
Original language: English
Title of host publication: Text, Speech, and Dialogue
Subtitle of host publication: 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017, Proceedings
Place of Publication: Cham
Publisher: Springer Nature
Pages: 255-262
Number of pages: 8
ISBN (Electronic): 978-3-319-64206-2
ISBN (Print): 978-3-319-64205-5
State: Published - 2017
Event: Text, Speech, and Dialogue: 20th International Conference - Prague, Czech Republic
Duration: 27 Aug 2017 - 31 Aug 2017

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Nature
Volume: 10415
ISSN (Print): 0302-9743

Conference

Conference: Text, Speech, and Dialogue
Abbreviated title: TSD 2017
Country/Territory: Czech Republic
City: Prague
Period: 27/08/17 - 31/08/17

    Research areas

  • Collocation extraction, Association measures, Evaluation, Ranking, Average rank, Normalized rank, Optimized rank

    Scopus subject areas

  • Social Sciences (all)

ID: 71300483