The paper addresses collocability and collocations in Russian and surveys a wide range of dictionaries, both printed and online, that describe collocations. Our project is building a database that will include both dictionary and statistical collocations. The former are described in various lexicographic resources, whereas the latter can be extracted automatically from corpora. Dictionaries differ from one another and present information in different ways, which makes it hard for language learners and researchers to obtain data. A number of dictionaries were analyzed and processed to retrieve verified collocations; however, the overlap between the lists of collocations extracted from them is still rather small. This indicates a need for a unified resource that takes collocability into account and provides more examples. The proposed resource will also be useful for linguists and for learners of Russian as a foreign language. The results can be important for machine learning and other NLP tasks, for instance automatic clustering of word combinations and disambiguation.
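
The abstract reports that the collocation lists retrieved from different dictionaries overlap only slightly. One way such an overlap could be quantified is sketched below, assuming each dictionary has already been reduced to a flat list of collocation strings. The source names (`dict_a`, `dict_b`, `dict_c`), the toy entries, the `normalize` helper, and the choice of the Jaccard coefficient are all illustrative assumptions, not the method or data of the paper.

```python
from __future__ import annotations

from itertools import combinations


def normalize(collocation: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(collocation.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared collocations divided by all collocations found in either list."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(sources: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every pair of sources, keyed by sorted name pair."""
    sets = {name: {normalize(c) for c in items} for name, items in sources.items()}
    return {
        (x, y): jaccard(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


# Toy data: hypothetical entries, not taken from the paper or any real dictionary.
sources = {
    "dict_a": ["принимать решение", "играть роль", "оказывать помощь"],
    "dict_b": ["Принимать  решение", "играть роль", "иметь значение"],
    "dict_c": ["иметь значение", "вести переговоры"],
}

overlap = pairwise_overlap(sources)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

A real comparison would likely need lemmatization rather than plain lowercasing, since dictionaries may list the same collocation in different grammatical forms.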

Original language: English
Title of host publication: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
Editors: Nicoletta Calzolari, Frederic Bechet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Place of publication: Paris
Publisher: European Language Resources Association (ELRA)
Pages: 3198-3206
Number of pages: 9
ISBN (electronic): 9791095546344
ISBN (print): 9791095546344
Publication status: Published - 2020
Event: 12th International Conference on Language Resources and Evaluation - Marseille, France
Duration: 11 May 2020 - 16 May 2020

Publication series

Name: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings

Conference

Conference: 12th International Conference on Language Resources and Evaluation
Abbreviated title: LREC 2020
Country/Territory: France
City: Marseille
Period: 11/05/20 - 16/05/20

    Scopus subject areas

  • Education
  • Library and Information Sciences
  • Language and Linguistics
  • Linguistics and Language

ID: 61200560