Идентификация единиц тезаурусного описания при интеграции лексических ресурсов RussNet и YARN

Research output: Chapter in Book/Report/Conference proceeding › Article in an anthology › Research › peer-review

Department of Mathematical Linguistics

И.В. Азарова
П.И. Браславский
Ю.А. Киселев
Д.А. Усталов
М.В. Хохлова

Thesauri and ontologies are widely used in many natural language processing tasks and applications. Wordnets are considered to be “a standard NLP tool” along with part-of-speech taggers, syntactic parsers, etc. The paper describes the basic procedure for integrating two lexicographic resources, RussNet and YARN, with the aim of building an online computer lexicon for Russian. The main difficulty lies in the vague borders between synsets, the core wordnet ‘building blocks’. A synset comprises lexical components (lexemes and multiword expressions) that are semantic equivalents, a relation traditionally viewed as synonymy. Nevertheless, there is still no agreement on how this relation is handled in RussNet and YARN. The authors present methods for unifying the synsets of the two resources. An important aspect of the project is its combination of crowdsourcing-based and expert-based approaches. Crowd management methodology is a new and relevant direction of research in many areas.

Translated title of the contribution	IDENTIFICATION OF THESAURUS UNITS IN THE PROCESS OF INTEGRATION RUSSNET INTO YARN
Original language	Russian
Title of host publication	Структурная и прикладная лингвистика
Subtitle of host publication	Межвузовский сборник. Выпуск 12. К 60-летию отделения прикладной, компьютерной и математической лингвистики СПбГУ
Editors	И.С. Николаев
Place of Publication	St. Petersburg
Publisher	Издательство Санкт-Петербургского университета
Pages	34-52
Number of pages	19
State	Published - 2019

Publication series

Name	СТРУКТУРНАЯ И ПРИКЛАДНАЯ ЛИНГВИСТИКА
Publisher	Издательство Санкт-Петербургского университета
ISSN (Print)	0202-2400

Research areas

WordNet, computer lexicography, lexical resource, thesaurus, integration, crowdsourcing
