
Thesauri and ontologies are widely used in many natural language processing tasks and applications. Wordnets are considered “a standard NLP tool” alongside part-of-speech taggers, syntactic parsers and similar tools. The paper describes the basic procedure for integrating two lexicographic resources, RussNet and YARN, with the aim of building an online computer lexicon for Russian. The main issue lies in the vague boundaries between synsets, the core wordnet ‘building blocks’. Synsets are made up of lexical components (lexemes and multiword expressions) that are semantic equivalents, a relation traditionally viewed as synonymy. However, RussNet and YARN still do not agree on how this relation should be treated. The authors present methods for unifying the synsets of the two resources. An important aspect of the project is the combination of crowdsourcing-based and expert-based approaches; crowd management methodology is a new and relevant direction of research in many areas.
Translated title of the contribution: IDENTIFICATION OF THESAURUS UNITS IN THE PROCESS OF INTEGRATION RUSSNET INTO YARN
Original language: Russian
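Title of host publication: Структурная и прикладная лингвистика (Structural and Applied Linguistics)
Subtitle of host publication: Межвузовский сборник. Выпуск 12. К 60-летию отделения прикладной, компьютерной и математической лингвистики СПбГУ (Interuniversity Collection. Issue 12. For the 60th Anniversary of the Department of Applied, Computational and Mathematical Linguistics, St Petersburg State University)
Editors: И.С. Николаев (I.S. Nikolaev)
Place of Publication: СПб. (St. Petersburg)
Publisher: Издательство Санкт-Петербургского университета (St. Petersburg University Press)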
Pages: 34-52
Number of pages: 19
State: Published - 2019

Publication series

Name: СТРУКТУРНАЯ И ПРИКЛАДНАЯ ЛИНГВИСТИКА (Structural and Applied Linguistics)
Publisher: Издательство Санкт-Петербургского университета (St. Petersburg University Press)
ISSN (Print): 0202-2400

Research areas

  • WordNet, computer lexicography, lexical resource, thesaurus, integration, crowdsourcing

ID: 62369107