Abstract

© 2016 FRUCT. This work describes the experience of creating a coreference resolution system for the Russian language. Coreference resolution is a key subtask of Information Extraction and aims to group mentions that refer to the same discourse entity. The goal of this work was to apply a clusterization algorithm to Russian-language newswire texts. We narrowed the task to the clusterization of Person proper names. Our approach comprised two steps: mention extraction and clusterization. Mentions were extracted with manually created grammars for Tomita-parser. To group mentions, we used agglomerative clusterization at the entity level with weighted feature vectors. We ran our experiments on newswire texts annotated for the factRuEval-2016 competition, organized by Dialogue Evaluation, and compared our results with those of the competitors. As a baseline, we used the built-in Tomita-parser algorithms for name extraction and name clusterization. Our results were comparable to those of the competitors and outperformed the baseline.
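
The grouping step described in the abstract can be illustrated with a minimal sketch of agglomerative clusterization over person-name mentions. Everything specific here is an assumption made for illustration and is not taken from the paper: the field names (`surname`, `first_name`, `patronymic`), the weights, the 0.5 threshold, the rule that a conflicting field vetoes a match, and the use of complete linkage between clusters. Mentions are assumed to be already extracted and normalized.

```python
from itertools import combinations

# Hypothetical feature weights; the paper's actual features and weights
# are not stated in the abstract.
WEIGHTS = {"surname": 0.6, "first_name": 0.3, "patronymic": 0.1}


def mention_similarity(a, b):
    """Sum the weights of fields on which two mentions agree.

    Fields missing from either mention are skipped. A field present in
    both with different values vetoes the match (assumed rule).
    """
    score = 0.0
    for field, weight in WEIGHTS.items():
        value_a, value_b = a.get(field), b.get(field)
        if value_a is None or value_b is None:
            continue
        if value_a != value_b:
            return 0.0
        score += weight
    return score


def cluster_similarity(c1, c2):
    """Complete linkage: the weakest pairwise similarity between clusters."""
    return min(mention_similarity(a, b) for a in c1 for b in c2)


def agglomerative_cluster(mentions, threshold=0.5):
    """Repeatedly merge the most similar pair of clusters until no pair
    reaches the (assumed) similarity threshold."""
    clusters = [[m] for m in mentions]
    while len(clusters) > 1:
        best_score, (i, j) = max(
            (
                (cluster_similarity(clusters[i], clusters[j]), (i, j))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[0],
        )
        if best_score < threshold:
            break
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    mentions = [
        {"first_name": "Vladimir", "surname": "Putin"},
        {"surname": "Putin"},
        {"first_name": "Dmitry", "surname": "Medvedev"},
        {"surname": "Medvedev"},
    ]
    for cluster in agglomerative_cluster(mentions):
        print(cluster)
```

Complete linkage is chosen in this sketch so that a single conflicting mention prevents two clusters from merging, which loosely mimics comparing whole entities rather than individual mentions.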
Original language: English
Title of host publication: Proceedings of the International FRUCT Conference on Intelligence, Social Media and Web, ISMW FRUCT 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 9-16
ISBN (Print): 9789526839769
Publication status: Published - 2016

Title: Coreference resolution using clusterization
