Abstract
© 2016 FRUCT. This work describes the experience of creating a coreference resolution system for the Russian language. Coreference resolution is a key subtask of Information Extraction that aims to group mentions referring to the same discourse entity. This work aimed to apply a clustering algorithm to Russian-language newswire texts. We narrowed the task to clustering Person proper names. Our approach consists of two steps: mention extraction and clustering. Mention extraction was performed with manually created grammars for the Tomita-parser. To group mentions, we used agglomerative clustering at the entity level with weighted feature vectors. We ran our experiments on newswire texts annotated for the factRuEval-2016 competition, organized by Dialogue Evaluation, and compared our results with those of the other competitors. As a baseline, we used the built-in Tomita-parser algorithms for name extraction and name clustering. Our results were comparable to those of the competitors, and our system outperformed the baseline.
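The clustering step described above can be illustrated with a minimal Python sketch of entity-level agglomerative clustering over weighted features. It is not the authors' implementation: the abstract does not specify the features, weights, or stopping criterion, so the `Mention` schema, the `WEIGHTS` values, the `threshold`, and the rule that disjoint name parts block a merge are all hypothetical. It also assumes mentions have already been extracted (e.g. by Tomita-parser grammars) and their name parts normalized to the nominative case. "Entity level" here means each candidate merge compares everything known about two whole clusters, not just two individual mentions.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    """A person-name mention; any part may be missing (hypothetical schema)."""
    first: Optional[str] = None
    last: Optional[str] = None
    patronymic: Optional[str] = None


# Hypothetical feature weights, chosen for illustration only.
WEIGHTS = {"last": 0.6, "first": 0.3, "patronymic": 0.2}


def entity_features(cluster: List[Mention]) -> dict:
    """Collect every name part seen anywhere in a cluster (the entity-level view)."""
    feats = {key: set() for key in WEIGHTS}
    for mention in cluster:
        for key in WEIGHTS:
            value = getattr(mention, key)
            if value:
                feats[key].add(value.lower())
    return feats


def similarity(a: List[Mention], b: List[Mention]) -> float:
    """Weighted agreement between two entities; -inf if a name part conflicts."""
    fa, fb = entity_features(a), entity_features(b)
    score = 0.0
    for key, weight in WEIGHTS.items():
        if fa[key] and fb[key]:           # feature observed on both sides
            if fa[key] & fb[key]:
                score += weight           # shared value: evidence for merging
            else:
                return float("-inf")      # disjoint values: hypothetical hard conflict
    return score


def cluster_mentions(mentions: List[Mention],
                     threshold: float = 0.5) -> List[List[Mention]]:
    """Greedy agglomerative clustering: repeatedly merge the best-scoring
    pair of entities until no pair reaches the (hypothetical) threshold."""
    clusters = [[m] for m in mentions]    # start with one entity per mention
    while True:
        best_score, best_pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            score = similarity(clusters[i], clusters[j])
            if score >= best_score:
                best_score, best_pair = score, (i, j)
        if best_pair is None:
            return clusters
        i, j = best_pair                  # i < j, so deleting j keeps index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]


if __name__ == "__main__":
    mentions = [
        Mention(first="Vladimir", last="Putin"),
        Mention(last="Putin"),
        Mention(first="Dmitry", last="Medvedev"),
        Mention(last="Medvedev"),
    ]
    for entity in cluster_mentions(mentions):
        print(entity)
```

With these inputs the sketch yields two entities: each bare surname mention attaches to the full-name mention that shares its surname, while differing surnames block any cross-merge. A working system for Russian would also need to handle initials, gendered surname forms, and feature weights tuned on annotated data.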
Original language | English |
---|---|
Host publication title | Proceedings of the International FRUCT Conference on Intelligence, Social Media and Web, ISMW FRUCT 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 9-16 |
ISBN (print) | 9789526839769 |
DOI | |
Status | Published - 2016 |