Coreference resolution using clusterization

Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Research


© 2016 FRUCT. This work describes the experience of creating a coreference resolution system for the Russian language. Coreference resolution is a key subtask of Information Extraction and aims to group mentions that refer to the same discourse entity. This work applies a clusterization algorithm to Russian-language newswire texts. We narrowed the task to the clusterization of Person proper names. Our approach includes two steps: mention extraction and clusterization. Mention extraction was performed with manually created grammars for Tomita-parser. For mention grouping, we used agglomerative clusterization at the entity level with weighted feature vectors. We ran our experiments on newswire texts annotated for the factRuEval-2016 competition, organized by Dialogue Evaluation, and compared our results with those of the competitors. As a baseline, we used the built-in Tomita-parser algorithms for name extraction and name clusterization. We obtained results comparable to the competitors' and outperformed the baseline.
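The grouping step described in the abstract can be illustrated with a minimal Python sketch of entity-level agglomerative clusterization over person-name mentions. It is not the authors' implementation: the features (shared surname, compatible first name or initial), the weights in `WEIGHTS`, the `threshold` value, and all function names are assumptions made for illustration, and mention extraction with Tomita-parser is omitted, so the mentions are given as plain strings.

```python
from itertools import combinations

# Hypothetical feature weights; the abstract does not list the real features.
WEIGHTS = {"last": 0.6, "first": 0.3}


def parse(mention):
    """Split a mention into (first, last); a single token is treated as a surname."""
    tokens = mention.lower().replace(".", "").split()
    if len(tokens) == 1:
        return (None, tokens[0])
    return (tokens[0], tokens[-1])


def first_names_agree(x, y):
    """An initial such as 'i' agrees with any first name starting with that letter."""
    return x == y or (min(len(x), len(y)) == 1 and x[0] == y[0])


def similarity(entity_a, entity_b):
    """Weighted score between two entities, each a list of mention strings."""
    a = [parse(m) for m in entity_a]
    b = [parse(m) for m in entity_b]
    if not {last for _, last in a} & {last for _, last in b}:
        return 0.0  # no shared surname: never merge
    score = WEIGHTS["last"]
    firsts_a = {first for first, _ in a if first}
    firsts_b = {first for first, _ in b if first}
    if firsts_a and firsts_b:
        if not all(first_names_agree(x, y) for x in firsts_a for y in firsts_b):
            return 0.0  # conflicting first names block the merge
        score += WEIGHTS["first"]
    return score


def cluster(mentions, threshold=0.5):
    """Greedily merge the most similar pair of entities until none reaches the threshold."""
    entities = [[m] for m in dict.fromkeys(mentions)]  # one entity per distinct mention
    while len(entities) > 1:
        (i, j), best = max(
            (((i, j), similarity(entities[i], entities[j]))
             for i, j in combinations(range(len(entities)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        entities[i].extend(entities.pop(j))  # i < j, so index i stays valid
    return entities


groups = cluster(["Ivan Petrov", "Petrov", "I. Petrov", "Anna Sidorova", "Sidorova"])
```

Because similarity is computed between whole entities rather than single mentions, a conflicting first name anywhere in an entity blocks a merge: "Oleg Petrov" would not join an entity that already contains "Ivan Petrov", even though both match the bare mention "Petrov".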
Original language: English
Title of host publication: Proceedings of the International FRUCT Conference on Intelligence, Social Media and Web, ISMW FRUCT 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (print): 9789526839769
Publication status: Published - 2016
