© 2016 FRUCT. This work describes the experience of creating a coreference resolution system for the Russian language. Coreference resolution is a key subtask of Information Extraction that aims to group mentions referring to the same discourse entity. We applied a clustering algorithm to Russian-language newswire texts, narrowing the task to the clustering of Person proper names. Our approach comprised two steps: mention extraction and clustering. Mentions were extracted with manually created grammars for Tomita-parser. To group mentions, we used entity-level agglomerative clustering with weighted feature vectors. We ran our experiments on newswire texts annotated for the factRuEval-2016 competition, organized by Dialogue Evaluation, and compared our results with those of the competitors. As a baseline, we used the built-in Tomita-parser algorithms for name extraction and name clustering. Our results were comparable to those of the competitors and outperformed the baseline.
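The clustering step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the features, weights, or stopping criterion, so the fields (`surname`, `first_name`, `patronymic`), the values in `WEIGHTS`, the `THRESHOLD`, and the function names below are all hypothetical. The sketch also assumes that mentions have already been extracted and normalized to nominative-case name parts.

```python
from itertools import combinations

# Hypothetical feature weights and merge threshold (not taken from the paper).
WEIGHTS = {"surname": 0.6, "first_name": 0.3, "patronymic": 0.1}
THRESHOLD = 0.6


def compatible(a, b):
    """Name parts match if they are equal or one is an initial of the other."""
    a, b = a.lower().rstrip("."), b.lower().rstrip(".")
    if a == b:
        return True
    if len(a) == 1 or len(b) == 1:
        return a[:1] == b[:1]
    return False


def similarity(e1, e2):
    """Weighted agreement between two entities (clusters of mentions)."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        pairs = [(a, b) for a in e1[field] for b in e2[field]]
        if not pairs:
            continue  # field unknown for one entity: no evidence either way
        if all(compatible(a, b) for a, b in pairs):
            score += weight
        else:
            score -= weight  # conflicting values count against a merge
    return score


def cluster(mentions):
    """Greedy agglomerative clustering: merge the best pair until below threshold."""
    entities = [
        {"mentions": [m], **{f: ({m[f]} if m.get(f) else set()) for f in WEIGHTS}}
        for m in mentions
    ]
    while len(entities) > 1:
        (i, j), best = max(
            (((i, j), similarity(entities[i], entities[j]))
             for i, j in combinations(range(len(entities)), 2)),
            key=lambda item: item[1],
        )
        if best < THRESHOLD:
            break
        other = entities.pop(j)  # j > i, so index i stays valid
        entities[i]["mentions"] += other["mentions"]
        for f in WEIGHTS:
            entities[i][f] |= other[f]  # entity-level features accumulate
    return [e["mentions"] for e in entities]
```

Calling `cluster` on a list of dictionaries such as `{"surname": "Putin", "first_name": "V."}` returns a list of mention groups, one per inferred person. Because features are pooled at the entity level, a bare surname can join a cluster that already holds a full name, while a conflicting first name blocks the merge.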
Original language: English
Title of host publication: Proceedings of the International FRUCT Conference on Intelligence, Social Media and Web, ISMW FRUCT 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 9-16
ISBN (Print): 978-952-68397-6-9
State: Published - 2016
Event: 2016 International FRUCT Conference on Intelligence, Social Media and Web - Saint-Petersburg, Russian Federation
Duration: 28 Aug 2016 - 4 Sep 2016

Conference

Conference: 2016 International FRUCT Conference on Intelligence, Social Media and Web
Abbreviated title: ISMW FRUCT 2016
Country/Territory: Russian Federation
City: Saint-Petersburg
Period: 28/08/16 - 4/09/16

ID: 7966426