The paper describes our approach to the task of information extraction within
FactRuEval, an independent evaluation of Named Entity Recognition and Fact
Extraction tools. We took part in the three subtasks of the evaluation: Named
Entity Recognition per se, Entity Normalization and Fact Extraction.
We chose a rule-based approach to the task. The three subtasks correspond to
the modules of the ‘Hurma’ parser, the tool we have developed. In addition to
traditional lexicon- and regular-expression-based rules, it allows one to
create elaborate rules to mine and normalize different kinds of entities,
taking into account the specific challenges that a language such as Russian
presents to researchers. For Fact Extraction, we used a skip-gram-based
algorithm with no dependencies in order to overcome the problem of data sparsity.
Preliminary results show that our Entity Extraction and Normalization methods
score reasonably high and that our Fact Extraction score is high enough, given
that our expected maximum F-measure is relatively low due to the specifics of
the Gold Standard.
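The idea behind skip-gram-based fact extraction can be illustrated with a minimal sketch. It is not the Hurma implementation: the function names, the toy `labels` dictionary (standing in for the parser's lexicon and regular-expression entity rules) and the pattern format are all hypothetical. A fact pattern such as `("PER", "appointed", "ORG")` matches any k-skip-n-gram of the sentence, that is, any ordered selection of n tokens spanning at most n + k consecutive positions. Because arbitrary words may appear in the gaps, one pattern covers many surface variants, which is one way to soften data sparsity.

```python
from itertools import combinations


def skip_gram_indices(length, n, k):
    """Yield index tuples of all k-skip-n-grams over a sequence of `length` items.

    Each tuple is increasing and spans at most n + k consecutive positions.
    The first index is fixed to `start`, so every skip-gram is produced once.
    """
    for start in range(length):
        end = min(length, start + n + k)
        for rest in combinations(range(start + 1, end), n - 1):
            yield (start,) + rest


def extract_facts(tokens, labels, pattern, k):
    """Return token tuples whose symbols match `pattern` as a k-skip-n-gram.

    Hypothetical: `labels` maps known entity tokens to types (e.g. "PER");
    every other token is represented by its lowercased form.
    """
    symbols = [labels.get(tok, tok.lower()) for tok in tokens]
    facts = []
    for idx in skip_gram_indices(len(symbols), len(pattern), k):
        if tuple(symbols[i] for i in idx) == pattern:
            facts.append(tuple(tokens[i] for i in idx))
    return facts


# Toy example; the gazetteer and pattern are invented for illustration.
tokens = "Ivanov was recently appointed director of Gazprom".split()
labels = {"Ivanov": "PER", "Gazprom": "ORG"}
pattern = ("PER", "appointed", "ORG")

print(extract_facts(tokens, labels, pattern, k=4))  # [('Ivanov', 'appointed', 'Gazprom')]
print(extract_facts(tokens, labels, pattern, k=2))  # [] (gaps too wide for k=2)
```

The skip distance `k` trades recall against precision: a larger window tolerates more intervening words but also admits more spurious matches.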
Original language: English
Number of pages: 11
Status: Published - 2016
Event: 22nd International Scientific Conference "Dialogue" - Moscow, Russian Federation
Duration: 1 Jun 2016 to 4 Jun 2016

Conference

Conference: 22nd International Scientific Conference "Dialogue"
Country/Territory: Russian Federation
City: Moscow
Period: 1/06/16 to 4/06/16
