The paper describes our approach to the task of information extraction within
FactRuEval, an independent evaluation of Named Entity Recognition and Fact
Extraction tools. We took part in the three subtasks of the evaluation: Named
Entity Recognition per se, Entity Normalization and Fact Extraction.
We chose a rule-based approach to the task. The three subtasks correspond to the modules of the ‘Hurma’ parser, the tool we have developed. In addition to traditional lexicon-based and regular-expression-based rules, it allows
creating elaborate rules to mine and normalize different kinds of entities
with regard to the specific challenges that a language such as Russian presents to
researchers. For Fact Extraction, we used a skip-gram-based algorithm with
no dependencies in order to overcome the problem of data sparsity.
Preliminary results show that our Entity Extraction and Normalization methods score reasonably high and that our Fact Extraction score is high
enough, taking into account that our expected maximum F-measure
is relatively low due to the specifics of the Gold Standard.
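
The abstract does not spell out the skip-gram algorithm, so the following is only a generic sketch of how k-skip-n-grams can be generated from a token sequence; it is not the ‘Hurma’ implementation. The function name `skip_grams` and the parameters `n` and `k` are hypothetical, chosen for illustration. Allowing gaps between tokens lets one pattern match several surface variants of the same fact, which is the usual motivation for skip-grams when data is sparse.

```python
from itertools import combinations


def skip_grams(tokens, n=2, k=1):
    """Return all k-skip-n-grams of a token list (illustrative sketch).

    A k-skip-n-gram is an ordered selection of n tokens in which
    at most k tokens in total are skipped between the first and last.
    """
    grams = []
    for start in range(len(tokens)):
        # The window holds the n chosen tokens plus up to k skipped ones.
        window_end = min(start + n + k, len(tokens))
        # Pick the remaining n - 1 positions to the right of `start`.
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append(tuple(tokens[i] for i in (start,) + rest))
    return grams


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the evaluation data.
    sentence = ["Ivanov", "was", "appointed", "director"]
    for gram in skip_grams(sentence, n=2, k=1):
        print(gram)
```

With `k=0` the function reduces to ordinary contiguous n-grams; larger `k` values trade precision for coverage.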
Original language: English
Number of pages: 11
State: Published - 2016
Event: 22nd International Scientific Conference "Dialogue" - Moscow, Russian Federation
Duration: 1 Jun 2016 - 4 Jun 2016

Conference

Conference: 22nd International Scientific Conference "Dialogue"
Country/Territory: Russian Federation
City: Moscow
Period: 1/06/16 - 4/06/16

    Research areas

  • Information Extraction, Named Entity Recognition, Named Entity Normalization, Fact Extraction, skip-grams

ID: 106951610