Natural language processing systems for data extraction and mapping on the basis of unstructured text blocks

Original title (Russian): Системы обработки естественного языка для извлечения данных и картографирования на основе неструктурированных блоков текста

Alexey A. Kolesnikov, Pavel M. Kikin, Giovanni Nico, Elena V. Komissarova

Research output: Contribution to journal / Conference article / peer-review


Abstract

Modern natural language processing technologies make it possible to work with texts without specialist knowledge of linguistics. Because linguistic models can be developed and run on popular data processing platforms, they can be integrated into widely used geographic information systems, which significantly extends the functionality and improves the accuracy of standard geocoding functions. The article compares the most popular methods, and the software built on them, on the task of extracting geographical names from plain text. This task is an extended form of geocoding: the result still includes the coordinates of the point features of interest, but the addresses or geographical names of the objects do not have to be extracted from the text separately beforehand. In computational linguistics, this problem is solved by named entity recognition (NER) methods. Among the most modern approaches, the authors selected algorithms based on rules, maximum entropy models, and convolutional neural networks. The selected algorithms and methods were evaluated not only on the accuracy of finding geographical objects in the text, but also on how easily their base rules or mathematical models can be refined using one's own text corpora. Reports on technological disruptions, accidents, and incidents at facilities of the heat and power complex of the Ministry of Energy of the Russian Federation were used as the test data for these methods and software solutions. The article also presents a study of a method for improving named entity recognition quality by additionally training a neural network model on a specialized text corpus.
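To make the rule-based variant of "geocoding directly from plain text" concrete, below is a minimal Python sketch: a dictionary (gazetteer) lookup finds place names in a report and returns each match with its coordinates in a single step. It is not the authors' implementation. The gazetteer, the function name `extract_and_geocode`, and the sample report are hypothetical, and the coordinates are approximate city centres. Real Russian-language reports would also need lemmatization, because place names appear in inflected case forms that an exact string match misses.

```python
import re
from typing import NamedTuple


class Toponym(NamedTuple):
    name: str   # canonical gazetteer name
    start: int  # character offset where the match starts
    end: int    # character offset just past the match
    lat: float
    lon: float


# Hypothetical mini-gazetteer; coordinates are approximate (WGS 84).
GAZETTEER = {
    "Moscow": (55.76, 37.62),
    "Novosibirsk": (55.03, 82.92),
    "Nizhny Novgorod": (56.33, 44.00),
}


def build_pattern(names):
    # Longest names first, so a multi-word name beats a shorter entry
    # that is its prefix when both could match at the same position.
    alternatives = sorted(names, key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")


PATTERN = build_pattern(GAZETTEER)


def extract_and_geocode(text):
    """Rule-based NER plus geocoding: find gazetteer names, attach coordinates."""
    results = []
    for m in PATTERN.finditer(text):
        lat, lon = GAZETTEER[m.group(1)]
        results.append(Toponym(m.group(1), m.start(), m.end(), lat, lon))
    return results


report = ("A transformer failure was recorded at a substation near Novosibirsk; "
          "the incident report was forwarded to Moscow.")
for t in extract_and_geocode(report):
    print(t.name, t.lat, t.lon)
# prints:
# Novosibirsk 55.03 82.92
# Moscow 55.76 37.62
```

In a pipeline like this, maximum entropy or convolutional neural network models would replace the fixed gazetteer match with a learned classifier that labels tokens as place names. The step that attaches coordinates to each recognized name stays the same.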

Original language: Russian
Pages (from-to): 375-384
Number of pages: 10
Journal: InterCarto, InterGIS
Volume: 26
State: Published - 2020
Event: 2020 International Conference on GI Support of Sustainable Development of Territories - Moscow, Russian Federation
Duration: 28 Sep 2020 to 29 Sep 2020

Scopus subject areas

  • Computers in Earth Sciences
  • Earth-Surface Processes
  • Geophysics
  • Geography, Planning and Development
