NATURAL LANGUAGE PROCESSING SYSTEMS FOR DATA EXTRACTION AND MAPPING ON THE BASIS OF UNSTRUCTURED TEXT BLOCKS

Alexey A. Kolesnikov, Pavel M. Kikin, Giovanni Niko, Elena V. Komissarova

Research output: Contribution to journal, conference article, peer-reviewed

1 Citation (Scopus)

Abstract

Modern natural language processing technologies make it possible to work with texts without being a specialist in linguistics. Because linguistic models can be developed and run on popular data processing platforms, they can be integrated into widely used geographic information systems. This significantly expands the functionality and improves the accuracy of standard geocoding functions. The article compares the most popular methods, and the software built on them, using the task of extracting geographical names from plain text as an example. This task is an extended version of the geocoding operation: the result still includes the coordinates of the point features of interest, but the addresses or geographical names of the objects do not need to be extracted from the text in advance. In computational linguistics, this problem is solved by named entity recognition (NER) methods. Among the most modern approaches, the authors chose rule-based algorithms, maximum entropy models and convolutional neural networks for the final implementation. The selected algorithms and methods were evaluated not only for the accuracy with which they find geographical objects in the text, but also for how easily their basic rules or mathematical models can be refined using custom text corpora. Reports on technological violations, accidents and incidents at facilities of the heat and power complex of the Ministry of Energy of the Russian Federation were used as the source data for testing these methods and software solutions. The article also presents a study of a method for improving the quality of named entity recognition by additionally training a neural network model on a specialized text corpus.
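The rule-based approach mentioned in the abstract can be illustrated with a minimal gazetteer matcher that finds place names in plain text and attaches coordinates in a single pass. This is only a sketch under stated assumptions, not the authors' implementation: the `GAZETTEER` dictionary, its approximate coordinates, the `PlaceMention` type and the `extract_places` function are all hypothetical names introduced here for illustration.

```python
import re
from typing import NamedTuple

# Hypothetical mini-gazetteer: place name -> approximate (latitude, longitude).
# A real system would load a full gazetteer or call a geocoding service.
GAZETTEER = {
    "Moscow": (55.76, 37.62),
    "Novosibirsk": (55.01, 82.94),
    "Nizhny Novgorod": (56.33, 44.00),
}


class PlaceMention(NamedTuple):
    name: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    lat: float
    lon: float


def build_pattern(names):
    # Try longer names first so multi-word names are not cut short.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b")


PATTERN = build_pattern(GAZETTEER)


def extract_places(text):
    """Return every gazetteer name found in text, with its coordinates."""
    mentions = []
    for match in PATTERN.finditer(text):
        lat, lon = GAZETTEER[match.group(0)]
        mentions.append(
            PlaceMention(match.group(0), match.start(), match.end(), lat, lon)
        )
    return mentions


report = (
    "A heating main failed in Novosibirsk; "
    "a similar incident occurred in Nizhny Novgorod."
)
for mention in extract_places(report):
    print(mention)
```

An exact-match rule set like this is easy to extend with new names but misses inflected or misspelled forms, which is one reason statistical and neural NER models are compared against it.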

Translated title: Natural language processing systems for data extraction and mapping on the basis of unstructured text blocks
Original language: Russian
Pages (from-to): 375-384
Number of pages: 10
Journal: InterCarto, InterGIS
Volume: 26
Status: Published - 2020
Event: 2020 International Conference on GI Support of Sustainable Development of Territories - Moscow, Russian Federation
Duration: 28 Sep 2020 to 29 Sep 2020

Scopus subject areas

  • Computers in Earth Sciences
  • Earth-Surface Processes
  • Geophysics
  • Geography, Planning and Development

Keywords

  • DeepPavlov
  • Geographical name
  • Named entity recognition
  • Natural language processing
  • SpaCy
