This article addresses the problem of hotel deduplication. Obvious approaches, such as comparing names or locations, fail because descriptions of the same hotel differ across databases. The most accurate solution is to rely on professionally trained content managers, but this is expensive, so an automatic solution is needed. We propose a method for verifying the hypothesis that a pair of hotels is identical and compare its performance with alternative solutions. The proposed method satisfies the business requirements set for the precision and recall of the hotel deduplication task. It is based on a machine learning approach that uses several distinctive features, including features built with computer vision algorithms.

Original language: English
Title of host publication: Knowledge Engineering and Semantic Web - 7th International Conference, KESW 2016, Proceedings
Editors: Axel-Cyrille Ngonga Ngomo, Petr Křemen
Publisher: Springer Nature
Number of pages: 11
ISBN (print): 9783319458793
Publication status: Published - 2016
Event: 7th International Conference on Knowledge Engineering and Semantic Web, KESW 2016 - Prague, Czech Republic
Duration: 21 Sep 2016 to 23 Sep 2016

Publication series

Name: Communications in Computer and Information Science
ISSN (print): 1865-0929


Conference: 7th International Conference on Knowledge Engineering and Semantic Web, KESW 2016

    Research areas


    Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)

ID: 86415654