The present article addresses the problem of hotel deduplication. Obvious approaches, such as comparing names or locations, fail because hotel descriptions differ across databases. The most accurate approach is to employ professionally trained content managers, but this is expensive, so an automatic solution is needed. We propose a method for verifying the hypothesis that a pair of hotel records describes the same hotel, and we compare its performance with alternative solutions. The proposed method satisfies the business requirements set for the precision and recall of the hotel deduplication task. It is based on machine learning and uses several distinctive features, including some built with computer vision algorithms.
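Deduplication of this kind is commonly framed as binary classification over candidate pairs of records, scored against precision and recall targets. The sketch below illustrates that framing only. It is not the authors' method: the abstract does not list its features beyond noting that some come from computer vision. The record schema `(name, latitude, longitude)`, the two features (case-insensitive name similarity via `difflib.SequenceMatcher` and an `exp(-distance_km)` proximity score), the hand-rolled logistic regression, and all hotel data are hypothetical choices made for brevity.

```python
import math
from difflib import SequenceMatcher

# Hypothetical record schema for illustration: (name, latitude, longitude).
Record = tuple[str, float, float]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two hotel names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def pair_features(a: Record, b: Record) -> list[float]:
    """Illustrative features for a candidate pair: name similarity and proximity."""
    dist = haversine_km(a[1], a[2], b[1], b[2])
    return [name_similarity(a[0], b[0]), math.exp(-dist)]  # proximity is 1.0 at 0 km


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=1.0, epochs=2000):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - (1.0 if y else 0.0)
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def is_duplicate(w, b, x, threshold=0.5) -> bool:
    """Predict 'same hotel' when the model probability reaches the threshold."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold


def precision_recall(pred, gold):
    """Precision and recall of boolean pair predictions against gold labels."""
    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labelled pairs (record_a, record_b, is_duplicate); all values are invented.
TRAIN = [
    (("Hotel Adria", 50.0814, 14.4215), ("Hotel Adria Prague", 50.0815, 14.4214), True),
    (("Grand Hotel Bohemia", 50.0870, 14.4270), ("Grand Hotel Bohemia", 50.0871, 14.4272), True),
    (("Hotel Paris Praha", 50.0878, 14.4283), ("Hotel Paris", 50.0877, 14.4284), True),
    (("Grand Hotel Bohemia", 50.0870, 14.4270), ("Motel One", 48.1400, 11.5600), False),
    (("Hotel Adria", 50.0814, 14.4215), ("Ibis Berlin Mitte", 52.5300, 13.4000), False),
    (("Hotel Paris Praha", 50.0878, 14.4283), ("Hilton Vienna", 48.2060, 16.3840), False),
]

xs = [pair_features(ra, rb) for ra, rb, _ in TRAIN]
ys = [label for _, _, label in TRAIN]
weights, bias = train_logistic(xs, ys)
preds = [is_duplicate(weights, bias, x) for x in xs]
precision, recall = precision_recall(preds, ys)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here precision and recall are measured on the training pairs purely to show the bookkeeping; a real evaluation would use held-out labelled pairs. The `threshold` parameter is the usual lever for meeting a business target: raising it generally trades recall for precision.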

Original language: English
Title of host publication: Knowledge Engineering and Semantic Web - 7th International Conference, KESW 2016, Proceedings
Editors: Axel-Cyrille Ngonga Ngomo, Petr Křemen
Publisher: Springer Nature
Pages: 230-240
Number of pages: 11
ISBN (Print): 9783319458793
State: Published - 2016
Event: 7th International Conference on Knowledge Engineering and Semantic Web, KESW 2016 - Prague, Czech Republic
Duration: 21 Sep 2016 to 23 Sep 2016

Publication series

Name: Communications in Computer and Information Science
Volume: 649
ISSN (Print): 1865-0929

Conference

Conference: 7th International Conference on Knowledge Engineering and Semantic Web, KESW 2016
Country/Territory: Czech Republic
City: Prague
Period: 21/09/16 to 23/09/16

    Research areas

  • Deduplication, Entity resolution, Machine learning, Natural language processing

    Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)
