Comparison of different approaches for hotels deduplication

DOI

https://doi.org/10.1007/978-3-319-45880-9_18
Final published version

Ivan Kozhevnikov
Vladimir Gorovoy

This article addresses the problem of hotel deduplication. Obvious approaches, such as comparing names or locations, fail because hotel descriptions differ across databases. The most accurate solution is to employ professionally trained content managers, but this is expensive, so an automatic solution is needed. We propose a method to improve a hypothesis that a pair of hotels is identical and compare its performance with alternative solutions. The proposed method satisfies the business requirements set for the precision and recall of the hotel deduplication task. It is based on a machine learning approach that uses some unique features, including features built with the help of computer vision algorithms.
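As a rough illustration of the name and location comparisons mentioned in the abstract, the following sketch computes two pairwise signals for a candidate pair of hotel records. It is not the authors' feature set: the function names, the record fields (`name`, `lat`, `lon`), and the choice of `difflib` string similarity and haversine distance are assumptions made for this example. In a machine learning setting such values would be only some of the inputs to a classifier, since the abstract notes that these comparisons fail when used on their own.

```python
import math
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two hotel names."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def pair_features(hotel_a, hotel_b):
    """Feature vector for a candidate pair (hypothetical record layout)."""
    return [
        name_similarity(hotel_a["name"], hotel_b["name"]),
        haversine_km(hotel_a["lat"], hotel_a["lon"],
                     hotel_b["lat"], hotel_b["lon"]),
    ]
```

A pair such as "Hotel Astoria" and "Astoria Hotel & Spa" a few dozen metres apart would yield a moderate name score and a small distance, which is the kind of ambiguous case where additional features are needed.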

Original language	English
Title of host publication	Knowledge Engineering and Semantic Web - 7th International Conference, KESW 2016, Proceedings
Editors	Axel-Cyrille Ngonga Ngomo, Petr Křemen
Publisher	Springer Nature
Pages	230-240
Number of pages	11
ISBN (Print)	9783319458793
DOIs	https://doi.org/10.1007/978-3-319-45880-9_18
State	Published - 2016
Event	7th International Conference on Knowledge Engineering and Semantic Web, KESW 2016 - Prague, Czech Republic
Duration	21 Sep 2016 → 23 Sep 2016

Publication series

Name	Communications in Computer and Information Science
Volume	649
ISSN (Print)	1865-0929

Conference

Conference	7th International Conference on Knowledge Engineering and Semantic Web, KESW 2016
Country/Territory	Czech Republic
City	Prague
Period	21/09/16 → 23/09/16

Research areas

Deduplication, Entity resolution, Machine learning, Natural language processing

Scopus subject areas

Computer Science(all)
Mathematics(all)

ID: 86415654