Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Comparison of different approaches for hotels deduplication. / Kozhevnikov, Ivan; Gorovoy, Vladimir.
Knowledge Engineering and Semantic Web - 7th International Conference, KESW 2016, Proceedings. ed. / Axel-Cyrille Ngonga Ngomo; Petr Křemen. Springer Nature, 2016. p. 230-240 (Communications in Computer and Information Science; Vol. 649).
TY - GEN
T1 - Comparison of different approaches for hotels deduplication
AU - Kozhevnikov, Ivan
AU - Gorovoy, Vladimir
N1 - Kozhevnikov, I. Comparison of different approaches for hotels deduplication / I. Kozhevnikov, V. Gorovoy // Knowledge Engineering and Semantic Web - 7th International Conference, KESW 2016, Proceedings. - Springer Nature, 2016. - P. 230-240. Publisher Copyright: © Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
N2 - The present article addresses the problem of hotel deduplication. Obvious approaches, such as name or location comparisons, fail because hotel descriptions differ across databases. The most accurate approach is to use professionally trained content managers, but this is expensive; hence, an automatic solution should be implemented. We propose a method to improve a hypothesis that a pair of hotels is identical, and compare its performance with alternative solutions. The proposed method satisfies the business requirements set for the precision and recall of the hotel deduplication task. The method is based on a machine learning approach that uses some unique features, including those built with the help of computer vision algorithms.
AB - The present article addresses the problem of hotel deduplication. Obvious approaches, such as name or location comparisons, fail because hotel descriptions differ across databases. The most accurate approach is to use professionally trained content managers, but this is expensive; hence, an automatic solution should be implemented. We propose a method to improve a hypothesis that a pair of hotels is identical, and compare its performance with alternative solutions. The proposed method satisfies the business requirements set for the precision and recall of the hotel deduplication task. The method is based on a machine learning approach that uses some unique features, including those built with the help of computer vision algorithms.
KW - Deduplication
KW - Entity resolution
KW - Machine learning
KW - Natural language processing
KW - SCOPUS
UR - http://www.scopus.com/inward/record.url?scp=84988660373&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-45880-9_18
DO - 10.1007/978-3-319-45880-9_18
M3 - Conference contribution
AN - SCOPUS:84988660373
SN - 9783319458793
T3 - Communications in Computer and Information Science
SP - 230
EP - 240
BT - Knowledge Engineering and Semantic Web - 7th International Conference, KESW 2016, Proceedings
A2 - Ngomo, Axel-Cyrille Ngonga
A2 - Křemen, Petr
PB - Springer Nature
T2 - 7th International Conference on Knowledge Engineering and Semantic Web, KESW 2016
Y2 - 21 September 2016 through 23 September 2016
ER -