Working with real data often involves clustering datasets with mixed-type variables, which is complicated by the lack of guidelines on which clustering methods to use for such data. Rather than choosing the optimal method for mixed-data clustering, this study aims to determine under which conditions two specialized cluster analysis methods, hierarchical clustering using Gower's distance and the k-prototypes algorithm, produce similar partitions. The resulting clusterings were then compared with clusters obtained after applying one-hot encoding to transform categorical data into numerical values. Special metrics were used to determine the degree of similarity between the methods. The data used in this study are real data obtained from real estate sales advertisements in Saint Petersburg.
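As a rough illustration of two ingredients mentioned in the abstract, the sketch below computes Gower's distance between mixed-type records and compares two partitions of the same objects. It is not the authors' implementation: the abstract does not name the similarity metrics used, so the adjusted Rand index here is an assumption, and the listing records (area, rooms, district) are hypothetical values made up for the example, not data from the study.

```python
from collections import Counter
from math import comb


def gower_distance(a, b, ranges):
    """Gower's distance between two mixed-type records.

    ranges[k] is the observed range (max - min) of numeric feature k,
    or None when feature k is categorical.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical feature: simple mismatch (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
        elif r > 0:
            # Numeric feature: absolute difference scaled by the range.
            total += abs(x - y) / r
        # A numeric feature with zero range contributes nothing.
    return total / len(ranges)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects
    (assumed here as one possible similarity metric)."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical listings: (area in square metres, number of rooms, district).
records = [
    (35.0, 1, "Central"),
    (40.0, 1, "Central"),
    (80.0, 3, "Primorsky"),
    (85.0, 3, "Primorsky"),
]
# Ranges of the numeric columns; None marks the categorical column.
ranges = [85.0 - 35.0, 3 - 1, None]

d_near = gower_distance(records[0], records[1], ranges)
d_far = gower_distance(records[0], records[2], ranges)

# Two labelings that describe the same grouping under different label names.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

A full pairwise matrix of such distances could then be passed to a hierarchical clustering routine, and the resulting labels compared with those from k-prototypes or from clustering one-hot encoded data using the same index.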
Original language: English
Title of host publication: Proceedings - 2025 International Russian Smart Industry Conference, SmartIndustryCon 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 411-417
Number of pages: 7
ISBN (Print): 9798331511241
State: Published - 24 Mar 2025
Event: International Scientific and Practical Conference "Industry 4.0", Russian Federation
Duration: 23 Mar 2025 - 29 Mar 2025
https://smartindustrycon.ru/smartindustrycon2025-rus.html


Research areas: Gower's distance, distance-based cluster analysis, prototypes

ID: 137006687