Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Using a Decision Tree for the Clustering Problem. / Гадасина, Людмила Викторовна; Романов, Дмитрий Вячеславович.
Proceedings - 2024 International Russian Smart Industry Conference, SmartIndustryCon 2024. Institute of Electrical and Electronics Engineers Inc., 2024. p. 273-279.
TY - GEN
T1 - Using a Decision Tree for the Clustering Problem
AU - Гадасина, Людмила Викторовна
AU - Романов, Дмитрий Вячеславович
N1 - Conference code: 4
PY - 2024/3/25
Y1 - 2024/3/25
AB - Datasets with mixed data types (quantitative and categorical) are difficult to cluster with classical methods of cluster analysis. The study proposes to solve the problem of clustering multivariate data using a decision tree. The method requires setting the number of clusters and a limit on the maximum proportion of observations that fall into each cluster, and it also requires choosing a target variable. The target variable can be chosen by expert judgment or by experimenting with different candidate target variables. The method was tested on data from advertisements for residential real estate for sale in St. Petersburg. The tree clustering method was first tested on a dataset containing only quantitative data, and then on a dataset with mixed quantitative and categorical data. The results were compared with those of hierarchical clustering using different distance metrics. The proposed method does not require data standardization, runs faster than hierarchical clustering, and yields more clearly interpretable clustering results.
KW - classification
KW - clustering
KW - decision tree
KW - regression
UR - https://www.mendeley.com/catalogue/8d69ac3e-1ba3-3b2a-839a-ebb45e72897c/
U2 - 10.1109/smartindustrycon61328.2024.10515744
DO - 10.1109/smartindustrycon61328.2024.10515744
M3 - Conference contribution
SN - 9798350395044
SP - 273
EP - 279
BT - Proceedings - 2024 International Russian Smart Industry Conference, SmartIndustryCon 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Scientific and Practical Conference "Industry 4.0"
Y2 - 24 March 2024 through 30 March 2024
ER -
ID: 121544368
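The abstract above describes clustering with a decision tree that needs a number of clusters, a limit on the share of observations per cluster, and a target variable. The following is a minimal sketch of one possible reading of that idea, not the authors' implementation. It assumes a regression tree grown best-first on a numeric target (splits chosen by reduction of the sum of squared errors), threshold tests for numeric features, one-vs-rest equality tests for categorical features, and one cluster per leaf. The size limit is handled only on a best-effort basis by splitting oversized leaves first. The function `tree_clusters`, its parameters, and the toy listings (`area`, `district`, `price`) are hypothetical and are not data from the study.

```python
from __future__ import annotations

from typing import Any


def _sse(values: list[float]) -> float:
    """Sum of squared deviations from the mean: the impurity of a regression-tree leaf."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def _best_split(rows, y, idx, features):
    """Find the split of leaf `idx` that most reduces the target SSE.

    Numeric features use tests `x <= t`; categorical (string) features use
    one-vs-rest tests `x == c`. Returns None if no split is possible.
    """
    parent = _sse([y[i] for i in idx])
    best = None
    for f in features:
        distinct = sorted({rows[i][f] for i in idx})
        numeric = isinstance(distinct[0], (int, float))
        # Skipping the largest numeric value keeps both sides of the split non-empty.
        candidates = distinct[:-1] if numeric else (distinct if len(distinct) > 1 else [])
        for v in candidates:
            left = [i for i in idx if (rows[i][f] <= v if numeric else rows[i][f] == v)]
            left_set = set(left)
            right = [i for i in idx if i not in left_set]
            gain = parent - _sse([y[i] for i in left]) - _sse([y[i] for i in right])
            if best is None or gain > best[0]:
                op, neg = ("<=", ">") if numeric else ("==", "!=")
                best = (gain, f"{f} {op} {v!r}", f"{f} {neg} {v!r}", left, right)
    return best


def tree_clusters(rows: list[dict[str, Any]], target: str, n_clusters: int, max_share: float):
    """Grow a regression tree on `target` best-first until it has `n_clusters` leaves.

    Each leaf is one cluster, described by the conjunction of tests on its path.
    Leaves holding more than `max_share` of all rows are split first; this is
    a best-effort version of the size limit, not a guarantee.
    """
    y = [float(r[target]) for r in rows]
    features = [f for f in rows[0] if f != target]
    limit = max_share * len(rows)
    leaves = [(list(range(len(rows))), [])]  # (row indices, path rules)
    while len(leaves) < n_clusters:
        options = []
        for pos, (idx, _) in enumerate(leaves):
            split = _best_split(rows, y, idx, features)
            if split is not None:
                options.append((len(idx) > limit, split[0], pos, split))
        if not options:
            break  # no leaf can be split further
        # Oversized leaves take priority, then the largest SSE reduction.
        _, _, pos, (_, rule_l, rule_r, left, right) = max(options, key=lambda o: (o[0], o[1]))
        _, rules = leaves[pos]
        leaves[pos:pos + 1] = [(left, rules + [rule_l]), (right, rules + [rule_r])]
    labels = [0] * len(rows)
    for k, (idx, _) in enumerate(leaves):
        for i in idx:
            labels[i] = k
    return labels, [rules for _, rules in leaves]


# Hypothetical toy listings: area in m^2, district code, price in millions.
ads = [
    {"area": 30, "district": "A", "price": 3.0},
    {"area": 35, "district": "A", "price": 3.2},
    {"area": 60, "district": "A", "price": 6.0},
    {"area": 65, "district": "B", "price": 9.0},
    {"area": 32, "district": "B", "price": 5.0},
    {"area": 70, "district": "B", "price": 9.5},
]
labels, rules = tree_clusters(ads, target="price", n_clusters=3, max_share=0.5)
for k, path in enumerate(rules):
    print(k, " AND ".join(path))
```

Because numeric splits depend only on the ordering of feature values, rescaling `area` (for example, multiplying it by 10) leaves the clusters unchanged in this sketch, which is consistent with the abstract's remark that the method does not require standardization. Each cluster is also described by the rules on its path, such as `area <= 60`, which is where the interpretability comes from.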