Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
A study of several matrix-clustering vertical partitioning algorithms in a disk-based environment. / Galaktionov, Viacheslav; Chernishev, George; Smirnov, Kirill; Novikov, Boris; Grigoriev, Dmitry A.
Data Analytics and Management in Data Intensive Domains - XVIII International Conference, DAMDID/RCDL 2016, Revised Selected Papers. ed. / Yannis Manolopoulos; Leonid Kalinichenko; Sergei O. Kuznetsov. Springer Nature, 2017. p. 163-177 (Communications in Computer and Information Science; Vol. 706).
TY - GEN
T1 - A study of several matrix-clustering vertical partitioning algorithms in a disk-based environment
AU - Galaktionov, Viacheslav
AU - Chernishev, George
AU - Smirnov, Kirill
AU - Novikov, Boris
AU - Grigoriev, Dmitry A.
N1 - Publisher Copyright: © Springer International Publishing AG 2017. Copyright: Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2017
Y1 - 2017
N2 - In this paper, we continue our efforts to evaluate matrix clustering algorithms. In our previous study, we presented a test environment and the results of preliminary experiments with the “separate” strategy for vertical partitioning. This strategy assigns a separate vertical partition to every cluster found by the algorithm, including the inter-submatrix attribute group. In this paper, we introduce two other strategies: the “replicate” strategy, which replicates inter-submatrix attributes into every cluster, and the “retain” strategy, which assigns inter-submatrix attributes to their original clusters. We experimentally evaluate all strategies in a disk-based environment using the standard TPC-H workload and the PostgreSQL DBMS. We begin by studying record reconstruction methods in the PostgreSQL DBMS. Then, we apply these partitioning strategies to three matrix clustering algorithms and evaluate both the query performance and the storage overhead of the resulting partitions. Finally, we compare the resulting partitioning schemes with the ideal partitioning scenario.
AB - In this paper, we continue our efforts to evaluate matrix clustering algorithms. In our previous study, we presented a test environment and the results of preliminary experiments with the “separate” strategy for vertical partitioning. This strategy assigns a separate vertical partition to every cluster found by the algorithm, including the inter-submatrix attribute group. In this paper, we introduce two other strategies: the “replicate” strategy, which replicates inter-submatrix attributes into every cluster, and the “retain” strategy, which assigns inter-submatrix attributes to their original clusters. We experimentally evaluate all strategies in a disk-based environment using the standard TPC-H workload and the PostgreSQL DBMS. We begin by studying record reconstruction methods in the PostgreSQL DBMS. Then, we apply these partitioning strategies to three matrix clustering algorithms and evaluate both the query performance and the storage overhead of the resulting partitions. Finally, we compare the resulting partitioning schemes with the ideal partitioning scenario.
KW - Database tuning
KW - Experimentation
KW - Fragmentation
KW - Matrix clustering
KW - PostgreSQL
KW - TPC-H
KW - Vertical partitioning
UR - http://www.scopus.com/inward/record.url?scp=85018671430&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-57135-5_12
DO - 10.1007/978-3-319-57135-5_12
M3 - Conference contribution
AN - SCOPUS:85018671430
SN - 9783319571348
T3 - Communications in Computer and Information Science
SP - 163
EP - 177
BT - Data Analytics and Management in Data Intensive Domains - XVIII International Conference, DAMDID/RCDL 2016, Revised Selected Papers
A2 - Manolopoulos, Yannis
A2 - Kalinichenko, Leonid
A2 - Kuznetsov, Sergei O.
PB - Springer Nature
T2 - 18th International Conference on Data Analytics and Management in Data-Intensive Domains, DAMDID 2016
Y2 - 11 October 2016 through 14 October 2016
ER -
ID: 72709067