In this paper, we continue our efforts to evaluate matrix clustering algorithms. In our previous study, we presented a test environment and the results of preliminary experiments with the “separate” strategy for vertical partitioning. This strategy assigns a separate vertical partition to every cluster found by the algorithm, including the inter-submatrix attribute group. In this paper, we introduce two other strategies: the “replicate” strategy, which replicates inter-submatrix attributes into every cluster, and the “retain” strategy, which assigns inter-submatrix attributes to their original clusters. We experimentally evaluate all three strategies in a disk-based environment using the standard TPC-H workload and the PostgreSQL DBMS. We start with a study of record reconstruction methods in PostgreSQL. Then, we apply the partitioning strategies to three matrix clustering algorithms and evaluate both the query performance and the storage overhead of the resulting partitions. Finally, we compare the resulting partitioning schemes with the ideal partitioning scenario.
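The three strategies differ only in where the inter-submatrix attributes end up. The sketch below is a minimal illustration, not the authors' implementation. It makes several assumptions: the clusters are given as attribute sets that are disjoint from the inter-submatrix group; a hypothetical `origin` mapping records the cluster each inter-submatrix attribute originally belonged to; and the function names `separate`, `replicate`, `retain` and `stored_columns` are made up for this example. The grouping of TPC-H `lineitem` columns is invented. Real partitions in PostgreSQL would also carry a key column so that records can be reconstructed, and that column is omitted here.

```python
from typing import Dict, List, Set

Partition = Set[str]


def separate(clusters: List[Partition], inter: Set[str]) -> List[Partition]:
    # One partition per cluster, plus one extra partition for the
    # inter-submatrix attribute group (if it is non-empty).
    parts = [set(c) for c in clusters]
    if inter:
        parts.append(set(inter))
    return parts


def replicate(clusters: List[Partition], inter: Set[str]) -> List[Partition]:
    # Every partition receives its own copy of all inter-submatrix attributes.
    return [set(c) | inter for c in clusters]


def retain(clusters: List[Partition], inter: Set[str],
           origin: Dict[str, int]) -> List[Partition]:
    # Each inter-submatrix attribute goes back to the cluster it originally
    # belonged to (origin is a hypothetical attribute -> cluster-index map).
    parts = [set(c) for c in clusters]
    for attr in inter:
        parts[origin[attr]].add(attr)
    return parts


def stored_columns(parts: List[Partition]) -> int:
    # Total stored columns; exceeds the attribute count only when replicating.
    return sum(len(p) for p in parts)


# Hypothetical example built from TPC-H lineitem columns.
clusters = [{"l_quantity", "l_extendedprice"}, {"l_shipdate", "l_shipmode"}]
inter = {"l_discount"}
origin = {"l_discount": 0}

if __name__ == "__main__":
    print(len(separate(clusters, inter)), stored_columns(separate(clusters, inter)))  # 3 5
    print(len(replicate(clusters, inter)), stored_columns(replicate(clusters, inter)))  # 2 6
    print(len(retain(clusters, inter, origin)), stored_columns(retain(clusters, inter, origin)))  # 2 5
```

The column counts show the trade-off the abstract measures. "Replicate" pays a storage overhead (6 stored columns for 5 distinct attributes) so that each partition is self-sufficient. "Separate" keeps storage minimal but adds a partition that queries must join.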

Original language: English
Title of host publication: Data Analytics and Management in Data Intensive Domains - XVIII International Conference, DAMDID/RCDL 2016, Revised Selected Papers
Editors: Yannis Manolopoulos, Leonid Kalinichenko, Sergei O. Kuznetsov
Publisher: Springer Nature
Pages: 163-177
Number of pages: 15
ISBN (Print): 9783319571348
State: Published - 2017
Event: 18th International Conference on Data Analytics and Management in Data-Intensive Domains, DAMDID 2016 - Ershovo, Russian Federation
Duration: 11 Oct 2016 - 14 Oct 2016

Publication series

Name: Communications in Computer and Information Science
Volume: 706
ISSN (Print): 1865-0929

Conference

Conference: 18th International Conference on Data Analytics and Management in Data-Intensive Domains, DAMDID 2016
Country/Territory: Russian Federation
City: Ershovo
Period: 11/10/16 - 14/10/16

Research areas

  • Database tuning, Experimentation, Fragmentation, Matrix clustering, PostgreSQL, TPC-H, Vertical partitioning

Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)
