By allowing operations on positions (row IDs, offsets), column-stores increase the overall number of admissible query plans. Their plans can be classified into a number of so-called materialization strategies, which describe the moment when positions are switched to tuples. Although materialization is a well-studied topic with several different implementations, it still lacks both a formal definition and a classification of existing approaches. In this paper we review and classify these approaches. Our classification shows that, for disk-based systems, none of the existing implementation variants efficiently combines position manipulation inside both selections and joins. For this reason, we propose such an approach, which we name “ultra-late materialization”. Further, we describe recent modifications of PosDB, a distributed, disk-based column-store. These modifications allowed us to implement a flexible query processing model. Relying on it, we have implemented a number of late materialization variants, including our approach. Finally, we empirically evaluate the performance of ultra-late materialization and classic strategies. We also compare it with two industrial-grade disk-based systems: PostgreSQL and MariaDB Column Store. Experiments demonstrate that our variant of late materialization outperforms the closest competitor (MariaDB Column Store) by 50%, which makes further investigation worthwhile.
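
To illustrate the distinction between operating on positions and operating on tuples, the following minimal Python sketch contrasts an early and a late materialization plan for a two-predicate selection. It is a toy model, not PosDB code and not the ultra-late materialization approach proposed in the paper: the in-memory `columns` dictionary, the function names, and the predicates are all assumptions made for illustration.

```python
# Hypothetical toy column store: each column is a plain list, and a row is
# identified by its position (index) in those lists.
columns = {
    "id":    [1, 2, 3, 4, 5],
    "price": [10, 25, 7, 40, 25],
    "qty":   [3, 1, 9, 2, 5],
}


def early_materialization(cols, min_price, min_qty):
    """Build full tuples first, then evaluate the predicates on tuples."""
    names = list(cols)
    tuples = [dict(zip(names, row)) for row in zip(*cols.values())]
    return [t for t in tuples
            if t["price"] >= min_price and t["qty"] >= min_qty]


def late_materialization(cols, min_price, min_qty):
    """Evaluate the predicates on positions; build tuples only at the end."""
    # First selection reads one column and emits qualifying positions.
    positions = [p for p, v in enumerate(cols["price"]) if v >= min_price]
    # Second selection probes another column at those positions only.
    positions = [p for p in positions if cols["qty"][p] >= min_qty]
    # Positions are switched to tuples once, for the surviving rows.
    return [{name: col[p] for name, col in cols.items()} for p in positions]


early = early_materialization(columns, 20, 2)
late = late_materialization(columns, 20, 2)
```

Both plans return the same rows; they differ only in when tuples are constructed. In the late variant, the `id` column is read only for rows that pass both predicates.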

Original language: English
Pages (from-to): 21-30
Number of pages: 10
Journal: CEUR Workshop Proceedings
Volume: 3130
State: Published - 2022
Event: 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2022 - Edinburgh, United Kingdom
Duration: 29 Mar 2022 → …

    Scopus subject areas

  • Computer Science(all)

ID: 98338361