Standard

A Comprehensive Study of Late Materialization Strategies for a Disk-Based Column-Store. / Chernishev, George; Galaktionov, Viacheslav; Grigorev, Valentin; Klyuchikov, Evgeniy; Smirnov, Kirill.

In: CEUR Workshop Proceedings, Vol. 3130, 2022, pp. 21-30.

Research output: Scientific publications in periodicals; conference article in a journal; peer-reviewed

BibTeX

@article{16adadb9ae31473a9c6cf16ba43ac33b,
title = "A Comprehensive Study of Late Materialization Strategies for a Disk-Based Column-Store",
abstract = "By allowing operations on positions (row IDs, offsets), column-stores increase the overall number of admissible query plans. Their plans can be classified into a number of so-called materialization strategies, which describe the moment when positions are switched to tuples. Despite being a well-studied topic with several different implementations, there is still no formal definition for it, as well as no classification of existing approaches. In this paper we review and classify these approaches. Our classification shows that, for disk-based systems, none of the existing implementation variants efficiently combines position manipulation inside both selections and joins. For this reason, we propose such an approach which we name “ultra-late materialization”. Further, we describe recent modifications of PosDB - a distributed, disk-based column-store. These modifications allowed us to implement a flexible query processing model. Relying on it, we have implemented a number of late materialization variants, including our approach. Finally, we empirically evaluate the performance of ultra-late materialization and classic strategies. We also compare it with two industrial-grade disk-based systems: PostgreSQL and MariaDB Column Store. Experiments demonstrate that our variant of late materialization outperforms the closest competitor (MariaDB Column Store) by 50% which makes further investigation worthwhile.",
author = "George Chernishev and Viacheslav Galaktionov and Valentin Grigorev and Evgeniy Klyuchikov and Kirill Smirnov",
note = "Publisher Copyright: {\textcopyright} Copyright 2022 for this paper by its author(s). Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0); 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2022 ; Conference date: 29-03-2022",
year = "2022",
language = "English",
volume = "3130",
pages = "21--30",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "RWTH Aachen University",

}

RIS

TY - JOUR

T1 - A Comprehensive Study of Late Materialization Strategies for a Disk-Based Column-Store

AU - Chernishev, George

AU - Galaktionov, Viacheslav

AU - Grigorev, Valentin

AU - Klyuchikov, Evgeniy

AU - Smirnov, Kirill

N1 - Publisher Copyright: © Copyright 2022 for this paper by its author(s). Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

PY - 2022

Y1 - 2022

N2 - By allowing operations on positions (row IDs, offsets), column-stores increase the overall number of admissible query plans. Their plans can be classified into a number of so-called materialization strategies, which describe the moment when positions are switched to tuples. Despite being a well-studied topic with several different implementations, there is still no formal definition for it, as well as no classification of existing approaches. In this paper we review and classify these approaches. Our classification shows that, for disk-based systems, none of the existing implementation variants efficiently combines position manipulation inside both selections and joins. For this reason, we propose such an approach which we name “ultra-late materialization”. Further, we describe recent modifications of PosDB - a distributed, disk-based column-store. These modifications allowed us to implement a flexible query processing model. Relying on it, we have implemented a number of late materialization variants, including our approach. Finally, we empirically evaluate the performance of ultra-late materialization and classic strategies. We also compare it with two industrial-grade disk-based systems: PostgreSQL and MariaDB Column Store. Experiments demonstrate that our variant of late materialization outperforms the closest competitor (MariaDB Column Store) by 50% which makes further investigation worthwhile.

AB - By allowing operations on positions (row IDs, offsets), column-stores increase the overall number of admissible query plans. Their plans can be classified into a number of so-called materialization strategies, which describe the moment when positions are switched to tuples. Despite being a well-studied topic with several different implementations, there is still no formal definition for it, as well as no classification of existing approaches. In this paper we review and classify these approaches. Our classification shows that, for disk-based systems, none of the existing implementation variants efficiently combines position manipulation inside both selections and joins. For this reason, we propose such an approach which we name “ultra-late materialization”. Further, we describe recent modifications of PosDB - a distributed, disk-based column-store. These modifications allowed us to implement a flexible query processing model. Relying on it, we have implemented a number of late materialization variants, including our approach. Finally, we empirically evaluate the performance of ultra-late materialization and classic strategies. We also compare it with two industrial-grade disk-based systems: PostgreSQL and MariaDB Column Store. Experiments demonstrate that our variant of late materialization outperforms the closest competitor (MariaDB Column Store) by 50% which makes further investigation worthwhile.

UR - http://www.scopus.com/inward/record.url?scp=85129163477&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85129163477

VL - 3130

SP - 21

EP - 30

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

T2 - 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2022

Y2 - 29 March 2022

ER -

ID: 98338361