Research output: Contribution to journal › Conference article › peer-review
A Comprehensive Study of Late Materialization Strategies for a Disk-Based Column-Store. / Chernishev, George; Galaktionov, Viacheslav; Grigorev, Valentin; Klyuchikov, Evgeniy; Smirnov, Kirill.
In: CEUR Workshop Proceedings, Vol. 3130, 2022, p. 21-30.
TY - JOUR
T1 - A Comprehensive Study of Late Materialization Strategies for a Disk-Based Column-Store
AU - Chernishev, George
AU - Galaktionov, Viacheslav
AU - Grigorev, Valentin
AU - Klyuchikov, Evgeniy
AU - Smirnov, Kirill
N1 - Publisher Copyright: © Copyright 2022 for this paper by its author(s). Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
PY - 2022
Y1 - 2022
AB - By allowing operations on positions (row IDs, offsets), column-stores increase the overall number of admissible query plans. Their plans can be classified into a number of so-called materialization strategies, which describe the moment when positions are switched to tuples. Despite being a well-studied topic with several different implementations, there is still no formal definition for it, as well as no classification of existing approaches. In this paper we review and classify these approaches. Our classification shows that, for disk-based systems, none of the existing implementation variants efficiently combines position manipulation inside both selections and joins. For this reason, we propose such an approach which we name “ultra-late materialization”. Further, we describe recent modifications of PosDB - a distributed, disk-based column-store. These modifications allowed us to implement a flexible query processing model. Relying on it, we have implemented a number of late materialization variants, including our approach. Finally, we empirically evaluate the performance of ultra-late materialization and classic strategies. We also compare it with two industrial-grade disk-based systems: PostgreSQL and MariaDB Column Store. Experiments demonstrate that our variant of late materialization outperforms the closest competitor (MariaDB Column Store) by 50% which makes further investigation worthwhile.
UR - http://www.scopus.com/inward/record.url?scp=85129163477&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85129163477
VL - 3130
SP - 21
EP - 30
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
SN - 1613-0073
T2 - 24th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2022
Y2 - 29 March 2022
ER -
ID: 98338361