Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Scientific › Peer-reviewed
Hybrid Materialization in a Disk-Based Column-Store. / Klyuchikov, Evgeniy; Chizhov, Anton; Polyntsov, Michael; Chernishev, George; Mikhailova, Elena.
CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD). Association for Computing Machinery, 2024. pp. 164-172 (ACM International Conference Proceeding Series).
TY - GEN
T1 - Hybrid Materialization in a Disk-Based Column-Store
AU - Klyuchikov, Evgeniy
AU - Chizhov, Anton
AU - Polyntsov, Michael
AU - Chernishev, George
AU - Mikhailova, Elena
N1 - DBLP License: DBLP's bibliographic metadata records provided through http://dblp.org/ are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.
PY - 2024/1/4
Y1 - 2024/1/4
N2 - In column-oriented query processing, a materialization strategy determines when lightweight positions (row IDs) are translated into tuples. It is an important part of column-store architecture, since it defines the class of supported query plans and, therefore, impacts overall system performance. In this paper, we continue investigating materialization strategies for a distributed disk-based column-store. We start by demonstrating cases in which existing approaches fundamentally limit system performance. To address them, we propose a new model of hybrid materialization. The main feature of hybrid materialization is the ability to manipulate both positions and values at the same time. This way, the query engine can efficiently combine the advantages of all existing strategies and support a new class of query plans. Moreover, hybrid materialization enables the query engine to flexibly customize the materialization policy of individual attributes. We describe our vision of how hybrid materialization can be implemented in a columnar system, using as an example PosDB, a distributed, disk-based column-store. We present the necessary data structures and the internals of a hybrid operator, and describe the algebra of such operators. Based on this implementation, we evaluate the performance of late, ultra-late, and hybrid materialization strategies in several scenarios based on TPC-H queries. Our experiments demonstrate that hybrid materialization is almost twice as fast as its counterparts, while providing a more flexible query model.
KW - Analytic workloads
KW - Column-stores
KW - Databases
KW - Hybrid materialization
KW - Late Materialization
KW - Query engine
KW - Query processing
UR - https://www.mendeley.com/catalogue/a671c5fd-8c95-3194-a1be-ef2311c3bdb7/
U2 - 10.1145/3632410.3632422
DO - 10.1145/3632410.3632422
M3 - Conference contribution
SN - 9798400716348
T3 - ACM International Conference Proceeding Series
SP - 164
EP - 172
BT - CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
PB - Association for Computing Machinery
T2 - CODS-COMAD 2024: 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
Y2 - 4 January 2024 through 7 January 2024
ER -
ID: 116480178
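The abstract contrasts intermediate results made of positions (row IDs) with results made of values, and describes hybrid materialization as carrying both at the same time. The following is a minimal Python sketch of that idea under stated assumptions. It is an illustration, not PosDB's data structures or operator algebra. `HybridBlock`, `filter_block`, and the toy `lineitem` data are hypothetical. Columns are modeled as in-memory lists indexed by row ID rather than disk-resident storage.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical toy column store: attribute name -> column values indexed by row ID.
Table = dict[str, list]


@dataclass
class HybridBlock:
    """Hypothetical intermediate result carrying row positions together with
    those attributes that have already been materialized as values."""
    positions: list[int]
    values: dict[str, list] = field(default_factory=dict)

    def get(self, table: Table, attr: str) -> list:
        # Prefer values already carried in the block; otherwise fetch the
        # column entries by position (late materialization of this attribute).
        if attr in self.values:
            return self.values[attr]
        column = table[attr]
        return [column[p] for p in self.positions]

    def materialize(self, table: Table, attr: str) -> "HybridBlock":
        # Return a new block that also carries attr as values; self is unchanged.
        new_values = dict(self.values)
        new_values[attr] = self.get(table, attr)
        return HybridBlock(list(self.positions), new_values)


def filter_block(block: HybridBlock, table: Table, attr: str,
                 pred: Callable[[object], bool]) -> HybridBlock:
    # Keep the rows whose attr satisfies pred, filtering positions and every
    # carried value column together so they stay aligned.
    column = block.get(table, attr)
    keep = [i for i, v in enumerate(column) if pred(v)]
    return HybridBlock(
        [block.positions[i] for i in keep],
        {a: [vals[i] for i in keep] for a, vals in block.values.items()},
    )


# Usage with made-up data (column names borrowed from TPC-H lineitem).
lineitem = {
    "l_quantity": [5, 30, 12, 45],
    "l_comment": ["a", "b", "c", "d"],
}
block = HybridBlock(positions=list(range(4)))
block = block.materialize(lineitem, "l_quantity")  # l_quantity carried as values
block = filter_block(block, lineitem, "l_quantity", lambda q: q > 10)
print(block.positions)                   # [1, 2, 3]
print(block.values["l_quantity"])        # [30, 12, 45]
print(block.get(lineitem, "l_comment"))  # ['b', 'c', 'd'], read by position
```

In this sketch, `l_quantity` is needed by the predicate and is carried as values. `l_comment` stays as positions, so it is read only for the rows that survive the filter. Making that choice per attribute loosely mirrors the per-attribute materialization policy mentioned in the abstract. The sketch ignores disk I/O, bounded block sizes, joins, and distribution.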