On-the-fly filtering of aggregation results in column-stores

Anastasia Tuchina, Valentin Grigorev, George Chernishev

Research output

Abstract

Aggregation is a database operation that provides basic analytic capabilities by partitioning source data into groups and computing a function over the values in each group. It is common in databases, especially in the OLAP domain, which is a primary venue for column-stores. In this paper we propose a novel approach to the design of an aggregation operator inside a column-store system. The core of our approach is an analysis of the predicates in the HAVING clause that allows groups to be pruned at runtime. We employ monotonicity and codomain analysis to detect groups for which the predicates can never be satisfied. Ultimately, we aim to save I/O and CPU costs by discarding such groups as early as possible. We start with a high-level overview of our approach and describe its use cases. Then we give a short introduction to our system and describe a straightforward implementation of an aggregation operator. Next, we provide the theoretical foundations of our approach and present an improved algorithm. Finally, we present an experimental validation of our approach inside PosDB, a distributed, disk-based column-store engine that features late materialization and block-oriented processing. Experiments on an SSD show that our approach can provide up to a fivefold improvement over the naive version.
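To make the pruning idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the algorithm from the paper and not PosDB code. It assumes a query of the form `SELECT key, SUM(value) ... GROUP BY key HAVING SUM(value) < threshold` over non-negative values, so that the running sum of each group can only grow. The function names `filter_naive` and `filter_on_the_fly` and the sample rows are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Row = Tuple[Hashable, float]


def filter_naive(rows: Iterable[Row], threshold: float) -> Dict[Hashable, float]:
    """Aggregate every group completely, then apply the HAVING predicate."""
    sums: Dict[Hashable, float] = {}
    for key, value in rows:
        sums[key] = sums.get(key, 0) + value
    return {key: total for key, total in sums.items() if total < threshold}


def filter_on_the_fly(rows: Iterable[Row], threshold: float) -> Dict[Hashable, float]:
    """Prune a group as soon as HAVING SUM(value) < threshold can no longer hold.

    Assumption (hypothetical, for illustration): all values are non-negative,
    so each running sum is monotonically non-decreasing. Once it reaches the
    threshold, the predicate stays false for the rest of the input.
    """
    running: Dict[Hashable, float] = {}
    pruned: Set[Hashable] = set()
    for key, value in rows:
        if value < 0:
            # The monotonicity argument breaks down for negative values.
            raise ValueError("this sketch assumes non-negative values")
        if key in pruned:
            continue  # group already discarded: no further work for it
        total = running.get(key, 0) + value
        if total >= threshold:
            running.pop(key, None)  # drop the partial aggregate
            pruned.add(key)
        else:
            running[key] = total
    return running


rows = [("a", 1), ("b", 5), ("a", 2), ("b", 7), ("c", 3), ("b", 100)]
print(filter_on_the_fly(rows, 10))
```

In this sketch the only saving is the skipped additions for group `b` after it is pruned. In a column-store with late materialization, discarding a group early could also avoid reading further data for it, which is the kind of I/O and CPU saving the abstract refers to.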

Original language: English
Pages (from-to): 53-60
Number of pages: 8
Journal: CEUR Workshop Proceedings
Volume: 2135
Publication status: Published - 1 Jan 2018
Event: 3rd Conference on Software Engineering and Information Management, SEIM 2018 - Saint Petersburg
Duration: 14 Apr 2018 → …

Fingerprint

Agglomeration
Program processors
Engines
Processing
Costs
Experiments

Scopus subject areas

  • Computer Science(all)

Cite this

Tuchina, Anastasia ; Grigorev, Valentin ; Chernishev, George. / On-the-fly filtering of aggregation results in column-stores. In: CEUR Workshop Proceedings. 2018 ; Vol. 2135. pp. 53-60.
@article{6472d673960c461bbe5f56ea4b29a651,
title = "On-the-fly filtering of aggregation results in column-stores",
abstract = "Aggregation is a database operation that aims to provide basic analytic capabilities by partitioning source data into several groups and computing some function on values belonging to the same group. Nowadays it is common in databases, and especially in the OLAP domain, which is a primary venue for column-stores. In this paper we propose a novel approach to the design of an aggregation operator inside a column-store system. The core of our approach is an analysis of predicates in the HAVING-clause that allows the runtime pruning of groups. We employ monotonicity and codomain analysis in order to detect groups in which predicates would never be satisfied. Eventually, we aim to save I/O and CPU costs by discarding groups as early as possible. We start by providing a high-level overview of our approach and describe its use-cases. Then, we provide a short introduction into our system and describe a straightforward implementation of an aggregation operator. Next, we provide theoretical foundations for our approach and present an improved algorithm. Finally, we present an experimental validation of our approach inside PosDB, a distributed, disk-based column-store engine that features late materialization and block-oriented processing. Experiments using an SSD drive show that our approach can provide up to 5 times improvement over the naive version.",
author = "Anastasia Tuchina and Valentin Grigorev and George Chernishev",
year = "2018",
month = "1",
day = "1",
language = "English",
volume = "2135",
pages = "53--60",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "RWTH Aachen University",
}

On-the-fly filtering of aggregation results in column-stores. / Tuchina, Anastasia; Grigorev, Valentin; Chernishev, George.

In: CEUR Workshop Proceedings, Vol. 2135, 01.01.2018, p. 53-60.

TY - JOUR

T1 - On-the-fly filtering of aggregation results in column-stores

AU - Tuchina, Anastasia

AU - Grigorev, Valentin

AU - Chernishev, George

PY - 2018/1/1

Y1 - 2018/1/1

AB - Aggregation is a database operation that aims to provide basic analytic capabilities by partitioning source data into several groups and computing some function on values belonging to the same group. Nowadays it is common in databases, and especially in the OLAP domain, which is a primary venue for column-stores. In this paper we propose a novel approach to the design of an aggregation operator inside a column-store system. The core of our approach is an analysis of predicates in the HAVING-clause that allows the runtime pruning of groups. We employ monotonicity and codomain analysis in order to detect groups in which predicates would never be satisfied. Eventually, we aim to save I/O and CPU costs by discarding groups as early as possible. We start by providing a high-level overview of our approach and describe its use-cases. Then, we provide a short introduction into our system and describe a straightforward implementation of an aggregation operator. Next, we provide theoretical foundations for our approach and present an improved algorithm. Finally, we present an experimental validation of our approach inside PosDB, a distributed, disk-based column-store engine that features late materialization and block-oriented processing. Experiments using an SSD drive show that our approach can provide up to 5 times improvement over the naive version.

UR - http://www.scopus.com/inward/record.url?scp=85050466191&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85050466191

VL - 2135

SP - 53

EP - 60

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -