Research output: Contribution to journal › Conference article › peer-review
A common technique to speed up DBMS query processing is to cache parts of query results and reuse them later. In this paper we propose a novel approach aimed specifically at caching intermediates in a late-materialization-oriented column-store. The idea of our approach is to cache positions (row numbers) instead of data values. The small size of the positional representation is a valuable advantage: the cache can accommodate more entries and can consider intermediates that involve “heavy” operators, e.g., joins of large tables. Position caching thrives in late-materialization environments because position exchange is prevalent in them. In particular, expensive predicates and heavy joins are usually processed based on positions. Our approach is able to cache them efficiently, thus significantly reducing system load. To assess the importance of intermediates, our position caching technique features a cost model based on usage statistics and complexity estimations. Furthermore, to allow intermediate reuse for queries that are not fully identical, we propose an efficient query containment checking algorithm. We also propose several policies for cache population and eviction. Finally, our approach is enhanced by lightweight compression schemes. Experimental evaluation was performed using a stream of randomly generated Star-Schema-Benchmark-like queries. It showed up to a threefold improvement in query run times. Additionally, compressing the intermediates reduces the space requirements by up to a factor of two without a noticeable performance overhead.
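The core idea of the abstract, caching row positions rather than values and reusing them for a contained query, can be illustrated with a small toy sketch. This is not the paper's implementation: the `PositionCache` class, the `(column, low, high)` range key, the simple range-containment test, and the helpers `select_positions` and `materialize` are all hypothetical names and simplifications introduced here, and the cost model, eviction policies, and compression mentioned above are left out.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical cache key: (column name, inclusive low bound, inclusive high bound).
RangeKey = Tuple[str, int, int]
Table = Dict[str, List[int]]


class PositionCache:
    """Toy cache mapping range predicates to the row positions they select."""

    def __init__(self) -> None:
        self.entries: Dict[RangeKey, List[int]] = {}

    def put(self, key: RangeKey, positions: List[int]) -> None:
        self.entries[key] = positions

    def find_containing(self, key: RangeKey) -> Optional[List[int]]:
        """Return the smallest cached position list whose range contains `key`."""
        column, low, high = key
        best: Optional[List[int]] = None
        for (c, lo, hi), positions in self.entries.items():
            # Assumed containment rule: same column and a wider-or-equal range.
            if c == column and lo <= low and high <= hi:
                if best is None or len(positions) < len(best):
                    best = positions
        return best


def select_positions(table: Table, cache: PositionCache, key: RangeKey) -> List[int]:
    """Evaluate a range predicate, scanning only cached positions when possible."""
    column, low, high = key
    values = table[column]
    cached = cache.find_containing(key)
    # Cache miss: fall back to scanning every row of the column.
    candidates: Iterable[int] = cached if cached is not None else range(len(values))
    positions = [p for p in candidates if low <= values[p] <= high]
    cache.put(key, positions)
    return positions


def materialize(table: Table, column: str, positions: List[int]) -> List[int]:
    """Late materialization: fetch values only for the surviving positions."""
    return [table[column][p] for p in positions]


table = {
    "year": [1993, 1997, 1995, 1998, 1994, 1996],
    "revenue": [10, 20, 30, 40, 50, 60],
}
cache = PositionCache()
wide = select_positions(table, cache, ("year", 1994, 1997))    # full scan
narrow = select_positions(table, cache, ("year", 1995, 1996))  # reuses `wide`
revenue = materialize(table, "revenue", narrow)
```

In this sketch the second query never touches rows outside the cached position list of the first, and values of `revenue` are fetched only at the very end, which mirrors the late-materialization setting the abstract describes.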
| Original language | English |
|---|---|
| Pages (from-to) | 89-93 |
| Number of pages | 5 |
| Journal | CEUR Workshop Proceedings |
| Volume | 2572 |
| State | Published - 2020 |
| Event | 22nd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2020 - Copenhagen, Denmark; Duration: 30 Mar 2020 → … |
ID: 98682156