Smart Caching for Efficient Functional Dependency Discovery › Научные исследования в СПбГУ

DOI

https://doi.org/10.1007/978-3-030-30278-8_7
Конечная издательская версия

Anastasia Birillo
Nikita Bobrov

Functional dependency (FD) is a concept for describing regularities in data, that traditionally was used for schema normalization. Recently, a different problem has emerged: For a given dataset, find FDs that are contained in it. Efficient FD discovery is of great importance due to FD usage in many tasks such as data cleaning, schema recovery, and query optimization. In this paper we consider a particular class of FD discovery algorithms that rely on partition intersection. We present a simple approach for caching intermediate results that are generated during run times of these algorithms. Our approach is essentially a heuristic that selects which partitions to cache based on measures of disorder in data. For this purpose, we adopt such well-known measures as Entropy and Gini Impurity, and propose a novel one—Inverted Entropy. Our approach has a negligible computational overhead, and can be used with a number of FD discovery algorithms, both exact and approximate. Our experiments demonstrate that our heuristic allows to decrease algorithm run times by up to 40% while simultaneously requiring up to an order of magnitude less space compared to the state-of-the-art caching approaches.

Язык оригинала	английский
Название основной публикации	New Trends in Databases and Information Systems
Издатель	Springer Nature
Страницы	52-59
Число страниц	8
ISBN (электронное издание)	978-3-030-30278-8
ISBN (печатное издание)	978-3-030-30277-1
DOI	https://doi.org/10.1007/978-3-030-30278-8_7
Состояние	Опубликовано - 1 янв 2019
Событие	23rd European Conference on Advances in Databases and Information Systems, ADBIS 2019 - Bled, Словения Продолжительность: 8 сен 2019 → 11 сен 2019 Номер конференции: 231719

Серия публикаций

Название	Communications in Computer and Information Science
Издатель	Springer Nature
Том	1064
ISSN (печатное издание)	1865-0929

конференция

конференция	23rd European Conference on Advances in Databases and Information Systems, ADBIS 2019
Страна/Tерритория	Словения
Город	Bled
Период	8/09/19 → 11/09/19

ID: 113380819