Standard

ChunkFS: A Tool for Data Deduplication Methods Comparison. / Гориховский, Вячеслав Игоревич; Пилецкий, Олег Антонович.

2025 37th Conference of Open Innovations Association (FRUCT). Institute of Electrical and Electronics Engineers Inc., 2025. pp. 221-227 (Conference of Open Innovation Association, FRUCT).

Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Peer-reviewed

Harvard

Гориховский, ВИ & Пилецкий, ОА 2025, ChunkFS: A Tool for Data Deduplication Methods Comparison. in 2025 37th Conference of Open Innovations Association (FRUCT). Conference of Open Innovation Association, FRUCT, Institute of Electrical and Electronics Engineers Inc., pp. 221-227, The 37th FRUCT conference, Kufstein, Austria, 14/05/25. https://doi.org/10.23919/fruct65909.2025.11007956

APA

Гориховский, В. И., & Пилецкий, О. А. (2025). ChunkFS: A Tool for Data Deduplication Methods Comparison. In 2025 37th Conference of Open Innovations Association (FRUCT) (pp. 221-227). (Conference of Open Innovation Association, FRUCT). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/fruct65909.2025.11007956

Vancouver

Гориховский ВИ, Пилецкий ОА. ChunkFS: A Tool for Data Deduplication Methods Comparison. In 2025 37th Conference of Open Innovations Association (FRUCT). Institute of Electrical and Electronics Engineers Inc. 2025. p. 221-227. (Conference of Open Innovation Association, FRUCT). https://doi.org/10.23919/fruct65909.2025.11007956

Author

Гориховский, Вячеслав Игоревич ; Пилецкий, Олег Антонович. / ChunkFS: A Tool for Data Deduplication Methods Comparison. 2025 37th Conference of Open Innovations Association (FRUCT). Institute of Electrical and Electronics Engineers Inc., 2025. pp. 221-227 (Conference of Open Innovation Association, FRUCT).

BibTeX

@inproceedings{fe99ced780f04b9d87af61e4358ce5cd,
title = "ChunkFS: A Tool for Data Deduplication Methods Comparison",
abstract = "The dramatic rise in amount of data worldwide over the past years is a critical issue for storage and backup systems with no obvious and simple solution available. One of the main techniques to effectively store large amounts of data is deduplication, based on eliminating redundant data, of which there is a lot. The most expensive stage of deduplication process is chunking, taking up to 90% of time. Many algorithms have emerged in the recent years, with the goal to optimize space savings and throughput of the process, stating better effectiveness each time. Despite there being so many chunking algorithms, there are very few systems that can be used to compare them with each other. In this paper we present ChunkFS, a tool to compare deduplication techniques that allows to easily integrate different chunking and hashing algorithms, as well as storage types, and gather the necessary metrics to determine the most suitable one. We use it to compare some of the best performing algorithms and find out that SuperCDC outperforms every other one in speed, but not in speed savings, although with many parameters that can be tuned, it hints that with further research using the tool better results can be achieved. Besides Content Defined Chunking algorithms, other techniques can be compared using ChunkFS, such as Frequency Based Chunking, but that is out of scope of this paper.",
author = "Гориховский, {Вячеслав Игоревич} and Пилецкий, {Олег Антонович}",
year = "2025",
month = may,
day = "14",
doi = "10.23919/fruct65909.2025.11007956",
language = "English",
isbn = "9789526524634",
series = "Conference of Open Innovation Association, FRUCT",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "221--227",
booktitle = "2025 37th Conference of Open Innovations Association (FRUCT)",
address = "United States",
note = "Conference date: 14-05-2025 Through 16-05-2025",
url = "https://www.fruct.org/conferences/37/registration/",

}

RIS

TY - GEN

T1 - ChunkFS: A Tool for Data Deduplication Methods Comparison

AU - Гориховский, Вячеслав Игоревич

AU - Пилецкий, Олег Антонович

PY - 2025/5/14

Y1 - 2025/5/14

N2 - The dramatic rise in amount of data worldwide over the past years is a critical issue for storage and backup systems with no obvious and simple solution available. One of the main techniques to effectively store large amounts of data is deduplication, based on eliminating redundant data, of which there is a lot. The most expensive stage of deduplication process is chunking, taking up to 90% of time. Many algorithms have emerged in the recent years, with the goal to optimize space savings and throughput of the process, stating better effectiveness each time. Despite there being so many chunking algorithms, there are very few systems that can be used to compare them with each other. In this paper we present ChunkFS, a tool to compare deduplication techniques that allows to easily integrate different chunking and hashing algorithms, as well as storage types, and gather the necessary metrics to determine the most suitable one. We use it to compare some of the best performing algorithms and find out that SuperCDC outperforms every other one in speed, but not in speed savings, although with many parameters that can be tuned, it hints that with further research using the tool better results can be achieved. Besides Content Defined Chunking algorithms, other techniques can be compared using ChunkFS, such as Frequency Based Chunking, but that is out of scope of this paper.

AB - The dramatic rise in amount of data worldwide over the past years is a critical issue for storage and backup systems with no obvious and simple solution available. One of the main techniques to effectively store large amounts of data is deduplication, based on eliminating redundant data, of which there is a lot. The most expensive stage of deduplication process is chunking, taking up to 90% of time. Many algorithms have emerged in the recent years, with the goal to optimize space savings and throughput of the process, stating better effectiveness each time. Despite there being so many chunking algorithms, there are very few systems that can be used to compare them with each other. In this paper we present ChunkFS, a tool to compare deduplication techniques that allows to easily integrate different chunking and hashing algorithms, as well as storage types, and gather the necessary metrics to determine the most suitable one. We use it to compare some of the best performing algorithms and find out that SuperCDC outperforms every other one in speed, but not in speed savings, although with many parameters that can be tuned, it hints that with further research using the tool better results can be achieved. Besides Content Defined Chunking algorithms, other techniques can be compared using ChunkFS, such as Frequency Based Chunking, but that is out of scope of this paper.

UR - https://www.mendeley.com/catalogue/ff4f0986-148a-3706-9be8-b965d85eedde/

U2 - 10.23919/fruct65909.2025.11007956

DO - 10.23919/fruct65909.2025.11007956

M3 - Conference contribution

SN - 9789526524634

T3 - Conference of Open Innovation Association, FRUCT

SP - 221

EP - 227

BT - 2025 37th Conference of Open Innovations Association (FRUCT)

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 14 May 2025 through 16 May 2025

ER -

ID: 137266822