The dramatic rise in the amount of data worldwide over the past years is a critical issue for storage and backup systems, with no obvious and simple solution available. One of the main techniques for storing large amounts of data efficiently is deduplication, which eliminates redundant data, of which there is a lot. The most expensive stage of the deduplication process is chunking, which takes up to 90% of the time. Many algorithms have emerged in recent years with the goal of optimizing the space savings and throughput of the process, each claiming better effectiveness than the last. Despite there being so many chunking algorithms, very few systems can be used to compare them with each other. In this paper we present ChunkFS, a tool for comparing deduplication techniques that makes it easy to integrate different chunking and hashing algorithms, as well as storage types, and to gather the metrics needed to determine the most suitable one. We use it to compare some of the best-performing algorithms and find that SuperCDC outperforms every other one in speed, but not in space savings; because it has many tunable parameters, further research using the tool may achieve better results. Besides Content Defined Chunking algorithms, other techniques, such as Frequency Based Chunking, can be compared using ChunkFS, but that is out of the scope of this paper.
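To make the terms in the abstract concrete, the following is a minimal sketch of content-defined chunking followed by hash-based deduplication. It is not ChunkFS's API and not SuperCDC: the function names `cdc_chunks` and `dedup_ratio`, the toy gear-style rolling hash, and all size parameters are assumptions chosen purely for illustration.

```python
import hashlib
import random

# Toy gear table: one pseudo-random 32-bit value per byte value.
# The fixed seed is an illustrative assumption so runs are reproducible.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, min_size=64, avg_bits=8, max_size=1024):
    """Split data at content-defined boundaries using a gear-style rolling hash."""
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Rolling hash: older bytes are shifted out of the low bits over time.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length < min_size:
            continue  # never cut before the minimum chunk size
        # Cut when the low bits of the hash are zero, or at the size limit.
        if (h & mask) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_ratio(data: bytes, chunker=cdc_chunks) -> float:
    """Return logical size divided by the total size of unique chunks."""
    unique = {}
    for chunk in chunker(data):
        # Chunks with the same SHA-256 digest are stored only once.
        unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    stored = sum(unique.values())
    return len(data) / stored if stored else 1.0
```

Passing a different `chunker` callable to `dedup_ratio` is the kind of comparison the abstract describes: the same input is chunked by several algorithms, and space savings (and, with timing added, throughput) are measured for each.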
Original language: English
Title of host publication: 2025 37th Conference of Open Innovations Association (FRUCT)
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 221-227
Number of pages: 7
ISBN (Print): 9789526524634
State: Published - 14 May 2025
Event: The 37th FRUCT conference: FRUCT37 - UiT The Arctic University of Norway, Kufstein, Austria
Duration: 14 May 2025 to 16 May 2025
https://www.fruct.org/conferences/37/registration/

Publication series

Name: Conference of Open Innovation Association, FRUCT

Conference

Conference: The 37th FRUCT conference
Abbreviated title: FRUCT37
Country/Territory: Austria
City: Kufstein
Period: 14/05/25 to 16/05/25
ID: 137266822