
Zipf’s law has been shown to hold in many domains. From its origin as a statistical phenomenon of natural language to its later adaptations in economic, social and many other fields, it has been shown to work almost universally. In all of these cases, authors discuss the applicability of Zipf’s law to semantically complex structures. We take this notion a step further and show how the law can be applied to data analysis, in particular to sequences of byte data obtained from various sources. We show that, using a basic chunking methodology, Zipf’s law holds for many different types of raw byte sequences. In particular, the law holds in all cases for the “middle point” of the data, where it is present with a degree of certainty of more than 90%. We conclude by discussing the implications and potential use cases of these findings.
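The abstract does not spell out the chunking methodology or the fitting procedure, so the following is only a minimal sketch under stated assumptions: byte data is split into non-overlapping fixed-size chunks (a hypothetical scheme; a trailing partial chunk is dropped), chunk frequencies are ranked, and the Zipf exponent s in f(r) ∝ r^(-s) is estimated by ordinary least squares on log frequency versus log rank, with R² used as a simple goodness-of-fit score. The function names `chunk_bytes`, `rank_frequency` and `zipf_fit` are illustrative, and R² is not claimed to be the paper's "degree of certainty".

```python
import math
from collections import Counter


def chunk_bytes(data: bytes, size: int) -> list[bytes]:
    """Split data into non-overlapping chunks of `size` bytes.

    Hypothetical chunking scheme: a trailing partial chunk is dropped.
    """
    return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]


def rank_frequency(chunks: list[bytes]) -> list[int]:
    """Return chunk counts sorted in descending order (index 0 is rank 1)."""
    return sorted(Counter(chunks).values(), reverse=True)


def zipf_fit(freqs: list[int]) -> tuple[float, float]:
    """Least-squares fit of log f = log C - s * log r.

    Returns the estimated exponent s and the R^2 of the log-log fit.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two distinct chunks to fit a slope")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return -slope, r_squared


# Synthetic example: the k-th distinct 2-byte chunk occurs about 1200/k times,
# so the estimated exponent should be close to 1.
data = b"".join(bytes([k, k]) * (1200 // k) for k in range(1, 11))
s, r2 = zipf_fit(rank_frequency(chunk_bytes(data, 2)))
print(f"s = {s:.3f}, R^2 = {r2:.4f}")
```

On real byte data one would vary the chunk size and restrict the fit to a range of ranks; which ranges correspond to the paper's “middle point” is not specified in the abstract.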
Original language: English
Pages (from–to): 391–403
Number of pages: 13
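Journal: Вестник Санкт-Петербургского университета. Прикладная математика. Информатика. Процессы управления (Vestnik of Saint Petersburg University. Applied Mathematics. Computer Science. Control Processes)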
Volume: 20
Issue number: 3
State: Published, 2024

    Research areas

  • Zipf’s law, byte data, chunking, frequency analysis
