We analyze the problem of processing very large datasets on parallel systems and find that the natural approaches to parallelization fail for two reasons: one is connected to long-range correlations in the data, and the other stems from the non-scalar nature of the data. To overcome these difficulties, a new data-processing paradigm is proposed, based on statistical simulation of the datasets; for different types of data it is realized through three approaches - decomposition of the statistical ensemble, decomposition based on the principle of mixing, and decomposition over the indexing variable (a sketch of the first approach is given below). Examples of the proposed approach demonstrate its highly effective scaling.
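The record does not include code; as an illustration of the first approach, decomposition of the statistical ensemble, the sketch below assumes a Monte Carlo-style workload in which independent realizations are generated in parallel sub-ensembles and their partial statistics are merged afterwards. All names (simulate_chunk, N_REALIZATIONS, N_WORKERS) are hypothetical and not taken from the paper.

```python
# Minimal sketch of ensemble decomposition (assumed Monte Carlo workload):
# the statistical ensemble of realizations is split into independent
# sub-ensembles, each processed by its own worker, and the partial
# statistics are merged at the end.
import random
from concurrent.futures import ProcessPoolExecutor

N_REALIZATIONS = 100_000   # size of the full statistical ensemble
N_WORKERS = 4              # number of parallel processes

def simulate_chunk(args):
    """Generate one sub-ensemble and return its partial sums."""
    n, seed = args
    rng = random.Random(seed)
    s = s2 = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)   # stand-in for one simulated data realization
        s += x
        s2 += x * x
    return n, s, s2

if __name__ == "__main__":
    chunk = N_REALIZATIONS // N_WORKERS
    tasks = [(chunk, seed) for seed in range(N_WORKERS)]
    with ProcessPoolExecutor(max_workers=N_WORKERS) as pool:
        parts = list(pool.map(simulate_chunk, tasks))
    # Merge partial statistics from the independent sub-ensembles.
    n = sum(p[0] for p in parts)
    mean = sum(p[1] for p in parts) / n
    var = sum(p[2] for p in parts) / n - mean ** 2
    print(f"ensemble mean={mean:.4f}, variance={var:.4f}")
```

Because the sub-ensembles are statistically independent, no communication is needed between workers, which is what makes this decomposition scale well under the stated assumptions.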

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editors: Marian Bubak, Geert Dick van Albada, Peter M.A. Sloot, Jack J. Dongarra
Publisher: Springer Nature
Pages: 239-246
Number of pages: 8
ISBN (Print): 9783540221142
DOIs
State: Published - 2004

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3036
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

ID: 77309648