Novel Approaches for Distributing Workload on Commodity Computer Systems

DOI: https://doi.org/10.1007/978-3-319-21410-8_20 (final published version)

Efficient management of a distributed system is a common problem for universities' and commercial computer centres, and handling node failures is a major aspect of it. Failures that are rare in a small commodity cluster become common at large scale, so there should be a way to overcome them without restarting all parallel processes of an application. The efficiency of existing methods can be improved by forming a hierarchy of distributed processes. That way, only the lower levels of the hierarchy need to be restarted when a leaf node fails, and only the root node needs special treatment. The process hierarchy changes in real time, and the workload is dynamically rebalanced across online nodes. This approach makes it possible to implement efficient partial restart of a parallel application and transactional behaviour for computer centre service tasks.
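To make the idea of a failure-tolerant process hierarchy more concrete, here is a minimal Python sketch. It assumes a simple tree of nodes where each node holds a list of task names. When a non-root node fails, its parent gathers the tasks of the failed subtree and reassigns them to its least-loaded surviving children. Sibling branches keep running untouched. The names `Node`, `collect_tasks` and `handle_failure`, and the least-loaded reassignment policy, are hypothetical illustrations and not the method described in the paper. Root failure, which the abstract says needs special treatment, is only signalled here, not handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity equality: nodes are distinct processes
class Node:
    """A node in a hypothetical process hierarchy (illustrative only)."""
    name: str
    tasks: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    online: bool = True

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def collect_tasks(node: Node) -> List[str]:
    """Gather the tasks of a node and of its whole subtree."""
    tasks = list(node.tasks)
    for child in node.children:
        tasks.extend(collect_tasks(child))
    return tasks


def handle_failure(failed: Node) -> None:
    """Detach a failed subtree and let its parent rebalance the orphaned tasks.

    Only the failed branch is restarted; sibling subtrees are not touched.
    The root has no parent, so its failure is only signalled here.
    """
    parent = failed.parent
    if parent is None:
        raise RuntimeError("root failure requires special handling")
    failed.online = False
    parent.children.remove(failed)
    orphaned = collect_tasks(failed)
    # Hypothetical policy: give each orphaned task to the least-loaded
    # surviving child, or keep it at the parent if no children remain.
    targets = [c for c in parent.children if c.online] or [parent]
    for task in orphaned:
        least_loaded = min(targets, key=lambda n: len(n.tasks))
        least_loaded.tasks.append(task)


if __name__ == "__main__":
    root = Node("root")
    a = root.add_child(Node("a", tasks=["t1", "t2"]))
    b = root.add_child(Node("b", tasks=["t3"]))
    c = root.add_child(Node("c", tasks=["t4", "t5", "t6"]))
    handle_failure(a)
    print({n.name: n.tasks for n in root.children})
    # prints {'b': ['t3', 't1', 't2'], 'c': ['t4', 't5', 't6']}
```

In the usage example, the failure of leaf `a` moves its two tasks to `b`, the least-loaded sibling, while `c` keeps its original workload. Picking the least-loaded node is just one simple choice. A real scheduler could weigh node capacity or data locality instead.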

Original language	English
Title of host publication	Computational Science and Its Applications - ICCSA 2015
Subtitle of host publication	15th International Conference, Banff, AB, Canada, June 22-25, 2015, Proceedings, Part IV
Publisher	Springer Nature
Pages	259-271
ISBN (Electronic)	978-3-319-21410-8
ISBN (Print)	978-3-319-21409-2
State	Published - 2015
Event	15th International Conference on Computational Science and Its Applications, ICCSA 2015, Banff, Canada
Duration	21 Jun 2015 → 24 Jun 2015

Publication series

Name	Lecture Notes in Computer Science
Publisher	Springer Nature
Volume	9158
ISSN (Print)	0302-9743

Conference

Conference	15th International Conference on Computational Science and Its Applications, ICCSA 2015
Country/Territory	Canada
City	Banff
Period	21/06/15 → 24/06/15

Research areas

Long-lived transactions, Distributed pipeline, Node discovery, Software engineering, Distributed computing, Cluster computing

ID: 71354892