Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Scientific › Peer-reviewed
Novel Approaches for Distributing Workload on Commodity Computer Systems. / Gankevich, Ivan; Tipikin, Yuri; Degtyarev, Alexander; Korkhov, Vladimir.
Computational Science and Its Applications - ICCSA 2015: 15th International Conference, Banff, AB, Canada, June 22-25, 2015, Proceedings, Part IV. Springer Nature, 2015. pp. 259-271 (Lecture Notes in Computer Science; Vol. 9158).
TY - GEN
T1 - Novel Approaches for Distributing Workload on Commodity Computer Systems
AU - Gankevich, Ivan
AU - Tipikin, Yuri
AU - Degtyarev, Alexander
AU - Korkhov, Vladimir
N1 - Gankevich I., Tipikin Y., Degtyarev A., Korkhov V. (2015) Novel Approaches for Distributing Workload on Commodity Computer Systems. In: Gervasi O. et al. (eds) Computational Science and Its Applications -- ICCSA 2015. ICCSA 2015. Lecture Notes in Computer Science, vol 9158. Springer, Cham. https://doi.org/10.1007/978-3-319-21410-8_20
PY - 2015
Y1 - 2015
N2 - Efficient management of a distributed system is a common problem for university and commercial computer centres, and handling node failures is a major aspect of it. Failures that are rare in a small commodity cluster become common at large scale, and there should be a way to overcome them without restarting all parallel processes of an application. The efficiency of existing methods can be improved by forming a hierarchy of distributed processes. That way, only the lower levels of the hierarchy need to be restarted when a leaf node fails, and only the root node needs special treatment. The process hierarchy changes in real time, and the workload is dynamically rebalanced across online nodes. This approach makes it possible to implement efficient partial restart of a parallel application and transactional behaviour for computer centre service tasks.
AB - Efficient management of a distributed system is a common problem for university and commercial computer centres, and handling node failures is a major aspect of it. Failures that are rare in a small commodity cluster become common at large scale, and there should be a way to overcome them without restarting all parallel processes of an application. The efficiency of existing methods can be improved by forming a hierarchy of distributed processes. That way, only the lower levels of the hierarchy need to be restarted when a leaf node fails, and only the root node needs special treatment. The process hierarchy changes in real time, and the workload is dynamically rebalanced across online nodes. This approach makes it possible to implement efficient partial restart of a parallel application and transactional behaviour for computer centre service tasks.
KW - Long-lived transactions
KW - Distributed pipeline
KW - Node discovery
KW - Software engineering
KW - Distributed computing
KW - Cluster computing
UR - https://www.scopus.com/inward/record.uri?eid=2-s2.0-84948994004&doi=10.1007%2f978-3-319-21410-8_20&partnerID=40&md5=fa1e9638b13cb3970db1aeefbed98e70
U2 - 10.1007/978-3-319-21410-8_20
DO - 10.1007/978-3-319-21410-8_20
M3 - Conference contribution
SN - 978-3-319-21409-2
T3 - Lecture Notes in Computer Science
SP - 259
EP - 271
BT - Computational Science and Its Applications - ICCSA 2015
PB - Springer Nature
T2 - 15th International Conference on Computational Science and Its Applications, ICCSA 2015
Y2 - 21 June 2015 through 24 June 2015
ER -
ID: 71354892
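The abstract describes containing a leaf-node failure within the lower levels of a process hierarchy, so that the root keeps running and the workload is rebalanced across the remaining online nodes. The sketch below is a hypothetical illustration of that idea only, not the authors' implementation. The two-level tree, the `Node` class, the round-robin task assignment, and the helpers `online_leaves`, `distribute` and `handle_leaf_failure` are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A process in a hypothetical hierarchy (names are illustrative only)."""
    name: str
    children: list["Node"] = field(default_factory=list)
    online: bool = True
    tasks: list[int] = field(default_factory=list)


def online_leaves(node: Node) -> list[Node]:
    """Return the online leaf processes under `node`, depth-first."""
    if not node.online:
        return []
    if not node.children:
        return [node]
    leaves: list[Node] = []
    for child in node.children:
        leaves.extend(online_leaves(child))
    return leaves


def distribute(root: Node, tasks: list[int]) -> None:
    """Append tasks round-robin to all online leaves under `root`."""
    leaves = online_leaves(root)
    if not leaves:
        raise RuntimeError(f"no online leaves under {root.name}")
    for i, task in enumerate(tasks):
        leaves[i % len(leaves)].tasks.append(task)


def handle_leaf_failure(parent: Node, failed: Node) -> None:
    """Contain a leaf failure inside `parent`'s subtree (assumed policy).

    The failed leaf is marked offline and its unfinished tasks are
    redistributed to the remaining online leaves under `parent`.
    Nodes above `parent`, including the root, are not restarted.
    """
    failed.online = False
    distribute(parent, failed.tasks)  # failed leaf is now excluded
    failed.tasks = []


# Usage: root -> two group managers -> four worker leaves.
w1, w2, w3, w4 = (Node(f"w{i}") for i in range(1, 5))
g1 = Node("g1", [w1, w2])
g2 = Node("g2", [w3, w4])
root = Node("root", [g1, g2])

distribute(root, list(range(8)))  # w1: [0, 4], w2: [1, 5], w3: [2, 6], w4: [3, 7]
handle_leaf_failure(g1, w1)       # only g1's subtree is affected
print(w2.tasks)  # [1, 5, 0, 4]
print(w3.tasks)  # [2, 6]
```

In this sketch the failure of `w1` is resolved entirely within `g1`'s subtree: `w3`, `w4`, `g2` and the root keep their state, which mirrors the abstract's point that only the lower levels of the hierarchy need to be restarted.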