Master-node fault tolerance is a topic that is often sidelined in discussions of big data processing technologies. Although the failure of a master node can take down the whole data-processing pipeline, such a failure is usually considered either improbable or too difficult to handle. The aim of the study reported here is to propose a rather simple technique for dealing with master-node failures. The technique is based on temporarily delegating the master role to one of the slave nodes and transferring the updated state back to the master when one step of the computation is complete. In this way the state is duplicated, and the computation can proceed to the next step despite a failure of either the delegate or the master (but not both). We run benchmarks to show that a failure of the master is almost “invisible” to the other nodes, and that a failure of the delegate results in the recomputation of only one step of the data-processing pipeline. We believe the technique can be used not only in big data processing but also in other types of applications.
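The following minimal sketch illustrates the delegation idea described in the abstract; it is not the authors' implementation, and all names (apply_step, delegate_step, run_pipeline) are hypothetical. Before each step the master hands its current state to a delegate, the delegate computes the step on a copy and returns the updated state, so both nodes hold the state between steps and a delegate failure costs only the recomputation of that step.

```python
# Sketch of per-step delegation with state transfer back to the master.
# A delegate failure only forces recomputation of the current step, because
# the master still holds the state from the end of the previous step.
import copy
import random


class NodeFailure(Exception):
    """Raised when a simulated node crashes during a step."""


def apply_step(state, step):
    """One step of the data-processing pipeline (placeholder computation)."""
    return {"step": step, "data": state["data"] + [step]}


def delegate_step(state, step, fail=False):
    """The delegate executes the step on its copy of the state and returns it."""
    if fail:
        raise NodeFailure("delegate crashed")
    return apply_step(copy.deepcopy(state), step)


def run_pipeline(num_steps, delegate_failure_prob=0.2):
    """Master loop: delegate each step, keep the previous state as a backup."""
    master_state = {"step": -1, "data": []}
    for step in range(num_steps):
        while True:
            try:
                crashed = random.random() < delegate_failure_prob
                master_state = delegate_step(master_state, step, fail=crashed)
                break  # state transferred back; proceed to the next step
            except NodeFailure:
                # Only this step is lost; re-run it, possibly on another delegate.
                print(f"delegate failed on step {step}, recomputing")
    return master_state


if __name__ == "__main__":
    print(run_pipeline(5))
```

The same reasoning applies symmetrically to a master failure between steps: the delegate already holds the latest state and can take over, which is why the abstract describes the failure of either node (but not both) as tolerable.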
Original language: English
Title of host publication: Computational Science and Its Applications – ICCSA 2016
Subtitle of host publication: 16th International Conference, Beijing, China, July 4-7, 2016, Proceedings, Part II
Publisher: Springer Nature
Pages: 379-389
ISBN (Electronic): 978-3-319-42108-7
ISBN (Print): 978-3-319-42107-0
DOIs
State: Published - 2016
Event: 16th International Conference on Computational Science and Its Applications - Beijing, China
Duration: 4 Jul 2016 - 6 Jul 2016
Conference number: 16

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Nature
Volume: 9787
ISSN (Print): 0302-9743

Conference

Conference: 16th International Conference on Computational Science and Its Applications
Abbreviated title: ICCSA 2016
Country/Territory: China
City: Beijing
Period: 4/07/16 - 6/07/16

Research areas

• Parallel computing, Big data processing, Distributed computing, Backup node, State transfer, Delegation, Cluster computing, Fault-tolerance
