Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark

Vladimir Korkhov, Ivan Gankevich, Oleg Iakushkin, Dmitry Gushchanskiy, Dmitry Khmel, Andrey Ivashchenko, Alexander Pyayt, Sergey Zobnin, Alexander Loginov

Research output

Abstract

Modern data acquisition and processing architectures often rely on low-cost, low-power devices that can be combined into a distributed infrastructure. In this paper we survey ways to organize a distributed computing testbed based on microcomputers similar to Raspberry Pi and Intel Edison. The goal of the research is to investigate and develop a scheduler for orchestrating distributed data processing and general-purpose computations on such unreliable and resource-constrained hardware. We also consider integrating the scheduler with Apache Spark, a well-known distributed data processing framework. We outline a project, carried out in collaboration with Siemens LLC, that compares different hardware and software deployment configurations and evaluates the performance and applicability of the tools on the testbed.
Original language: Undefined
Pages (from-to): 387-398
Journal: Lecture Notes in Computer Science
Volume: 10408
DOI: 10.1007/978-3-319-62404-4_28
Publication status: Published - 2017

Cite this

Korkhov, Vladimir; Gankevich, Ivan; Iakushkin, Oleg; Gushchanskiy, Dmitry; Khmel, Dmitry; Ivashchenko, Andrey; Pyayt, Alexander; Zobnin, Sergey; Loginov, Alexander. / Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark. In: Lecture Notes in Computer Science. 2017; Vol. 10408. pp. 387-398.
@article{32be9b70c2d4486aa80100f0d55218f9,
title = "Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark",
abstract = "Modern data acquisition and processing architectures often rely on low-cost, low-power devices that can be combined into a distributed infrastructure. In this paper we survey ways to organize a distributed computing testbed based on microcomputers similar to Raspberry Pi and Intel Edison. The goal of the research is to investigate and develop a scheduler for orchestrating distributed data processing and general-purpose computations on such unreliable and resource-constrained hardware. We also consider integrating the scheduler with Apache Spark, a well-known distributed data processing framework. We outline a project, carried out in collaboration with Siemens LLC, that compares different hardware and software deployment configurations and evaluates the performance and applicability of the tools on the testbed.",
keywords = "Microcomputers, Scheduling, Apache Spark, Raspberry Pi, Fault tolerance, High availability",
author = "Vladimir Korkhov and Ivan Gankevich and Oleg Iakushkin and Dmitry Gushchanskiy and Dmitry Khmel and Andrey Ivashchenko and Alexander Pyayt and Sergey Zobnin and Alexander Loginov",
year = "2017",
doi = "10.1007/978-3-319-62404-4_28",
language = "undefined",
volume = "10408",
pages = "387--398",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer",

}

Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark. / Korkhov, Vladimir; Gankevich, Ivan; Iakushkin, Oleg; Gushchanskiy, Dmitry; Khmel, Dmitry; Ivashchenko, Andrey; Pyayt, Alexander; Zobnin, Sergey; Loginov, Alexander.

In: Lecture Notes in Computer Science, Vol. 10408, 2017, p. 387-398.

TY - JOUR

T1 - Distributed Data Processing on Microcomputers with Ascheduler and Apache Spark

AU - Korkhov, Vladimir

AU - Gankevich, Ivan

AU - Iakushkin, Oleg

AU - Gushchanskiy, Dmitry

AU - Khmel, Dmitry

AU - Ivashchenko, Andrey

AU - Pyayt, Alexander

AU - Zobnin, Sergey

AU - Loginov, Alexander

PY - 2017

Y1 - 2017

N2 - Modern data acquisition and processing architectures often rely on low-cost, low-power devices that can be combined into a distributed infrastructure. In this paper we survey ways to organize a distributed computing testbed based on microcomputers similar to Raspberry Pi and Intel Edison. The goal of the research is to investigate and develop a scheduler for orchestrating distributed data processing and general-purpose computations on such unreliable and resource-constrained hardware. We also consider integrating the scheduler with Apache Spark, a well-known distributed data processing framework. We outline a project, carried out in collaboration with Siemens LLC, that compares different hardware and software deployment configurations and evaluates the performance and applicability of the tools on the testbed.

AB - Modern data acquisition and processing architectures often rely on low-cost, low-power devices that can be combined into a distributed infrastructure. In this paper we survey ways to organize a distributed computing testbed based on microcomputers similar to Raspberry Pi and Intel Edison. The goal of the research is to investigate and develop a scheduler for orchestrating distributed data processing and general-purpose computations on such unreliable and resource-constrained hardware. We also consider integrating the scheduler with Apache Spark, a well-known distributed data processing framework. We outline a project, carried out in collaboration with Siemens LLC, that compares different hardware and software deployment configurations and evaluates the performance and applicability of the tools on the testbed.

KW - Microcomputers

KW - Scheduling

KW - Apache Spark

KW - Raspberry Pi

KW - Fault tolerance

KW - High availability

UR - https://www.scopus.com/inward/record.uri?eid=2-s2.0-85026766337&doi=10.1007%2f978-3-319-62404-4_28&partnerID=40&md5=00b89ba048825a6c725d24ee622b3625

U2 - 10.1007/978-3-319-62404-4_28

DO - 10.1007/978-3-319-62404-4_28

M3 - Article

VL - 10408

SP - 387

EP - 398

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -