Research output: Contribution to journal › Review article › peer-review
On the road to a scientific data lake for the High Luminosity LHC era. / Alekseev, Aleksandr; Campana, Simone; Espinal, Xavier; Jezequel, Stephane; Kirianov, Andrey; Klimentov, Alexei; Korchuganova, Tatiana; Mitsyn, Valeri; Oleynik, Danila; Smirnov, Serge; Zarochentsev, Andrey.
In: International Journal of Modern Physics A, Vol. 35, No. 33, 227, 11.2020.
TY - JOUR
T1 - On the road to a scientific data lake for the High Luminosity LHC era
AU - Alekseev, Aleksandr
AU - Campana, Simone
AU - Espinal, Xavier
AU - Jezequel, Stephane
AU - Kirianov, Andrey
AU - Klimentov, Alexei
AU - Korchuganova, Tatiana
AU - Mitsyn, Valeri
AU - Oleynik, Danila
AU - Smirnov, Serge
AU - Zarochentsev, Andrey
N1 - Publisher Copyright: © 2020 World Scientific Publishing Company.
PY - 2020/11
Y1 - 2020/11
N2 - The experiments at CERN's Large Hadron Collider use the Worldwide LHC Computing Grid, the WLCG, as their distributed computing infrastructure. Through the distributed workload and data management systems, they provide seamless access to hundreds of grid, HPC and cloud-based computing and storage resources that are distributed worldwide to thousands of physicists. LHC experiments annually process more than an exabyte of data using an average of 500,000 distributed CPU cores, enabling hundreds of new scientific results from the collider. However, the resources available to the experiments have been insufficient to meet data processing, simulation and analysis needs over the past five years as the volume of data from the LHC has grown. The problem will be even more severe for the next LHC phases. The High Luminosity LHC will be a multi-exabyte challenge where the envisaged storage and compute needs are a factor of 10 to 100 above the expected technology evolution. The particle physics community needs to evolve its current computing and data organization models and change the way it uses and manages the infrastructure, with a focus on optimizations that improve performance and efficiency without neglecting the simplification of operations. In this paper we highlight a recent R&D project related to a scientific data lake and federated data storage.
KW - Data lake
KW - DOMA
KW - LHC
UR - http://www.scopus.com/inward/record.url?scp=85097425065&partnerID=8YFLogxK
U2 - 10.1142/S0217751X20300227
DO - 10.1142/S0217751X20300227
M3 - Review article
AN - SCOPUS:85097425065
VL - 35
JO - International Journal of Modern Physics A
JF - International Journal of Modern Physics A
SN - 0217-751X
IS - 33
M1 - 227
ER -