• Aleksandr Alekseev
  • Simone Campana
  • Xavier Espinal
  • Stephane Jezequel
  • Andrey Kirianov
  • Alexei Klimentov
  • Tatiana Korchuganova
  • Valeri Mitsyn
  • Danila Oleynik
  • Serge Smirnov
  • Andrey Zarochentsev

The experiments at CERN's Large Hadron Collider use the Worldwide LHC Computing Grid (WLCG) as their distributed computing infrastructure. Through distributed workload and data management systems, they provide thousands of physicists with seamless access to hundreds of grid, HPC and cloud-based computing and storage resources spread worldwide. The LHC experiments annually process more than an exabyte of data using an average of 500,000 distributed CPU cores, enabling hundreds of new scientific results from the collider. However, over the past five years the resources available to the experiments have been insufficient to meet data processing, simulation and analysis needs as the volume of data from the LHC has grown, and the problem will become even more severe in the next LHC phases. The High Luminosity LHC will be a multi-exabyte challenge in which the envisaged storage and compute needs exceed the expected technology evolution by a factor of 10 to 100. The particle physics community therefore needs to evolve its current computing and data organization models, changing the way it uses and manages the infrastructure, with a focus on optimizations that improve performance and efficiency while also simplifying operations. In this paper we highlight a recent R&D project on a scientific data lake and federated data storage.
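The data-lake model referred to above decouples a single logical namespace from the physical replicas held at federated storage sites, so that a workload can be served from whichever copy is cheapest to reach. The following is a minimal, hypothetical Python sketch of that idea only; the endpoint names, dataset names and cost values are invented for illustration and do not correspond to the actual WLCG/DOMA software stack.

    # Hypothetical sketch of a data-lake catalogue: one logical dataset name
    # maps to several physical replicas at federated sites, and the client
    # resolves the lowest-cost replica. All names and costs are illustrative.
    from dataclasses import dataclass

    @dataclass
    class Replica:
        endpoint: str   # storage site holding a copy
        url: str        # physical access URL
        cost: float     # e.g. an estimate of network distance or latency

    # Logical dataset name -> physical replicas distributed across sites
    CATALOG = {
        "lhc.sim.dataset-001": [
            Replica("site-A", "root://site-a.example.org//lake/dataset-001", cost=1.0),
            Replica("site-B", "root://site-b.example.org//lake/dataset-001", cost=3.5),
        ],
    }

    def resolve(logical_name: str) -> Replica:
        """Pick the lowest-cost replica for a logical dataset name."""
        replicas = CATALOG.get(logical_name)
        if not replicas:
            raise KeyError(f"no replica registered for {logical_name!r}")
        return min(replicas, key=lambda r: r.cost)

    if __name__ == "__main__":
        best = resolve("lhc.sim.dataset-001")
        print(f"read {best.url} from {best.endpoint}")

In a real data lake the catalogue, replica placement and cost metrics are provided by the distributed data management layer rather than a static table; the sketch only illustrates the separation between logical names and federated physical storage.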

Original language: English
Article number: 227
Journal: International Journal of Modern Physics A
Volume: 35
Issue number: 33
DOIs
State: Published - Nov 2020

    Research areas

  • Data lake
  • DOMA
  • LHC

    Scopus subject areas

  • Atomic and Molecular Physics, and Optics
  • Nuclear and High Energy Physics
  • Astronomy and Astrophysics

ID: 88353950