Standard

Federated data storage evolution in HENP: Data lakes and beyond. / Zarochentsev, Andrey; Espinal, Xavier; Kiryanov, Andrey; Schovancová, Jaroslava.

In: Journal of Physics: Conference Series, Vol. 1525, No. 1, 012071, 07.07.2020.

Research output: Contribution to journal, Conference article, peer-review

Harvard

Zarochentsev, A, Espinal, X, Kiryanov, A & Schovancová, J 2020, 'Federated data storage evolution in HENP: Data lakes and beyond', Journal of Physics: Conference Series, vol. 1525, no. 1, 012071. https://doi.org/10.1088/1742-6596/1525/1/012071

APA

Zarochentsev, A., Espinal, X., Kiryanov, A., & Schovancová, J. (2020). Federated data storage evolution in HENP: Data lakes and beyond. Journal of Physics: Conference Series, 1525(1), [012071]. https://doi.org/10.1088/1742-6596/1525/1/012071

Vancouver

Zarochentsev A, Espinal X, Kiryanov A, Schovancová J. Federated data storage evolution in HENP: Data lakes and beyond. Journal of Physics: Conference Series. 2020 Jul 7;1525(1). 012071. https://doi.org/10.1088/1742-6596/1525/1/012071

Author

Zarochentsev, Andrey; Espinal, Xavier; Kiryanov, Andrey; Schovancová, Jaroslava. / Federated data storage evolution in HENP: Data lakes and beyond. In: Journal of Physics: Conference Series. 2020; Vol. 1525, No. 1.

BibTeX

@article{d1da6d94ea344fb3b45b3bc53b662273,
title = "Federated data storage evolution in HENP: Data lakes and beyond",
abstract = "Storage has been identified as the main challenge for the future distributed computing infrastructures: Particle Physics (HL-LHC, DUNE, Belle-II), Astrophysics and Cosmology (SKA, LSST). In particular, the High Luminosity LHC (HL-LHC) will begin operations in the year of 2026 with expected data volumes to increase by at least an order of magnitude as compared with the present systems. Extrapolating from existing trends in disk and tape pricing, and assuming flat infrastructure budgets, the implications for data handling for end-user analysis are significant. HENP experiments need to manage data across a variety of mediums based on the types of data and its uses: from tapes (cold storage) to disks and solid state drives (hot storage) to caches (including world wide access data in clouds and {"}data lakes{"}). The DataLake R&D project aims at exploring an evolution of distributed storage while bearing in mind very high demands of the HL-LHC era. Its primary objective is to optimize hardware usage and operational costs of a storage system deployed across distributed centers connected by fat networks and operated as a single service. Such storage would host a large fraction of the data and optimize the cost, eliminating inefficiencies due to fragmentation. In this talk we will highlight current status of the project, its achievements, interconnection with other research activities in this field like WLCG-DOMA and ATLAS-Google DataOcean, and future plans.",
author = "Andrey Zarochentsev and Xavier Espinal and Andrey Kiryanov and Jaroslava Schovancov{\'a}",
note = "Publisher Copyright: {\textcopyright} Published under licence by IOP Publishing Ltd.; 19th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, ACAT 2019 ; Conference date: 11-03-2019 Through 15-03-2019",
year = "2020",
month = jul,
day = "7",
doi = "10.1088/1742-6596/1525/1/012071",
language = "English",
volume = "1525",
journal = "Journal of Physics: Conference Series",
issn = "1742-6588",
publisher = "IOP Publishing Ltd.",
number = "1",
pages = "012071",
}

RIS

TY - JOUR

T1 - Federated data storage evolution in HENP: Data lakes and beyond

T2 - 19th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, ACAT 2019

AU - Zarochentsev, Andrey

AU - Espinal, Xavier

AU - Kiryanov, Andrey

AU - Schovancová, Jaroslava

N1 - Publisher Copyright: © Published under licence by IOP Publishing Ltd.

PY - 2020/7/7

Y1 - 2020/7/7

AB - Storage has been identified as the main challenge for the future distributed computing infrastructures: Particle Physics (HL-LHC, DUNE, Belle-II), Astrophysics and Cosmology (SKA, LSST). In particular, the High Luminosity LHC (HL-LHC) will begin operations in the year of 2026 with expected data volumes to increase by at least an order of magnitude as compared with the present systems. Extrapolating from existing trends in disk and tape pricing, and assuming flat infrastructure budgets, the implications for data handling for end-user analysis are significant. HENP experiments need to manage data across a variety of mediums based on the types of data and its uses: from tapes (cold storage) to disks and solid state drives (hot storage) to caches (including world wide access data in clouds and "data lakes"). The DataLake R&D project aims at exploring an evolution of distributed storage while bearing in mind very high demands of the HL-LHC era. Its primary objective is to optimize hardware usage and operational costs of a storage system deployed across distributed centers connected by fat networks and operated as a single service. Such storage would host a large fraction of the data and optimize the cost, eliminating inefficiencies due to fragmentation. In this talk we will highlight current status of the project, its achievements, interconnection with other research activities in this field like WLCG-DOMA and ATLAS-Google DataOcean, and future plans.

UR - http://www.scopus.com/inward/record.url?scp=85088251180&partnerID=8YFLogxK

U2 - 10.1088/1742-6596/1525/1/012071

DO - 10.1088/1742-6596/1525/1/012071

M3 - Conference article

AN - SCOPUS:85088251180

VL - 1525

JO - Journal of Physics: Conference Series

JF - Journal of Physics: Conference Series

SN - 1742-6588

IS - 1

M1 - 012071

Y2 - 11 March 2019 through 15 March 2019

ER -

ID: 88354178