ExCALIData

Many simulations will begin by exploiting data which need to be extracted from deep storage. Such data provide ancillary inputs, initial conditions, and boundary conditions for simulations. In the weather and climate use case some such data may need to be extracted from within large datasets, and even from within “files” inside such datasets. The extraction may require some processing, possibly including decompression, subsetting, regridding (“re-meshing”) and reformatting. Depending on volumes and available tools, such processing may be best handled adjacent to (and/or within) the storage system, or during the simulation workflow itself.

Data then need to be transferred to a simulation platform (either locally or across a WAN), where the key exascale problems include:

initialising on many processors;
parallel write (both for checkpointing and products);
the need for data reduction (both lossy and lossless, and as part of in-flight data analysis, the output stream, or both);
methods of exploiting burst buffers if present; and
handling file system peculiarities.
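
As an illustration of the lossless and lossy data reduction mentioned in the list above, the following is a minimal sketch using only the Python standard library. It is not the project's implementation: the function names, the choice of zlib, and the fixed quantisation step are all assumptions made for this example. The lossless path compresses the raw 64-bit values, and the lossy path first quantises each value to an integer multiple of a chosen step, so that the reconstruction error is bounded by about half that step.

```python
import struct
import zlib


def lossless_reduce(values):
    """Compress 64-bit floats with zlib; the round trip is bit-exact."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return zlib.compress(raw)


def lossless_restore(blob, count):
    """Inverse of lossless_reduce for a known number of values."""
    return list(struct.unpack(f"<{count}d", zlib.decompress(blob)))


def lossy_reduce(values, step):
    """Quantise to integer multiples of `step` (hypothetical parameter),
    store them as 32-bit integers, then compress. Precision finer than
    `step` is discarded, which is what makes this lossy."""
    quantised = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(quantised)}i", *quantised)
    return zlib.compress(raw)


def lossy_restore(blob, count, step):
    """Inverse of lossy_reduce, up to the quantisation error."""
    quantised = struct.unpack(f"<{count}i", zlib.decompress(blob))
    return [q * step for q in quantised]


# Illustrative data only: a smooth, temperature-like series.
field = [280.0 + 0.001 * i for i in range(1000)]
exact = lossless_restore(lossless_reduce(field), len(field))
approx = lossy_restore(lossy_reduce(field, 0.01), len(field), 0.01)
```

Whether a loss of precision such as this is acceptable depends on the variable and on the downstream analysis, so in practice the step would be chosen per variable.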

Data analysis may be on the same system, or another, and will generally involve manipulation alongside other data, which may be even more voluminous than the simulation products themselves. It will be important to recognise that individual workflows will have parallelisation requirements that wax and wane through a series of stages, which may involve manipulating data into other forms more suitable for different stages (e.g., for pipelining into AI/ML GPU workflows).

Analysis workflows will need to leverage libraries which provide optimisation, workflow analysis, and standard data manipulations, to avoid spending vast amounts of time optimising ephemeral analysis software. Such tools will need:

suitable abstractions for the storage environment (“storage containers”);
to make use of virtualisation and any elements of hardware support for data manipulations (active storage may provide basic mathematical operations and on-the-fly compression/decompression);
to involve specialised visualisation hardware in some cases; and
to be portable and avoid dependencies on vendor specific environments (although they should be able to exploit enhancements such as active storage when available).
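
The last requirement in the list above, staying portable while still exploiting active storage when it is available, can be sketched as follows. Everything here is hypothetical and chosen for illustration: the class names, the `read` and `reduce` methods, and the `values_transferred` counter are assumptions, not the interface of any ExCALIData library. The point is only that the calling code is identical for both containers, and that the active container returns a single reduced value instead of the whole slice.

```python
from statistics import fmean

# Reductions that a (hypothetical) active storage layer might offer.
REDUCTIONS = {"min": min, "max": max, "sum": sum, "mean": fmean}


class PlainContainer:
    """Hypothetical storage container that can only return raw values."""

    def __init__(self, arrays):
        self._arrays = arrays
        self.values_transferred = 0  # stand-in for data moved to the client

    def read(self, name, start, stop):
        chunk = self._arrays[name][start:stop]
        self.values_transferred += len(chunk)
        return chunk


class ActiveContainer(PlainContainer):
    """Hypothetical container whose storage layer can reduce data in place."""

    def reduce(self, name, op, start, stop):
        result = REDUCTIONS[op](self._arrays[name][start:stop])
        self.values_transferred += 1  # only the result leaves storage
        return result


def reduce_slice(container, name, op, start, stop):
    """Use storage-side reduction when offered, else reduce locally."""
    storage_side = getattr(container, "reduce", None)
    if callable(storage_side):
        return storage_side(name, op, start, stop)
    return REDUCTIONS[op](container.read(name, start, stop))


data = {"t": [float(i) for i in range(100)]}
plain = PlainContainer(data)
active = ActiveContainer(data)
plain_mean = reduce_slice(plain, "t", "mean", 10, 20)
active_mean = reduce_slice(active, "t", "mean", 10, 20)
```

Placing the fallback in a library function like `reduce_slice`, rather than in user code, is one way to keep analysis scripts free of vendor-specific branches.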

When analysis is complete, products will be moved to deep storage (and/or into a curated environment). This will involve creating appropriate metadata for discovery and extraction.

Project activities

For the storage part of the project, we address storage interfaces at the application level and tools for managing information about what is where across tiers and between institutions. We examine the possibilities for accelerating I/O using both network fabric and advanced burst buffers, and we compare and contrast examples of generic and domain-specific I/O middleware.

This work recognises the difficulties that users have in managing the location of their data across tiered storage, and in configuring their workflows to take advantage of newly available acceleration options which otherwise require rather arcane knowledge. Because the analysis and simulation phases pose different problems, we target both phases. We specifically look at the impact of the various technologies on synthetic tests, and on real use cases from both Weather and Climate, and Fusion (including techniques for AI/ML and other analysis on the huge datasets expected).

The knowledge transfer activity will exploit partner experience with a range of different storage technologies as well as the body of work carried out with this funding. All activities aim to deliver the robust and generic transformation in capability that is needed, hiding storage configurations from applications and using radical new approaches to deliver highly efficient solutions.

For the workflow part of the project, we address direct support for moving computation out of the traditional analysis phase by taking a range of Weather and Climate ensemble manipulations into the simulation phase, and by taking a range of generic data reductions out of application libraries and into the storage layer. The latter involves recognising that many (but far from all) data analysis workflows are ephemeral and that the biggest opportunity for a new paradigm of thinking is to make changes in libraries so that users have to do minimal optimisation of their particular code.

To that end, we will work on enhancing existing analysis libraries, both specific to weather and climate, and more generically, to take advantage of new analysis capabilities that we will engineer in storage – delivering in a very practical way “active-storage” functionality. We will also continue and extend work we have done on “ensemble data analysis” for the situation where we can have all members of the ensemble “in-flight” at the same time. The ensemble work will reduce the flow of data to storage during simulation, and the use of active storage will reduce the flow of data from storage during analysis. The knowledge transfer activity directly involves industrial partners, not only to benefit from their skills, but also to create other routes for information transfer to RSEs (via commercial support).
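
To make the “in-flight” ensemble idea above concrete, here is a minimal sketch, assuming each ensemble member exposes its current field as a flat list of values on a shared grid. The function name and data layout are assumptions for illustration, not the project's code. It accumulates a per-gridpoint mean and variance across members in a single pass (Welford's method), so that only two summary fields, rather than one field per member, would need to be written to storage.

```python
def ensemble_mean_and_variance(member_fields):
    """Single-pass mean and population variance across ensemble members.

    `member_fields` is an iterable of equal-length lists, one per member.
    Members are consumed one at a time, so they never need to be held
    in memory or on disk together.
    """
    count = 0
    mean = None
    m2 = None  # running sum of squared deviations from the mean
    for field in member_fields:
        if mean is None:
            mean = [0.0] * len(field)
            m2 = [0.0] * len(field)
        elif len(field) != len(mean):
            raise ValueError("all members must share the same grid size")
        count += 1
        for i, value in enumerate(field):
            delta = value - mean[i]
            mean[i] += delta / count
            m2[i] += delta * (value - mean[i])
    if count == 0:
        raise ValueError("at least one ensemble member is required")
    variance = [s / count for s in m2]
    return mean, variance


# Three illustrative members on a two-point grid.
members = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
ens_mean, ens_var = ensemble_mean_and_variance(members)
```

In a real coupled ensemble the members would typically be running in parallel and the accumulation would be a collective operation, but the reduction in output volume follows the same logic.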

Theme

Cross-cutting Research

Contact

Sadie Bartholomew, National Centre for Atmospheric Science
Fanny Adloff



Related resources

The ExcaliData Implementation of Active Storage.
