Redesigning codes and workflows to take advantage of exascale simulation means addressing I/O and workflow problems that arise throughout a sequence of activities. That sequence may begin and end with deep storage, and it involves both simulation and analysis phases.

Many simulations will begin by exploiting data which need to be extracted from deep storage. Such data provide ancillary information, initial conditions, and boundary conditions for simulations. In the weather and climate use case some such data may need to be extracted from within large datasets, and even from within “files” inside such datasets. The extraction may require some processing, possibly including decompression, subsetting, regridding (“re-meshing”) and reformatting. Depending on volumes and available tools, such processing may be best handled adjacent to (and/or within) the storage system, or during the simulation workflow itself.

Data then need to be transferred to a simulation platform (either locally or across a WAN), where the key exascale problems include:

  • initialising on many processors;
  • parallel write (both for checkpointing and products);
  • the need for data reduction (both lossy and lossless, and as part of in-flight data analysis, the output stream, or both);
  • methods of exploiting burst buffers if present; and
  • handling file system peculiarities. 
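The combination of lossy and lossless reduction mentioned in the list above can be illustrated with a minimal sketch. It assumes a one-dimensional field of floating-point values and an acceptable precision `step`; the function names `reduce_field` and `restore_field` are hypothetical and chosen only for illustration, and production codes would more likely use a dedicated compressor than this simple quantise-then-deflate scheme.

```python
import math
import struct
import zlib


def reduce_field(values, step):
    """Lossy stage: snap each value to a multiple of `step`.
    Lossless stage: deflate the resulting integers with zlib."""
    quantised = [round(v / step) for v in values]
    raw = struct.pack(f"<{len(quantised)}i", *quantised)  # 32-bit integers
    return zlib.compress(raw, 6)


def restore_field(blob, step):
    """Undo the lossless stage exactly; the lossy stage is only approximated."""
    raw = zlib.decompress(blob)
    count = len(raw) // 4
    return [q * step for q in struct.unpack(f"<{count}i", raw)]


# A smooth synthetic "temperature" field, standing in for model output.
field = [280.0 + 5.0 * math.sin(i / 50.0) for i in range(10_000)]
step = 0.01  # assumed acceptable precision, in the same units as the field

blob = reduce_field(field, step)
restored = restore_field(blob, step)

original_bytes = 8 * len(field)  # size if stored as float64
max_error = max(abs(a - b) for a, b in zip(field, restored))
print(f"{original_bytes} bytes -> {len(blob)} bytes, max error {max_error:.4f}")
```

The quantisation bounds the error at half of `step`, so the precision lost is an explicit choice rather than a side effect of the compressor.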

Data analysis may take place on the same system or on another, and will generally involve manipulation alongside other data, which may be even more voluminous than the simulation products themselves. It will be important to recognise that individual workflows have parallelisation requirements that wax and wane through a series of stages, and that some stages may need data manipulated into other forms more suitable for them (e.g., for pipelining into AI/ML GPU workflows).

Analysis workflows will need to leverage libraries which provide optimisation, workflow analysis, and standard data manipulations, to avoid spending vast amounts of time optimising ephemeral analysis software. Such tools will need:

  • suitable abstractions for the storage environment (“storage containers”);
  • to make use of virtualisation and any elements of hardware support for data manipulations (active storage may provide basic mathematical operations and on-the-fly compression/decompression);
  • to involve specialised visualisation hardware in some cases; and
  • to be portable and avoid dependencies on vendor-specific environments (although they should be able to exploit enhancements such as active storage when available).
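The last requirement, staying portable while exploiting active storage when it is available, can be sketched as follows. Everything here is a toy: `PlainStore`, `ToyActiveStore`, their `reduce` method and the `field_max` helper are hypothetical names invented for this illustration and do not correspond to any real storage API. The `bytes_sent` counter stands in for data crossing the network.

```python
import struct


class PlainStore:
    """Toy store that can only return whole objects."""

    def __init__(self):
        self._objects = {}
        self.bytes_sent = 0  # bytes that crossed the (imaginary) network

    def put(self, key, values):
        self._objects[key] = struct.pack(f"<{len(values)}d", *values)

    def get(self, key):
        blob = self._objects[key]
        self.bytes_sent += len(blob)
        return list(struct.unpack(f"<{len(blob) // 8}d", blob))


class ToyActiveStore(PlainStore):
    """Same store, plus a storage-side reduction (hypothetical interface)."""

    _OPERATIONS = {"sum": sum, "min": min, "max": max}

    def reduce(self, key, operation):
        blob = self._objects[key]
        values = struct.unpack(f"<{len(blob) // 8}d", blob)
        self.bytes_sent += 8  # only a single float64 result is returned
        return self._OPERATIONS[operation](values)


def field_max(store, key):
    """Library-level helper: use active storage if offered, else fall back."""
    reduce = getattr(store, "reduce", None)
    if reduce is not None:
        return reduce(key, "max")
    return max(store.get(key))


plain, active = PlainStore(), ToyActiveStore()
results = {}
for store in (plain, active):
    store.put("temperature", [float(i) for i in range(1000)])
    results[type(store).__name__] = field_max(store, "temperature")
    print(type(store).__name__, results[type(store).__name__], store.bytes_sent)
```

The user-facing call is identical in both cases; only the library decides whether the reduction runs in storage or on the client, which is the kind of change that spares users from optimising their own analysis code.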

When analysis is complete, products will be moved to deep storage (and/or into a curated environment). This will involve creating appropriate metadata for discovery and extraction.

Project activities

For the storage part of the project, we address storage interfaces at the application level and tools for managing information about what is where across tiers and between institutions. We examine the possibilities for accelerating I/O using both network fabric and advanced burst buffers, and we compare and contrast examples of generic and domain-specific I/O middleware.

This work recognises the difficulties that users have in managing the location of their data across tiered storage, and in configuring their workflows to take advantage of newly available acceleration options which otherwise require rather arcane knowledge. Because the analysis and simulation phases pose different problems, we target both. We specifically look at the impact of the various technologies on synthetic tests, and on real use cases from both Weather and Climate, and Fusion (including techniques for AI/ML and other analysis on the huge datasets expected).

The knowledge transfer activity will exploit partner experience with a range of different storage technologies as well as the body of work carried out with this funding. All activities aim to deliver the necessary robust and generic transformation in capability: hiding storage configurations from applications and using radical new approaches to deliver highly efficient solutions.

For the workflow part of the project, we address direct support for moving computation out of the traditional analysis phase by taking a range of Weather and Climate ensemble manipulations into the simulation phase, and by taking a range of generic data reductions out of application libraries and into the storage layer. The latter involves recognising that many (but far from all) data analysis workflows are ephemeral and that the biggest opportunity for a new paradigm of thinking is to make changes in libraries so that users have to do minimal optimisation of their particular code.

To that end, we will work on enhancing existing analysis libraries, both specific to weather and climate, and more generically, to take advantage of new analysis capabilities that we will engineer in storage – delivering in a very practical way “active-storage” functionality. We will also continue and extend work we have done on “ensemble data analysis” for the situation where we can have all members of the ensemble “in-flight” at the same time. The ensemble work will reduce the flow of data to storage during simulation, and the use of active storage will reduce the flow of data from storage during analysis. The knowledge transfer activity directly involves industrial partners, not only to benefit from their skills, but also to create other routes for information transfer to RSEs (via commercial support).
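The idea behind analysing an ensemble while all members are “in-flight” can be shown with a small sketch. It assumes that each member exposes its current field as a list of values on a shared grid; the function name `ensemble_summary` and the sample numbers are hypothetical, and a real implementation would operate on distributed arrays rather than Python lists.

```python
from statistics import fmean, pstdev


def ensemble_summary(member_fields):
    """Reduce an in-memory ensemble to a per-grid-point mean and spread.

    member_fields: one list per ensemble member, all on the same grid.
    Returns (means, spreads), each with one value per grid point.
    """
    columns = list(zip(*member_fields))  # regroup values by grid point
    means = [fmean(column) for column in columns]
    spreads = [pstdev(column) for column in columns]
    return means, spreads


# Three hypothetical members on a four-point grid at a single time step.
members = [
    [280.0, 281.0, 282.0, 283.0],
    [281.0, 282.0, 283.0, 284.0],
    [282.0, 283.0, 284.0, 285.0],
]

means, spreads = ensemble_summary(members)
values_held = sum(len(m) for m in members)
values_written = len(means) + len(spreads)
print(f"mean={means}")
print(f"spread={[round(s, 3) for s in spreads]}")
print(f"{values_held} values in flight, {values_written} written")
```

With N members, writing only the mean and spread replaces N fields with two, so the saving grows with ensemble size; whether those two statistics are sufficient depends on the analysis that follows.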
