Radical changes to supercomputing architectures are on the horizon. The current simulation codes, on which much of UK science relies, are designed for current supercomputer architectures. These codes will not be able to fully exploit the power of the supercomputers that the mid-2020s will deliver, and could in fact run slower on those machines than they do now. Future computers will also be more energy efficient, so the sooner we adapt to newer technology, the greater the opportunity for cost savings.
Therefore, it is essential that we invest now in redesigning these simulation codes so that they perform well on future generations of supercomputers. ExCALIBUR is meeting this challenge by delivering research and innovative algorithm development to redesign high-priority simulation codes and fully harness the power of future supercomputers across scientific and engineering applications.
ExCALIBUR is built around four pillars – separation of concerns, co-design, data science, investing in people – and by applying the principles of the four pillars, successful ExCALIBUR programmes will research methodologies to redesign the use cases for supercomputers of the near future and beyond. A key criterion for the selection of use cases is the breadth of impact across a wide field and user base. By choosing use cases that both enable strategically important research and provide lessons that can be applied to other codes/fields, the programme will maximise the impact across the scientific modelling and simulation communities.
A high priority use case for exascale software development has the following characteristics:
- It delivers a step change in simulation performance and provides solutions that are not currently feasible, consistent with the enhanced performance of exascale computing.
- It enables high-quality, high-impact research in multiple areas of strategic importance.
- It produces applicable and scalable solutions that can be applied across a range of architectures, including non-exascale systems.
- It provides a national and international focal point for the relevant research communities, including the development of partnerships with complementary initiatives in the UK and internationally.
The Met Office, UKAEA and UKRI are leading on use cases to produce research software within our respective fields of study.
Weather and Climate Use Case
To apply the principles of ExCALIBUR to deliver the benefits to the Weather & Climate Prediction System.
Component Model Co-Design – applying the design principle of separation of concerns to critical components of the Weather and Climate Prediction System.
Weather and climate prediction systems are highly complex and comprise many different component models. Initialisation procedures are required to determine the starting conditions for each of the component models from observations (data assimilation). This in turn requires the observations to be processed and quality controlled in a way that is consistent with the system’s characteristics. Allowance also has to be made for the uncertainty in those starting conditions as well as in the model components themselves. Finally, the data that the system produces needs to be managed so that products such as forecasts can be visualised, verified, post-processed, and archived.
This work package is focusing effort on the larger, most vulnerable components. It will also seek to exploit commonalities of design in existing families of models. The activities in this work package are currently focused on:
- Optimising the data layout and memory access design of the new atmospheric model that integrates the equations governing how the air evolves;
- Improving the performance and flexibility of the chemical and aerosol solver which determines how the composition of the atmosphere evolves and how constituents are injected into, and removed from, the atmosphere;
- Agreeing and implementing a strategy for applying a separation of concerns approach to the NEMO family of marine system codes (the ocean model, ocean initialisation, and the sea-ice model);
- Researching and developing modifications to allow the wave model, which determines the state of the sea surface, to exploit shared memory architectures;
- Delivering a new framework for the atmospheric observation processing and data assimilation that is more flexible in being able to exploit new observation types and deployable across different architectures;
- Delivering a new framework for the verification and post-processing systems that is fit for future deployments of next generation models.
With domain experts working with computational scientists and algorithmic experts, the research will explore:
- Co-design: Whether (and if so how) the existing algorithms need to be redesigned to make those algorithms more scalable (so that the cost of the algorithm increases linearly with any increase in the number of degrees of freedom). This will require exposing as much as possible those elements of a process that do not depend on other elements (removing unnecessary serial operations and replacing them with parallel ones) and implementing more inherently local algorithms.
- Separation of concerns: How to ensure that any changes needed to apply the algorithm optimally on alternative supercomputer architectures can be implemented in an automated manner. This requires the application of a Domain Specific Language (DSL). The desire is to have a DSL that is generic enough to encompass the requirements of this use case whilst remaining as specific as possible. The research will determine how achievable that desire is and, if it is not achievable, how little it needs to be relaxed (e.g. what is the smallest number of DSLs, or types of DSL, needed? How branched does one DSL need to be?).
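The separation of concerns described above can be illustrated with a minimal Python sketch. It is not ExCALIBUR code and not a real DSL: the names `relax_point`, `apply_serial` and `apply_chunked` are hypothetical. The "science" is written once as a purely local kernel, and two interchangeable "schedules" decide how that kernel is applied to the grid. Python threads are used only to show that the chunks are independent; they are not a claim about performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Science layer: what is computed at one grid point (hypothetical kernel).
def relax_point(left, centre, right):
    """Local three-point average; depends only on nearest neighbours."""
    return 0.25 * left + 0.5 * centre + 0.25 * right

# Schedule layer 1: a plain serial loop over the interior points.
def apply_serial(kernel, field):
    out = list(field)
    for i in range(1, len(field) - 1):
        out[i] = kernel(field[i - 1], field[i], field[i + 1])
    return out

# Schedule layer 2: the same kernel, with the interior split into
# independent chunks that can be processed concurrently.
def apply_chunked(kernel, field, n_chunks=4):
    interior = range(1, len(field) - 1)
    size = max(1, -(-len(interior) // n_chunks))  # ceiling division
    chunks = [interior[s:s + size] for s in range(0, len(interior), size)]

    def work(chunk):
        # Reads only the input field, so chunks do not depend on each other.
        return [(i, kernel(field[i - 1], field[i], field[i + 1])) for i in chunk]

    out = list(field)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for result in pool.map(work, chunks):
            for i, value in result:
                out[i] = value
    return out

field = [float(i * i % 7) for i in range(20)]
serial = apply_serial(relax_point, field)
chunked = apply_chunked(relax_point, field)
```

Because the kernel never changes, the schedule can be swapped without touching the science code, and both schedules return the same field. In a real DSL-based approach the schedule would be generated automatically for each target architecture rather than written by hand.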
System Co-Design – co-designing the entire prediction system, including the redesigned components of the Component Model Co-Design and emerging ideas from the cross-cutting themes.
This work package will focus on research to ensure that when the component models are assembled (see System Integration below) the system as a whole is performant.
A key element of this work package will be the strategy for how to couple different components together: how and when that should be done; and how the coupler itself should be designed to be optimal for future architectures.
Many of the principles that apply to the individual components apply to the system as a whole. It will be important to ensure that any serial operations are absolutely necessary and that all other operations can be applied in parallel (thus allowing for as much task-parallelism as possible). Any communication between different components needs to be minimal, which places constraints on how the system is designed and in particular on how diagnostics and data output are handled. Since the relative costs of disk access and of passing data around in memory are unclear (and possibly architecture dependent), it will be essential to have a system that is flexible in this regard. Similar considerations apply to the archiving of data from the system and to access to that data. It is inefficient if users have to manually optimise their code for different types of hardware, so a separation of concerns between the data and its storage medium should be researched. This might be characterised as ‘data location aware work flows’.
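As a rough illustration of separating the data from its storage medium, the sketch below defines two interchangeable stores, one in memory and one on disk, behind the same two-method interface. The class names `MemoryStore` and `DiskStore`, the function `field_mean` and the JSON file layout are all assumptions made for this example. They do not describe any ExCALIBUR design.

```python
import json
import os
import tempfile

class MemoryStore:
    """Keeps fields in a dictionary in memory."""
    def __init__(self):
        self._fields = {}

    def put(self, name, values):
        self._fields[name] = list(values)

    def get(self, name):
        return list(self._fields[name])

class DiskStore:
    """Keeps each field as a JSON file in a directory."""
    def __init__(self, directory):
        self._directory = directory

    def _path(self, name):
        return os.path.join(self._directory, name + ".json")

    def put(self, name, values):
        with open(self._path(name), "w") as handle:
            json.dump(list(values), handle)

    def get(self, name):
        with open(self._path(name)) as handle:
            return json.load(handle)

def field_mean(store, name):
    """User-level diagnostic: identical code whichever store is used."""
    values = store.get(name)
    return sum(values) / len(values)

temperature = [280.0, 281.5, 279.5, 283.0]

memory_store = MemoryStore()
memory_store.put("temperature", temperature)
mean_from_memory = field_mean(memory_store, "temperature")

with tempfile.TemporaryDirectory() as scratch:
    disk_store = DiskStore(scratch)
    disk_store.put("temperature", temperature)
    mean_from_disk = field_mean(disk_store, "temperature")
```

The diagnostic `field_mean` does not know where the data lives, so the choice between memory and disk can be made per architecture without users rewriting their code.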
Many of these aspects are not specific to this use case; other use cases are likely to face the same challenges with the same solutions. Such aspects would be likely candidates for the cross-cutting themes (e.g. XC Work package 1).
This work package will therefore focus on redesigning those elements that are specific to this use case. Such elements could include the design and development of:
- The specific system (the data workflow) for this use case, in particular one that addresses any challenges presented by the use of the automatic code generation required by the DSL(s) and also the data workflow challenges presented by the ensemble system and cycling data assimilation.
- An optimal strategy for the coupling of the component models together that minimises the data dependencies between components.
- An optimal diagnostic system, and associated visualisation system, that retains flexibility and a good user interface while optimally manipulating data across vast numbers of processors and possibly combining data from different components.
- As few DSLs as possible while covering as many components of work package 1 as practicable.
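The coupling element in the list above can be pictured with a toy Python sketch. Everything in it is an assumption made for illustration: the `Component` class, the relaxation "physics", and the choice of a lagged exchange at a fixed interval. The point is only that each component exposes a single interface value to the coupler, and that between exchanges the components have no data dependency on one another.

```python
class Component:
    """Toy component: one temperature relaxing towards a boundary value."""
    def __init__(self, temperature, rate):
        self.temperature = temperature
        self.rate = rate
        self.boundary = temperature  # last value received from the coupler

    def step(self):
        # Uses only local state plus the stored boundary value.
        self.temperature += self.rate * (self.boundary - self.temperature)

    def export_interface(self):
        # Only the interface value leaves the component, not its full state.
        return self.temperature

def run_coupled(atmosphere, ocean, n_steps, coupling_interval):
    """Advance both components, exchanging data every coupling_interval steps."""
    exchanges = 0
    for step in range(n_steps):
        if step % coupling_interval == 0:
            from_atmosphere = atmosphere.export_interface()
            from_ocean = ocean.export_interface()
            atmosphere.boundary = from_ocean
            ocean.boundary = from_atmosphere
            exchanges += 1
        # Between exchanges these two calls are independent of each other
        # and could in principle run as separate parallel tasks.
        atmosphere.step()
        ocean.step()
    return exchanges

atmosphere = Component(temperature=290.0, rate=0.5)
ocean = Component(temperature=280.0, rate=0.1)
exchanges = run_coupled(atmosphere, ocean, n_steps=12, coupling_interval=4)
```

Lengthening `coupling_interval` reduces communication at the cost of components working with older boundary data. That trade-off is the kind of question a coupling strategy has to settle.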
System Integration – implementing the System Co-Design using the components from the Component Model Co-Design. (Delivery not started yet.)
This work package will bring together all the elements of the others to deliver a working Weather and Climate Prediction System that is ready for the supercomputers of the mid-2020s. As well as the complex testing and trialling of the new system, an important element of this work package will be to stage the integration as far as possible, so that components of the existing system can be replaced incrementally. This approach will enable the benefits of the ExCALIBUR programme to start to be realised as soon as possible and ahead of its end date. Where that is not possible it will be important to devise a testing strategy (perhaps by developing appropriate sub-systems) to minimise the risk when various new components are finally brought together.
ExCALIBUR: Met Office science plan and activities
UKRI Use Case and Design and Development Working Groups
Use Case phase 1a
In April 2020, UKRI funded 10 design and development working groups (DDWGs). These groups cover a breadth of disciplines and were funded to develop a coherent community of practice comprising subject matter experts, RSEs, computational or mathematical scientists, and any other relevant individuals and groups, to enable effective engagement between the computational research community and research communities that can benefit from exascale computing.
The working groups are conducting a mixture of simulation code design and development, and community building activities to engage relevant computational and user communities with the following outputs:
- a strategic research agenda that clearly articulates the research challenges to be overcome, opportunities, key risks and mitigations, and sets out a detailed approach to addressing these to enable development of exascale-ready software by the mid-2020s.
- proof-of-concept studies and research outputs.
- a collaborative approach engaging with a range of potential beneficiaries, including co-design with industry.
Use Case phase 1b
A subset of these groups will be funded to continue research and development of a scientific code as a high priority use case and build upon their initial engagement with the wider research communities following a peer reviewed process; this will take place later this year. Each successful project will evolve and develop from the foundation of the work developed by the working groups. Each funded project will contribute to the overall programme, including but not limited to knowledge exchange activities, RSE development activities and content for the website.
Use Case phase 2
We are now beginning to use the lessons learned from these use cases to help develop the Use Case phase 2 call.
This opportunity will benefit from the foundational work of the Use Case phase 1 projects, and UKRI will be looking to focus on areas of research that have not been funded in the first-wave use case call.
The scope and remit for this call will be defined by the outcomes of Use Case 1b to ensure the programme takes a portfolio approach to capture the current landscape of exascale software research.
Fusion Modelling System Use Case
To apply the principles of ExCALIBUR to deliver the benefits to the Fusion Modelling System, project NEPTUNE (NEutrals & Plasma TUrbulence Numerics for the Exascale).
NEPTUNE is primarily aimed at the simulation of the plasma in the outer regions of tokamak confinement devices, including its interaction with the solid wall.
Detailed modelling of solid wall geometry is relatively novel in the NEPTUNE case, and likely to require use of finite elements rather than the finite difference algorithms currently used. Numerical algorithms and libraries from related fields such as fluid dynamics and astrophysical plasmas are likely to be employed. An early task is to determine which is most suitable with the least development required for use for exascale-targeted plasma modelling. Early application of the software to the design of fusion reactors is envisaged, motivating immediate integration of techniques of Uncertainty Quantification (UQ) and Model Order Reduction (MOR) into the software design. In conjunction with finite elements, high-dimensional methods need also to be considered for plasma regions (which may change in time) with long mean free paths (mfps). High-dimensional methods will typically involve use of particles, although 5-D or 6-D finite elements will be considered. This leads to
Co-design: with experts
- in the writing and use of finite elements,
- in numerical analysis, to assist in solving the resulting large systems of equations, specifically in matrix preconditioning,
- in particle methods and/or sampling in high-dimensional spaces,
- in UQ and MOR, notably in the use of surrogates to reduce computational expense, including data movement.
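To make the finite element terminology concrete, here is a deliberately small Python sketch. It is a textbook one-dimensional model problem, not NEPTUNE code. It solves -u'' = 1 on [0, 1] with u(0) = u(1) = 0 using linear elements on a uniform mesh. The function name `solve_poisson_1d` is hypothetical, and the direct tridiagonal solve stands in for the preconditioned iterative solvers that large systems would need.

```python
def solve_poisson_1d(n_elements):
    """Solve -u'' = 1 on [0, 1], u(0) = u(1) = 0, with linear elements.

    Assumes n_elements >= 2 and a uniform mesh.
    """
    h = 1.0 / n_elements
    n = n_elements - 1                      # number of interior nodes
    # Assembled stiffness matrix is tridiagonal: (1/h) * [-1, 2, -1].
    lower = [-1.0 / h] * n
    diag = [2.0 / h] * n
    upper = [-1.0 / h] * n
    rhs = [h] * n                           # load vector for f = 1

    # Thomas algorithm: forward elimination ...
    for i in range(1, n):
        factor = lower[i] / diag[i - 1]
        diag[i] -= factor * upper[i - 1]
        rhs[i] -= factor * rhs[i - 1]
    # ... followed by back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]                # add the boundary nodes

solution = solve_poisson_1d(8)
midpoint_value = solution[4]                # exact solution x(1 - x)/2 gives 0.125
```

For this particular problem the nodal values agree with the exact solution u(x) = x(1 - x)/2 up to rounding. Realistic wall geometries in two or three dimensions lead to much larger, less regular systems, which is where the expertise in preconditioning listed above comes in.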
Plasma multiphysics model
Development of the software will proceed by the production of a series of proxyapps of increasing dimensionality and complexity. Initially, ‘multiphysics’ means continuum models. Although existing finite difference software may be used at first, the proxyapps are likely ultimately to be implemented using finite elements, which eventually will be integrated with particle models for neutral gas and impurities. The multiphysics models are sufficiently complex as to demand
Separation of concerns: such that the many different physical effects can be easily incorporated, ideally within the context of a high-level Domain Specific Language (DSL) in which partial differential equations (PDEs) are compactly expressible.
Co-design: with experts
- in the writing of DSLs
- in the use of pre-exascale hardware to solve finite element problems.
Neutral gas & Impurity model
A separate series of proxyapps will be produced to treat high-dimensional effects involving long mfps. These effects include not only neutral gas and impurities but also some ionised species typically located close to the wall. Ultimately, they will be integrated with the finite element proxyapps. This work will require
Co-design: with theoretical plasma physicists constructing high-dimensional plasma models, and experts in the use of particle codes on pre-exascale hardware.
Data science: to manage the large amount of information in the high-dimensional approach, and integrate with the multiphysics proxyapps.
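The role of the mean free path in a particle treatment can be sketched in a few lines of Python. The model is an assumption chosen for simplicity and is not taken from NEPTUNE: particles cross a one-dimensional slab, the mean free path is constant, and a particle is counted as transmitted if its first collision-free distance exceeds the slab thickness. The names `sample_free_path` and `transmitted_fraction` are hypothetical.

```python
import math
import random

def sample_free_path(mean_free_path, rng):
    """Draw a collision-free distance from an exponential distribution."""
    # rng.random() lies in [0, 1), so the argument of log lies in (0, 1].
    return -mean_free_path * math.log(1.0 - rng.random())

def transmitted_fraction(thickness, mean_free_path, n_particles, seed=0):
    """Monte Carlo estimate of the fraction crossing a slab without colliding."""
    rng = random.Random(seed)
    crossed = sum(
        1 for _ in range(n_particles)
        if sample_free_path(mean_free_path, rng) > thickness
    )
    return crossed / n_particles

estimate = transmitted_fraction(thickness=1.0, mean_free_path=1.0, n_particles=20000)
expected = math.exp(-1.0)   # analytic result exp(-thickness / mean_free_path)
```

When the mean free path is comparable to or longer than the region of interest, a substantial fraction of particles cross without colliding, and the statistical noise in such estimates falls only slowly with particle number. Both effects contribute to the data volumes that the data science strand has to manage.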
Code structure & coordination
This work package is intended to ensure best practice in scientific software production in the context of an open-source development for exascale, particularly regarding standards for documentation, traceability, verification and validation, and use of DSLs. Data structures will be designed to ensure the possibility of easy close integration between the models of different dimensionality. There is substantial overlap with the Component Model Co-Design and System Co-Design packages of the Weather & Climate Prediction System. UKAEA work demands
Investing in people: to encourage RSEs to deploy, develop and learn appropriate software skills for exascale code developments and, specifically for NEPTUNE, to acquire an understanding of the likely demands of the different classes of user, both physicists and engineers.
Separation of concerns: with experts in the use of relatively low-level machine portable software such as represented by SYCL and Kokkos.