On 24–25 June 2024, the Centre for Theoretical Cosmology (CTC) hosted AMReX at Exascale, a workshop exploring the theme of open platform programming, with a particular interest in adaptive mesh refinement (AMR) methods, in preparation for exascale computing. The workshop was organized under the ExCALIBUR project In-situ visualization and unified programming across accelerator architectures and served as a training event for knowledge exchange on in-situ visualization of large simulation codes.
The workshop was jointly funded by the ExCALIBUR Exascale Computing Project (ECP) exchange programme (80%) and Intel (20%), as the CTC is an Intel oneAPI Centre of Excellence. This report summarizes the outcomes of our meeting in light of our objectives and any lessons learned for the ECP exchange programme. Presentations and the full agenda can be found on the meeting website.
Context
The In-situ visualization and unified programming across accelerator architectures project has two main aims:
- Porting the numerical relativity code GRChombo to be performant on accelerated architectures such as GPU systems, in preparation for exascale machines. Our new code, named GRTeclyn, is based on the AMReX framework for AMR. AMReX already contains support for GPU offloading via a lightweight abstraction layer that allows kernel launches on NVIDIA, Intel and AMD GPUs.
- Introduction of in-situ visualization techniques to GRTeclyn for on-the-fly simulation analyses and knowledge exchange purposes. This is particularly salient for exascale computing because the time taken to read and write data to disk can form a significant portion of the total runtime, more so than the actual computation. Furthermore, visualization and analysis resources are often an order of magnitude or more smaller than the cluster used to run the simulation.

Both of these objectives would benefit from establishing a connection with the US ECP community, as AMReX was an ECP-supported project and many large-scale simulations encounter I/O issues similar to those facing GRTeclyn. Furthermore, the US HPC community is more experienced with the challenges of exascale computing, having had access to two DOE exascale machines, Aurora and Frontier, for approximately a year.
The workshop
In keeping with our project objective to explore different programming models across various accelerated architectures, we set a theme of “Open platform programming” for the first day of the event. The morning talks and demonstrations were dedicated to SYCL (one of the programming models supported by Intel’s oneAPI suite) and the afternoon to Kokkos, an abstraction layer that allows device offloading across all major GPU architectures and is itself an ECP project.
Some highlights included our invited speaker, Steve Rangel (Argonne National Laboratory), who presented his experience of porting HACC (the Hardware/Hybrid Accelerated Cosmology Code, an N-body solver) from CUDA to SYCL. Steve showed that it was possible to achieve substantially better (2x) performance on the Intel Data Center GPU Max 1550 series than on the NVIDIA A100 series, provided that some hardware-specific optimizations had been made. Tom Deakin (University of Bristol) presented an overview of open platform programming methods, with a comparison of OpenMP device offload, SYCL and Kokkos. Kacper Kornet (Research Computing Services, Cambridge) gave an introduction to our local system Dawn (which will become a DRI system in 2025) and supplied attendees with training accounts for the duration of the meeting. In the afternoon, Daniel Arndt (Oak Ridge National Laboratory) provided an introduction to Kokkos and his implementation of the SYCL backend within Kokkos, including the mapping between SYCL functions and their Kokkos equivalents. One of the major concerns for open platform programming is the introduction of additional runtime overhead compared with writing device-specific code. He showed that using the SYCL backend in Kokkos is not less performant than writing in SYCL itself.
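As a rough illustration of this SYCL-to-Kokkos correspondence, the sketch below is our own minimal example (not taken from the talk): the same kernel body fills a device array, expressed first in SYCL 2020 and then in Kokkos. The array name, size and kernel body are placeholders.

```cpp
#include <sycl/sycl.hpp>
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
    const int N = 1 << 20;

    // SYCL 2020 (USM): pick the default device and launch a 1D kernel.
    {
        sycl::queue q;
        double* x = sycl::malloc_device<double>(N, q);
        q.parallel_for(sycl::range<1>(N), [=](sycl::id<1> idx) {
            const size_t i = idx[0];
            x[i] = 2.0 * static_cast<double>(i);
        }).wait();
        sycl::free(x, q);
    }

    // Kokkos equivalent: the same loop body, dispatched to whichever
    // backend (CUDA, HIP, SYCL, OpenMP, ...) Kokkos was built with.
    Kokkos::initialize(argc, argv);
    {
        Kokkos::View<double*> x("x", N);
        Kokkos::parallel_for("fill", N, KOKKOS_LAMBDA(const int i) {
            x(i) = 2.0 * static_cast<double>(i);
        });
        Kokkos::fence();
    }
    Kokkos::finalize();
    return 0;
}
```

In both models the loop body is an ordinary C++ lambda; the programming model, rather than the application code, decides which device the kernel runs on.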
The theme for the second day was AMReX (and other AMR-based codes) as well as large-scale visualizations for AMR simulations. Weiqun Zhang (Lawrence Berkeley National Laboratory), the lead developer of AMReX, spoke about how AMReX is able to achieve performance portability with its lightweight abstraction layer, which closely resembles Kokkos. Weiqun showed that a third of ECP codes use Kokkos, while another 21% have their own framework (some of which are AMReX based), which supports our own preference for open platform programming and its importance for exascale computing. He also spoke about the development of AMReX and the fact that Aurora was initially planned to be a CPU-only machine, which is why AMReX has always supported an MPI + OpenMP + GPU programming model. Zarija Lukic (Lawrence Berkeley National Laboratory) spoke about Nyx, a hydrodynamical N-body code for cosmology, and how to incorporate particle structures within AMReX in order to make predictions supporting Lyman-α observations for DOE Stage IV experiments such as the Dark Energy Spectroscopic Instrument. Zarija also mentioned the importance of on-the-fly calculations and in-situ analyses: I/O is such a significant bottleneck on Frontier that he is unable to complete a scientifically useful simulation that uses the entire cluster, despite Nyx showing good scaling up to the full machine.
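For readers unfamiliar with AMReX, the following minimal sketch (our own example with placeholder names, assuming an existing amrex::MultiFab holding one level of grid data) shows the flavour of the abstraction layer Weiqun described: the same amrex::ParallelFor launch is compiled to a CUDA, HIP or SYCL kernel, or to a host loop, depending on how AMReX is built.

```cpp
#include <AMReX_MultiFab.H>
#include <AMReX_MFIter.H>

// Assumes an existing amrex::MultiFab `state` holding the grid data on one
// AMR level; the kernel body is a placeholder.
void scale_level (amrex::MultiFab& state, amrex::Real factor)
{
    // Loop over the boxes (FABs) owned by this MPI rank.
    for (amrex::MFIter mfi(state); mfi.isValid(); ++mfi)
    {
        const amrex::Box& bx = mfi.validbox();
        const amrex::Array4<amrex::Real>& a = state.array(mfi);

        // One source: this lambda is launched as a device kernel or run
        // on the host depending on how AMReX was configured at build time.
        amrex::ParallelFor(bx,
            [=] AMREX_GPU_DEVICE (int i, int j, int k)
            {
                a(i,j,k) *= factor;
            });
    }
}
```

The loop body is an ordinary lambda over cell indices, much like the Kokkos example above, which is what makes the two abstraction layers resemble each other.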
In the afternoon, we had our visualization session, which included our demonstration of the in-situ visualization capabilities in AMReX and GRTeclyn. This formed part of our ExCALIBUR workplan, which includes a component for knowledge exchange and support for other community codes. This session also featured Carson Brownlee (Intel), who presented Wombat, a distributed ray-tracing algorithm for AMR data built on OSPRay, developed in collaboration with local CTC RSEs and researchers. Joe Insley and Silvio Rizzi (both from Argonne National Laboratory) completed the session with examples of large-scale (>1000 node) visualizations carried out on Aurora. While some of these were made for educational purposes and publications, they also mentioned that seeing the data displayed in different ways (for example, growth-of-structure measurements in redshift space) had helped to enhance the researchers’ understanding of the underlying physical phenomena.
Meeting outcomes and impact
This meeting was scheduled immediately before the GRTL users meeting, and we were therefore able to provide hands-on training on new features in GRTeclyn for around 30 current GRChombo users. This included running a small example on Dawn to showcase the GPU capabilities of GRTeclyn, as well as an exercise on porting a short section of code from GRChombo. Participants were surprised at how quickly calculations can be modified to be compatible with AMReX, given how much of the framework had already been ported. A general consensus was reached that the collaboration should move to GRTeclyn, and a few users have already started to transition.
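To give a flavour of that porting exercise, here is our own simplified sketch (not the actual workshop material, with made-up class and field names): the per-cell physics lives in a small device-callable functor, in the spirit of GRChombo’s compute classes, and is launched over each box of a level with amrex::ParallelFor.

```cpp
#include <AMReX_MultiFab.H>
#include <cmath>

// Hypothetical per-cell "compute class": all of the physics sits in a
// small functor that can be called on host or device.
struct SquareRoot
{
    AMREX_GPU_DEVICE
    void operator() (int i, int j, int k,
                     amrex::Array4<amrex::Real> const& out,
                     amrex::Array4<amrex::Real const> const& in) const
    {
        out(i,j,k) = std::sqrt(in(i,j,k));   // placeholder physics
    }
};

// Launch the same per-cell body over every box of a level, on CPU or GPU.
// Assumes `in` and `out` share the same BoxArray and DistributionMapping.
void run_square_root (amrex::MultiFab& out, const amrex::MultiFab& in)
{
    SquareRoot kernel;
    for (amrex::MFIter mfi(out); mfi.isValid(); ++mfi)
    {
        const amrex::Box& bx = mfi.validbox();
        auto out_arr = out.array(mfi);
        auto in_arr  = in.const_array(mfi);
        amrex::ParallelFor(bx,
            [=] AMREX_GPU_DEVICE (int i, int j, int k)
            {
                kernel(i, j, k, out_arr, in_arr);
            });
    }
}
```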
It was very helpful to have everyone together in the room: in discussion with Aurora users, and in light of Steve Rangel’s talk, we were able to formulate a plan for optimizing GRTeclyn for better performance on Intel GPUs. This is work we are currently undertaking, using Intel Advisor to identify hotspot kernels, which can then be rewritten with inline vISA if necessary. As part of this, the plan is to work with the AMReX and Nyx developers who are currently porting their applications to Aurora. Furthermore, Weiqun Zhang was able to resolve a major bug in the interpolation between AMR levels, which means GRTeclyn now operates at a higher level of accuracy than GRChombo for an equivalent problem. On the back of this, he also introduced tracer particles into our framework to tag the location of black holes for further refinement (puncture tracking), as well as the AMR interpolator, which is a key requirement for measuring the gravitational wave signal and other observables. This workshop thus led to direct improvements and new capabilities.
Conclusions
The ECP exchange programme has been incredibly fruitful for us, not only in terms of developing our codebase but also for furthering mutually beneficial ties to related projects in the US. Many of the challenges raised by the ECP projects will also be relevant for ExCALIBUR: issues surrounding the stability of these exascale systems (Aurora, for instance, is still in its acceptance phase) and the mismatch between I/O and compute speeds remain unsolved. Furthermore, many of these discussions and debugging sessions were more productive in person, since some of the problems that were resolved, and collaborations that were enabled, came from unexpected sources. Some of the solutions would never have been developed at all, because the questions would never have been put to the same group of people over Slack or email.
Photo credit: Stephen Blair Chappell.