ARM+GPU Demonstrator

The aim of this testbed is to ensure that ExCALIBUR and the wider UK research community have access to an ARM-based GPU testbed. It is clear that GPUs will play a central role in the Exascale era, and Nvidia GPUs are currently the most popular and widely deployed GPUs in HPC. Nvidia’s proposed merger with ARM Ltd means that ARM-based GPU systems are likely to become a key platform for Exascale computing, so it is essential that UK researchers have early access to them.

The ARM GPU testbed builds on the University of Leicester’s existing experience of managing ARM-based HPC systems, our software engineering team’s CUDA expertise, and our existing relationship with the technical teams at ARM and Nvidia.

The ARM GPU testbed system is based on the Gigabyte G242 platform, commissioned for Nvidia’s “HPC Development Kit” programme. The testbed comprises four G242 servers, each with a single Ampere Altra Q80-30 CPU and two Nvidia A100 (40 GB) GPUs, backed by a ~100 TB NVMe storage system running BeeGFS.

The project includes Research Software Engineering (RSE) effort for the porting, benchmarking and development of existing codes to support the programme of work described above. The project will help to ensure that ARM servers work harmoniously with Nvidia GPUs, and that any shortcomings are understood, documented and reported back to vendors. The RSE will contribute to the creation of digital assets including progress reports, whitepapers and how-to documents, as well as software enhancements and modifications.

