The UCL Adaptable Cluster Project

The ExCALIBUR Interconnect Demonstrator consists of two non-blocking interconnect fabrics supporting up to 60 attached nodes in a dual-fabric configuration. One fabric is 200 Gbps HDR Mellanox InfiniBand, configured so that multi-hop routes can be constructed between nodes.

The second fabric is 100 Gbps Mellanox Ethernet, with BlueField adaptors on each node. This configuration allows us to measure the impact of a variety of in-network technologies: performing computation at the switch level (which requires multiple hops) and using acceleration on the adaptor (the BlueField cards) to offload some of the host machine's work. We also aim to compare the state of the art in Ethernet as an interconnect with InfiniBand, to measure whether RDMA over Converged Ethernet (RoCE) has reached the point where it is a performant, cost-effective interconnect.

To understand system and application performance, the Adaptable Cluster collects metrics from several sources in the system and provides dashboards to visualise them, which helps focus effort on improving system design and resource usage. Alerts can also be set up to draw attention to performance issues. The testbed uses components such as Elasticsearch, Kibana, Logstash and Prometheus to provide insight into both the breadth and depth of system and application performance.
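As a rough illustration of how a metric collected by Prometheus might be turned into a simple alert-style check, the sketch below builds an instant-query URL for the Prometheus HTTP API and parses a response of the documented shape. It is a minimal sketch under stated assumptions, not the testbed's actual configuration: the server address, node names, sample values and threshold are all hypothetical, and `node_load1` is used only because it is a common node exporter metric. No network request is made; the response is a made-up sample.

```python
import json
from urllib.parse import urlencode

# Hypothetical address: the testbed's real Prometheus endpoint is not given here.
PROMETHEUS_URL = "http://prometheus.example:9090"


def build_query_url(base_url, promql):
    """Return the instant-query URL (/api/v1/query) for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload):
    """Map each series' `instance` label to its sample value as a float."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("Prometheus query failed")
    return {
        series["metric"].get("instance", "unknown"): float(series["value"][1])
        for series in body["data"]["result"]
    }


def nodes_over_threshold(samples, threshold):
    """Return the sorted instance names whose value exceeds the threshold."""
    return sorted(name for name, value in samples.items() if value > threshold)


# Made-up response in the shape returned by an instant query; values are strings.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "node_load1", "instance": "node01:9100"},
             "value": [1700000000.0, "3.5"]},
            {"metric": {"__name__": "node_load1", "instance": "node02:9100"},
             "value": [1700000000.0, "0.25"]},
        ],
    },
})

if __name__ == "__main__":
    print(build_query_url(PROMETHEUS_URL, "node_load1"))
    samples = parse_instant_vector(SAMPLE_RESPONSE)
    print(nodes_over_threshold(samples, threshold=1.0))
```

In practice a check like this would more likely be expressed as a Prometheus alerting rule or a dashboard threshold; the Python form is used here only to make the data flow explicit.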

UCL hosts the ExCALIBUR instance of ARM Forge, an application that supports the debugging, profiling and optimisation of codes that use distributed resources, such as a cluster. It is both CPU and GPU enabled. UCL will support ARM Forge for key centres in the ExCALIBUR project, and it will also be available to UCL projects that are not associated with ExCALIBUR. The package enables the code efficiency of jobs that use up to 2048 cores to be analysed. One outcome of this project will be methodologies that enable results from Prometheus and ARM Forge to be used to improve system design, architecture performance and application performance.

Access to the testbed systems

The ExCALIBUR H&ES testbeds are prioritised for access by ExCALIBUR projects, but are also available for use by the wider UK research community – contact the ExCALIBUR H&ES programme office to discuss your requirements. Please note that the testbeds are offered on a best-efforts basis rather than on a service footing, as befits their experimental status.

Latest news