Benchmarking for AI for Science at Exascale (BASE)

With ever-increasing volumes of data from large-scale experimental facilities and observatories, and the impending arrival of the world’s first exascale supercomputers, there is a clear need for scalable AI solutions, since AI has become a powerful, modern technique for big data reduction and interpretation.

The performance and efficacy of machine learning systems, in particular deep learning systems, now exceed human-level capability across a rapidly expanding range of tasks, such as object recognition and classification, or anomaly detection in a range of complex engineering systems. At the same time, the volume of data that the scientific community produces through large-scale experimental facilities and observatories is growing exponentially, thanks to the latest developments in sensor and storage technologies. With this burgeoning growth in the volume of data that our science communities need to assimilate and mine, difficulties with the current generation of hardware/software ecosystems for big data analytics have become commonplace. Preparing AI technology for the exascale (and co-designing exascale hardware for the AI algorithms themselves) has therefore become an urgent requirement and an internationally recognised endeavour.

Although AI benchmarking is becoming a well-explored topic, several issues remain to be addressed, including, but not limited to:

  • There are currently no AI benchmarking efforts that target exascale hardware and capability, particularly for science.
  • Many scientific problems involving real-world, large-scale scientific datasets, such as those from experimental facilities or observatories, are largely ignored in benchmarking activities.
  • Gap analysis across UK science indicates that benchmarks are needed to serve as a catalogue of techniques offering template solutions to different types of scientific problems.

Whilst scoping the development of an AI benchmark suite, this working group aims to address these issues and opportunities. The benchmark initiative will focus upon removing noise from images – a common issue across multiple disciplines.

The working group is engaging in two parallel activities. The first is to build a community across multiple scientific disciplines to synthesise an overall scope for developing AI benchmarks. The second is to develop and evaluate an example benchmark, using the chosen noise-filtering challenge as the example problem, across three scientific disciplines. The selected use cases are removing noise from a) cryogenic electron microscopy (cryo-EM) datasets (life sciences), b) X-ray tomographic images (materials science), and c) weak-lensing images (astronomy).

Some of the challenges described above will be addressed through a set of community engagement activities:

  • A one-week study group on creating an example benchmark for noise filtering
  • A two-day domain-specific workshop
  • A two-day cross-domain workshop
  • A two-day evaluation workshop

BASE-II

Following the success of the BASE-I project, BASE-II will address the challenges and requirements identified in developing AI for Science solutions at exascale. SciMLBench, released as part of BASE-I, provided the scientific community with examples and templates for several challenging problems from different research areas. However successful, benchmarks alone cannot address all of the AI for Science community’s challenges. From community feedback, a core set of requirements was identified:

  • AI Benchmarking
  • AI/HPC Convergence
  • AI Hardware/Software Co-Design
  • Learning from Large-Scale Datasets
  • AI at Exascale Toolbox

BASE-II (Blueprinting AI for Science at Exascale) aims to develop a suite of exascale-ready software and relevant designs to address these high-priority requirements from the AI for Science community.

We will ensure that our deliverables remain relevant to UKRI’s e-Infrastructures and to the wider community through close engagement with ExCALIBUR-funded projects, industry, user communities, academia, national laboratories, and international organisations. In addition, knowledge exchange activities will maximise the flow of information between these communities.

For more information, please visit the BASE-II website.