ExCALIBUR Framework for application input configuration and validation 

Next-generation weather prediction and climate projections rely on complex systems built from a variety of code bases owned, developed, and maintained by different (but overlapping) communities. Each community has its own approach to developing and configuring its code and inputs, and its own coding standards. Correctly configuring and developing an application and its inputs with any one of these code bases can require detailed domain-specific knowledge. Whilst experts are expected to have this knowledge for their own domain, it is difficult for developers and users of these systems to acquire that level of expertise across many applications owned by different communities.

Rules written by application specialists in the form of schemas (also known as configuration metadata) can be used to ensure that user-defined inputs to applications are both appropriate and consistent. Such rules cover, among other things, which inputs are required, how inputs depend on the values of other inputs, and whether each input has the correct data type and length. Schemas can then help users who are inexperienced with these applications by constraining inputs and providing descriptions, help text, application default values, and so on. The use of schemas in current systems is not universal, and they are often written manually to match the code base, an approach that is hard to maintain and error-prone.
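
As an illustration, the sketch below shows what such a schema might express for a single, hypothetical input section. The property names ("obs_type", "channels") are invented for illustration, and the JSON Schema is written here in YAML syntax for readability:

    # Minimal sketch of a schema for a hypothetical DA input section.
    type: object
    required: [obs_type]
    properties:
      obs_type:
        description: Observation type to assimilate
        type: string
        enum: [radiosonde, satellite_radiance]
      channels:
        description: Instrument channels to use
        type: array
        items:
          type: integer
        minItems: 1
    # Conditional dependency: channels are required only when
    # obs_type is satellite_radiance.
    if:
      properties:
        obs_type:
          const: satellite_radiance
    then:
      required: [channels]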

One way to address this, now being widely adopted by next-generation codes, is to derive the metadata contained in schemas automatically from a single source of truth: either using information held in the code base to constrain the inputs, or using a separate schema both to constrain the inputs and to generate the code that reads and validates them. Whilst we can influence the developers of the code bases that we use to adopt a single source of truth, we must accept that how they choose to do so will differ between code bases. The challenge is therefore to use the schemas generated from these different sources of truth as part of a common user experience.

We have enhanced the next-generation Data Assimilation (DA) applications with the ability to generate JSON schema files for their input configurations as part of the build. Workflow developers can take advantage of both the YAML processor utility we developed and this enhancement to the DA applications. They can modularise their application configurations and build them from static settings, user inputs, and cycle-dependent settings. Where required, they can expose settings in YAML configurations to users via the current-generation configuration editor user interface as environment variable settings. They can also use the schema validation services to identify configuration issues as part of the workflow.
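
A minimal sketch of this modular style is shown below. The file names, keys, and the include and variable-substitution syntax are illustrative of the pattern rather than the YAML processor utility's exact interface:

    # da_config.yaml -- a top-level configuration assembled from fragments.
    # File names and the INCLUDE/${...} syntax are illustrative only.
    static: INCLUDE=static-settings.yaml   # settings fixed for all experiments
    user: INCLUDE=user-settings.yaml       # user-editable choices
    cycle:
      validity_time: ${CYCLE_POINT}        # cycle-dependent value, taken from
                                           # an environment variable set by
                                           # the workflow at run time

The fully resolved configuration can then be validated against the build-generated JSON schema before the application runs, so configuration errors are caught by the workflow rather than by the application itself.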

Users can also take advantage of the DA application JSON schemas if they choose to use text editors or IDEs that support the YAML language server, by making a small change to their YAML files to point to the relevant schema. This working practice is portable to Jupyter. It is a modern way of working that enhances robustness and usability for users at all levels of expertise.
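
With the YAML language server, this change is typically a single comment line at the top of the file; the schema path and the settings shown below are placeholders:

    # yaml-language-server: $schema=./da_application_schema.json
    # The editor then validates, completes, and documents the rest of
    # the file against the referenced schema as the user types.
    assimilation:
      window_length: PT6H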

The approach is, in general, applicable to any software system configured by YAML files (or related file formats) that can be associated with a standardised schema. We therefore recommend the approach to domains beyond the Weather and Climate Use Case, provided a similar setup can be put in place.