Helmholtz AI consultants @ DKRZ

Helmholtz AI consultants @ German Climate Computing Centre

Tobias Weigel

Team leader

Earth and environment-focused AI consultants

The Helmholtz AI consultants for Earth and environment are located as a team in the application support department of the German Climate Computing Centre (DKRZ). The team serves users from all Helmholtz Association centres to facilitate the application of artificial intelligence in Earth and environmental science. The team’s services include technical implementation support, such as data preparation and ingest and application scaling and roll-out on computing environments; training and knowledge exchange through courses, workshops and a growing knowledge base; and advice on AI methods, their limitations and biases. The team is dedicated to working collaboratively with scientific projects, in close cooperation with individual researchers and according to their needs, and can be directly included in collaborative projects.

Questions or ideas? consultant-helmholtz.ai@dkrz.de

Team members

Tobias Weigel

Helmholtz AI consultant team leader @ DKRZ

  • eScience infrastructures, PIDs and FAIR data, H2020 projects (EUDAT, EOSC), RDA
  • Data analysis and management services, workflow automation
  • Uncertainty quantification, community actions

Frauke Albrecht

Helmholtz AI consultant @ DKRZ

Caroline Arnold

Helmholtz AI consultant @ DKRZ

  • Model design
  • ANN hyperparameter tuning

Jakob Lüttgau

Helmholtz AI consultant @ DKRZ

  • I/O and storage expert
  • Performance optimization

Felix Stiehler

Helmholtz AI consultant @ DKRZ

  • ML software engineering
  • Scalability and porting to GPUs
  • Data cleaning and preprocessing

Ongoing voucher projects

Atmospheric chemistry modeling with ML

  • Challenge: The Karlsruhe Institute of Technology (KIT) requested the AI consultants' support to investigate the use of artificial neural networks (ANNs) to emulate the results of an operationally used atmospheric chemistry modelling component (EMAC). The goal is to replace the computationally costly, PDE-solving model component with an ANN component that is more efficient while providing results of comparable accuracy.
  • Approach: The consultants designed and implemented a full workflow, including data processing and running a suitable ANN model at large scale (multiple GPU nodes). Time-intensive work involved uncovering and mitigating biases and quality concerns in the training data, experimenting with different ANN architectures and performing hyperparameter tuning on them.
  • Results: Initial results indicate that an ML model capable of performing the required regression on a subset of all chemical variables is feasible. However, significantly more training data, which is now available, is required to reach a regression quality comparable with EMAC and to cover all relevant target variables under extreme conditions. Improving the scalability of the approach will require follow-up work, as will the long-term integration into an operational model run setup.

Extending the ML approach for GNSS reflectometry data to determine rainfall

  • Challenge: While the previous voucher on GNSS-R data concluded with good accuracy in determining wind speeds from the noisy signals, a further challenge is to also determine rainfall over the oceans from the data. In principle, this may be possible because rainfall, like wind, roughens the ocean surface. However, the effect is much less pronounced, the combined occurrence of wind and rainfall is particularly difficult to predict, and side effects from disturbances at nearby locations also need to be corrected.
  • Approach: The consultant team will help to further improve the ML model training and architecture, incorporating additional parameters and applying the model to additional data, in order to tackle the challenges of training a network sophisticated enough to determine both wind and rainfall.

Guidance on ML usage to detect marine litter

  • Challenge: This voucher concerns a request from the GFZ German Research Centre for Geosciences, where the goal is to detect marine litter in high-resolution satellite image data. While a preliminary ML application exists and the requesting researcher has substantial coding skills, the consultants were asked to act as sparring partners to discuss details of the ongoing work and to advise on more advanced ML techniques.
  • Approach: The consultant team will provide guidance on the methodological approach, address possible issues with data processing and biases, and discuss different strategies for training, improving and tuning the network.

Hereon: Detection and short-term forecast potential for rogue waves

In this voucher for Hereon, we investigate the potential of ML methods to detect and possibly predict rogue waves from sea surface height time series measurements.

FZJ: Knowledge provisioning for ML applications in Earth and Environmental science projects

Forschungszentrum Jülich (FZJ) requested support for several ongoing projects in the area of environmental modelling and hydrometeorology. Methodological concerns reach beyond purely data-driven approaches and include interpretability and uncertainty analysis.

FZJ: Technical support for AMBS using deep learning algorithms for weather forecasting

In this voucher, which is shared with and led by the AI consultants at FZJ, we contribute to setting up performance profiling and tackling memory overflow issues. Further information is available on the FZJ consultants' page.

GERICS: Large-scale groundwater level predictions with long short-term memory (LSTM) modelling

At the request of the Climate Service Center Germany (GERICS), we are helping with the initial model setup and methodological questions. The user aims to extrapolate groundwater levels spatially and temporally from global dynamic groundwater level time series and static information such as orography.

AWI: Deep learning-based Spatio-Temporal Interpolation of FESOM-derived Sea Surface Temperature fields

In this voucher, we are supporting a user from the Alfred Wegener Institute (AWI) in the setup and performance optimization of an ANN implementation on GPUs.

GEOMAR: A Machine Learning approach to reconstruct fine-scale sea surface height maps

This voucher concerns the reconstruction of sea surface height (SSH) maps from satellite and synthetic data at the request of GEOMAR. We support the users with a feasibility assessment of the potential use of generative adversarial networks (GANs).

Completed voucher projects

Determining wind speeds from reflected GNSS data

  • Challenge: GFZ requested support in determining wind speed over the oceans via regression from GNSS reflectometry data (CYGNSS mission data from 2018), starting from an early ANN prototype and improving it significantly in terms of accuracy and scalability beyond a single GPU.
  • Approach: We helped to find and address biases and noise in the data and developed a suitable data processing pipeline that can potentially also be re-executed for future data. The work on the data went hand in hand with ML model definition and implementation, improving the regression through iterative experimentation (focusing on CNNs), hyperparameter tuning and extensions to make better use of multiple GPU nodes.
  • Results: The voucher concluded with results showing that an ML-based approach can outperform the established (non-ML) methods in terms of prediction accuracy, with remaining uncertainty under severe weather conditions.

Addressing and distinguishing between model and data uncertainty

  • Challenge: UFZ requested support in tackling grand challenges in quantifying uncertainty along the whole chain, from observational data products via ML model runs to the final analysis, and in distinguishing between different forms and sources of uncertainty.
  • Approach: The voucher was originally planned as a networking exercise with the goal of gaining traction via at least one workshop. However, it became clear early on that a workshop could only be a second step, following attention-raising through a dedicated, potentially provocative, high-level discussion paper. We drafted a suitable manuscript with the goal of raising awareness of the challenges, potentially beyond the geoscience and ML communities.
  • Results: Needs for further networking and workshops were discussed but put on hold until feedback on the publication is available. The publication was submitted at the end of the voucher.

Dynamic data loading and benchmarking for high-throughput GPU with WeatherBench

  • Challenge: To make efficient use of ML models in practical Earth system model runs, challenges in loading and streaming big climate data sets to ML applications need to be addressed. This voucher, requested by HZG, investigated the different stages where bottlenecks may occur in an operational model pipeline: loading data from disk into CPU memory, transferring it to GPU memory, and the final computation on GPUs in a cluster/HPC setup.
  • Approach: After investigating the balance between these optimization goals, the consultant team focused on optimizing the I/O throughput from disk to GPU memory based on the community-defined WeatherBench benchmark, which is representative of a wide range of practical cases. Work on the voucher included adapting the WeatherBench code to the HPC system, performing the necessary data transformations, defining the full software stack, implementing it on an HPC system, running a wide range of benchmarks and defining subsequent optimization strategies, all in consultation with the users.
  • Results: The voucher concluded with a fully described and implemented pipeline for running the WeatherBench scenarios at scale, scientifically valid benchmark results, and an outlook towards a publication based on the results.

Feature importance ranking for marine samples

  • Challenge: In this voucher, GEOMAR asked the consultant team to help define a suitable data processing pipeline for a marine science use case: a spatial regression of the total organic carbon content of seafloor sediments from a comparatively small number of ground-truth measurement points. An initial ML prototype already existed to perform the regression, but GEOMAR needed further insight into what exactly the model was doing in terms of feature importance and data quality.
  • Approach: The consultants provided counsel on the methodological approach and performed feature importance rankings using Random Forests.
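
A feature importance ranking with Random Forests of the kind described above can be sketched as follows. This is a minimal, hypothetical example that assumes scikit-learn and uses synthetic placeholder data. The feature names and values are invented for illustration and are not GEOMAR's data. The sketch contrasts the impurity-based importances built into Random Forests, which are computed on the training data, with permutation importance on held-out data, which is often less biased towards high-cardinality features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical synthetic stand-in for a small set of ground-truth measurement points.
rng = np.random.default_rng(0)
n_points = 300
feature_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical predictors
X = rng.normal(size=(n_points, len(feature_names)))
# Target depends strongly on feature_a, weakly on feature_b, and not at all on feature_c.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n_points)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# 1) Impurity-based importances from the fitted trees (normalized to sum to 1).
impurity_ranking = sorted(
    zip(feature_names, model.feature_importances_), key=lambda item: item[1], reverse=True
)

# 2) Permutation importances on held-out data: mean drop in R^2 when one feature is shuffled.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
perm_ranking = sorted(
    zip(feature_names, perm.importances_mean), key=lambda item: item[1], reverse=True
)

for label, ranking in [("impurity", impurity_ranking), ("permutation", perm_ranking)]:
    print(label, [(name, round(float(score), 3)) for name, score in ranking])
```

With a small number of ground-truth points, both rankings can vary between random seeds, so repeating the analysis with different seeds or cross-validation folds may help judge how stable a ranking is.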

AWI: Stratospheric ozone modelling with ANNs

In this voucher, we support the Alfred Wegener Institute (AWI) with advice on new approaches to modelling stratospheric ozone with ML, based on SWIFT/ATLAS.