Helmholtz AI consultants @ DKRZ

Earth and environment-focused AI consultants

The Helmholtz AI consultants for Earth and environment are based in the application support department of the German climate computing centre (DKRZ). The team serves users from all Helmholtz Association institutes and facilitates the application of artificial intelligence in Earth and environment research. Its services include technical implementation support, such as data preparation and ingest, application scaling and roll-out on computing environments; training and knowledge exchange through courses, workshops and a growing knowledge base; and advice on AI methods, their limitations and biases. The team works collaboratively with scientific projects, in close cooperation with individual researchers and according to their needs, and can be directly involved in collaborative projects.

Questions or ideas? consultant-helmholtz.ai@dkrz.de

Team members

Tobias Weigel

Helmholtz AI consultant team leader @ DKRZ

  • eScience infrastructures, PIDs and FAIR data, H2020 projects (EUDAT, EOSC), RDA
  • Data analysis and management services, workflow automation
  • Uncertainty quantification, community actions

Frauke Albrecht

Helmholtz AI consultant @ DKRZ

Caroline Arnold

Helmholtz AI consultant @ DKRZ

  • Model design
  • ANN hyperparameter tuning

Danu Caus

Helmholtz AI consultant @ DKRZ

  • Machine learning
  • Deep learning
  • Software engineering

Ongoing voucher projects

KIT: Modelling chemistry tracer transport with ML

  • Challenge: Modelling the dispersion of atmospheric aerosols in state-of-the-art models such as ICON-ART causes a significant increase in the computational costs of running such models, for example for operational weather prediction. Using ML models to emulate tracer transport could reduce such costs substantially, yet the effect on model output quality under different scenarios needs to be evaluated.
  • Approach: A prototype ML model based on CNNs has been developed and is to be further improved as part of this voucher. The current model setup causes spatial artefacts and low accuracy at higher resolutions. The architecture needs to be improved, and further gains can be achieved by tuning its hyperparameters (a minimal sketch follows this list).
  • Results: The ML model under development has been further improved in terms of accuracy and reduced spatial artefacts. Future work could progress beyond the scope of the motivating thesis, subject to further discussion.
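
As a rough illustration of the CNN-based emulation approach described in this voucher, a minimal sketch could look like the following (assuming PyTorch; all layer sizes, field names and shapes are hypothetical, not the actual voucher code):

```python
# Minimal sketch of a CNN-based tracer-transport emulator (PyTorch assumed).
# All shapes, variable names and the loss setup are illustrative only.
import torch
import torch.nn as nn

class TracerTransportCNN(nn.Module):
    """Maps gridded input fields (tracer + wind components) to the
    tracer field at the next time step."""
    def __init__(self, in_channels: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=3, padding=1),  # predicted tracer field
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, lat, lon)
        return self.net(x)

model = TracerTransportCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One illustrative training step on random stand-in data.
x = torch.randn(8, 3, 64, 64)   # tracer + u/v wind on a 64x64 grid
y = torch.randn(8, 1, 64, 64)   # tracer field at the next step
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```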

Guidance on ML usage to detect marine litter

  • Challenge: This voucher concerns a request from GFZ German Research Centre for Geosciences, where the goal is to detect marine litter from high-resolution satellite image data.
  • Approach: The consultant team will provide guidance on the methodological approach, address possible issues with data processing and biases, and discuss different strategies on how to train, improve and tune the network. 
  • Results: The model’s software architecture was improved by migrating it to PyTorch Lightning (see the sketch after this list) and tuning its performance. Several approaches were tested and implemented to improve accuracy and cope with the limited number of labels. Automated hyperparameter tuning was set up and used to further optimize the model. The model now achieves acceptable accuracy within the scope of the underlying project workflow. Further improvements may be reached by experimenting with semi-supervised learning techniques, increasing the number of labels and leveraging additional channel data.
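
For illustration, a minimal sketch of wrapping such a model in PyTorch Lightning (the module, data and metric choices here are placeholders, not the project's actual code) could look as follows:

```python
# Minimal sketch of wrapping a litter-detection model in PyTorch Lightning.
# Backbone, channel count and classes are illustrative stand-ins.
import torch
import torch.nn as nn
import pytorch_lightning as pl

class LitterClassifier(pl.LightningModule):
    def __init__(self, n_channels: int = 4, n_classes: int = 2, lr: float = 1e-3):
        super().__init__()
        self.save_hyperparameters()
        self.backbone = nn.Sequential(
            nn.Conv2d(n_channels, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.backbone(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

# Typical usage with an existing DataLoader:
# trainer = pl.Trainer(max_epochs=10, accelerator="gpu", devices=1)
# trainer.fit(LitterClassifier(), train_dataloaders=train_loader)
```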

Extending the ML approach for GNSS reflectometry data to determining rainfall

  • Challenge: While the previous voucher on GNSS-R data concluded with good accuracy in determining wind speeds from the noisy signals, a further challenge is to also determine rainfall over oceans from the data. In principle, this may be possible because rainfall, just like wind, causes roughness of the ocean surface. However, the effect is much less pronounced, the combined occurrence of wind and rainfall is particularly difficult to predict, and side effects from disturbances at nearby locations need to be corrected for.
  • Approach: The consultant team will help to further improve the ML model training and architecture, incorporating additional parameters and applying the model to additional data in order to tackle the challenges of training a network sophisticated enough to determine both wind and rainfall (a multi-output sketch follows this list).
  • Results: The model has been extended and updated to cover additionally available data, and first strategies have been tested to account for rainfall, deal with gaps in the data and improve the performance on extreme values. Work continues towards a model that handles such cases well and can predict rainfall at acceptable accuracy.
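
A hedged sketch of what a multi-target extension could look like (assuming PyTorch; the architecture, shapes and loss weighting are illustrative only, not the project's actual model):

```python
# Rough sketch of extending a GNSS-R regression CNN to predict both wind
# speed and rainfall from delay-Doppler maps (PyTorch assumed).
import torch
import torch.nn as nn

class GnssRMultiHead(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared feature extractor over delay-Doppler maps (e.g. 17x11 bins).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.wind_head = nn.Linear(32, 1)   # wind speed regression
        self.rain_head = nn.Linear(32, 1)   # rainfall regression

    def forward(self, ddm: torch.Tensor):
        h = self.features(ddm)
        return self.wind_head(h), self.rain_head(h)

model = GnssRMultiHead()
ddm = torch.randn(4, 1, 17, 11)            # stand-in delay-Doppler maps
wind_pred, rain_pred = model(ddm)
# A combined loss could weight the two targets, e.g.
# loss = mse(wind_pred, wind_true) + alpha * mse(rain_pred, rain_true)
```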

Modelling seismicity in geothermal reservoirs

  • Challenge: Human-induced seismicity due to geothermal energy production can pose an environmental risk. Understanding the mechanisms underlying such induced seismicity, for example, the relationship between fluid pressure and resulting seismic activity, can help to make such risks more transparent and addressable.
  • Approach: Several possible data sources exist, including lab data from comparatively controlled environments, which may be upscaled to the scope required in practical applications, and long-term observational data from geothermal fields. In terms of ML techniques, LSTM models or transformers may be applied to cover the temporal aspects (a minimal LSTM sketch follows this list), but spatial attributes could also be relevant. Understanding the underlying causes and mechanisms may require a workflow design that affords explainability.
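
As a rough sketch of the LSTM option mentioned above, a model relating an injection or fluid-pressure time series to subsequent seismic activity might be set up as follows (PyTorch assumed; all names, features and shapes are hypothetical):

```python
# Minimal sketch of an LSTM relating fluid-pressure/injection time series
# to a seismicity proxy in the following time window (PyTorch assumed).
import torch
import torch.nn as nn

class SeismicityLSTM(nn.Module):
    def __init__(self, n_features: int = 2, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # e.g. event rate in the next window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time steps, features), e.g. fluid pressure and flow rate
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # predict from the last hidden state

model = SeismicityLSTM()
x = torch.randn(8, 128, 2)                 # 128 time steps, 2 input features
pred = model(x)                            # (8, 1) predicted seismicity proxy
```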

Completed voucher projects

Atmospheric chemistry modeling with ML

  • Challenge: The Karlsruhe Institute of Technology (KIT) requested the AI consultants' support to investigate the use of ANNs to emulate the results of an operationally used atmospheric chemistry modelling component (EMAC). The goal is to replace the computationally costly (PDE-solving) model with an ANN component that is more efficient while providing results of comparable accuracy.
  • Approach: The consultants designed and implemented a full workflow, including data processing and running a suitable ANN model at large scale (multiple GPU nodes). Time-intensive work involved uncovering and mitigating biases and quality concerns in the training data, experimenting with different ANN architectures and performing hyperparameter tuning on them (a minimal sketch follows this list).
  • Results: Initial results indicate that an ML model capable of performing the required regression on a subset of the chemical variables is feasible. However, significantly more training data, which has since become available, is required to reach a regression quality comparable with EMAC and to cover all relevant target variables under extreme conditions. Improving the scalability of the approach will need subsequent follow-up work, as will the long-term integration into an operational model run setup.
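
An illustrative sketch of such an ANN emulator, reduced to a per-grid-cell regression from chemical and meteorological state to the chemical state after one time step (PyTorch assumed; variable counts and layer sizes are invented, not the EMAC setup itself):

```python
# Illustrative sketch of an ANN emulator for a chemistry component:
# per-grid-cell regression from concentrations plus meteorological state
# to the concentrations after one chemistry time step (PyTorch assumed).
import torch
import torch.nn as nn

n_chem, n_met = 30, 5                     # stand-in variable counts

emulator = nn.Sequential(
    nn.Linear(n_chem + n_met, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, n_chem),               # predicted concentrations
)

x = torch.randn(1024, n_chem + n_met)     # one batch of grid cells
y_pred = emulator(x)

# For multi-node GPU training, such a model would typically be wrapped in
# torch.nn.parallel.DistributedDataParallel or a framework such as Horovod.
```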

Determining wind speeds from reflected GNSS data

  • Challenge: GFZ requested support to determine wind speed over oceans via regression from GNSS reflectometry data (CYGNSS mission data from 2018), starting from an early ANN prototype and improving it significantly in terms of accuracy and scalability beyond single GPUs.
  • Approach: We helped to find and address biases and noise in the data and developed a suitable data processing pipeline that can potentially also be re-executed for future data. The work on the data went hand in hand with iterative ML model definition and implementation, improving the regression via experimentation (focusing on CNNs), hyperparameter tuning (a sketch of automated tuning follows this list) and extension to make better use of multiple GPU nodes.
  • Results: The voucher concluded with results that show that an ML-based approach can beat the established (non-ML) methods in terms of prediction accuracy, with remaining uncertainty in severe weather conditions.
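
The hyperparameter tuning mentioned in the approach can be automated; a hedged sketch using Optuna (chosen here purely for illustration, not necessarily the tool used in the voucher) might look like this, with the objective function standing in for a short training run:

```python
# Hedged sketch of automated hyperparameter tuning with Optuna.
# Learning rate and layer width are example search dimensions.
import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    hidden = trial.suggest_int("hidden_channels", 16, 128, step=16)
    # In the real setting: build the CNN with these values, train briefly on
    # the GNSS-R data and return the validation error. A dummy score stands in.
    return (lr - 1e-3) ** 2 + 1.0 / hidden

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=25)
print("best parameters:", study.best_params)
```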

Hereon: Detection and short-term forecast potential for rogue waves

  • Challenge: In this voucher for Hereon, we investigated the potential of ML methods to detect and possibly predict rogue waves from sea surface height time series measurements.
  • Approach: The provided time series data was explored and statistically analyzed to characterize the postulated signal and the underlying noise. Several modelling approaches were tested, including data processing with FFTs, different chunking strategies and reformulated prediction targets that pose a sufficiently precise question to the underlying ML model. In terms of network architectures, LSTMs were mostly employed, but some early experiments with transformer models were also conducted (a minimal LSTM sketch follows this list).
  • Results: Detection of rogue waves with an ML model has been shown to work in principle. For the prediction of rogue waves, several experiments were conducted and different methods and data processing strategies were tested. Additional data and further experimentation may be required to arrive at a model with practically acceptable accuracy and reliability under changing data input conditions.
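
A minimal sketch of the LSTM-based detection idea (PyTorch assumed; window length, sampling and all names are purely illustrative):

```python
# Minimal sketch of an LSTM-based rogue wave detector operating on windows
# of sea surface height measurements (PyTorch assumed).
import torch
import torch.nn as nn

class RogueWaveDetector(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)    # logit: rogue wave in window or not

    def forward(self, ssh: torch.Tensor) -> torch.Tensor:
        # ssh: (batch, samples per window, 1), e.g. minutes of surface elevation
        out, _ = self.lstm(ssh)
        return self.head(out[:, -1, :])

detector = RogueWaveDetector()
window = torch.randn(16, 512, 1)            # 16 windows of 512 samples each
prob = torch.sigmoid(detector(window))      # per-window detection probability
loss_fn = nn.BCEWithLogitsLoss()            # trained against labelled windows
```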

Addressing and distinguishing between model and data uncertainty

  • Challenge: UFZ requested support to tackle grand challenges in quantifying uncertainty along the full chain from observational data products, through running ML models, to the final analysis, distinguishing different forms and sources of uncertainty.
  • Approach: While originally planned as a networking exercise, with the goal of gaining traction via at least one workshop, it became clear early on that a workshop could only be a second step, following attention-raising through a dedicated, potentially provocative, high-level discussion paper. We drafted a suitable manuscript with the goal of raising awareness of the challenges, potentially beyond the geoscience and ML communities.
  • Results: Needs for further networking and workshops were discussed but put on hold, to be addressed following the feedback generated once the publication is out. The publication was submitted at the end of the voucher.

Dynamic data loading and benchmarking for high-throughput GPU training with WeatherBench

  • Challenge: In order to make efficient use of ML models in practical Earth system model runs, challenges in loading and streaming big climate data sets to ML applications need to be addressed. This voucher, requested by Hereon, investigated the different stages where bottlenecks may occur in an operational model pipeline, from loading data from disk into CPU memory, through transfer to GPU memory, to the final computation on GPUs in a cluster/HPC setup.
  • Approach: After investigating the balance between these optimization goals, the consultant team focused on optimizing the I/O throughput from disk to GPU memory based on the community-defined WeatherBench benchmark, which is representative of a wide range of practical cases. Work on the voucher included adapting the WeatherBench code to the HPC system, performing the necessary data transformations, defining the full software stack, implementing it on an HPC system, performing a wide range of benchmarks (a minimal throughput sketch follows this list) and defining subsequent optimization strategies, all in consultation with the users.
  • Results: The voucher concluded with a fully described and implemented pipeline for running the WeatherBench scenarios at scale, scientifically valid benchmark results, and an outlook towards a publication based on the results.
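
In the spirit of the benchmarking described above, a minimal sketch of measuring disk-to-GPU data-loading throughput with a PyTorch DataLoader (the dataset class, sizes and shapes are stand-ins, not the WeatherBench setup itself):

```python
# Rough sketch of measuring data-loading throughput from disk to GPU.
# Dataset layout, sizes and worker counts are placeholders.
import time
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class WeatherFieldDataset(Dataset):
    """Stand-in dataset returning one gridded sample per index."""
    def __init__(self, n_samples: int = 1024, shape=(32, 64, 128)):
        self.n_samples, self.shape = n_samples, shape

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # In the real setup this would read a chunk from disk (e.g. NetCDF/Zarr).
        return torch.from_numpy(np.random.rand(*self.shape).astype(np.float32))

loader = DataLoader(WeatherFieldDataset(), batch_size=8, num_workers=4)
device = "cuda" if torch.cuda.is_available() else "cpu"

start, n_bytes = time.perf_counter(), 0
for batch in loader:
    batch = batch.to(device, non_blocking=True)   # host-to-device transfer
    n_bytes += batch.nelement() * batch.element_size()
elapsed = time.perf_counter() - start
print(f"throughput: {n_bytes / elapsed / 1e9:.2f} GB/s")
```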

Flood and drought risk assessment with ML

  • Challenge: Understanding the driving factors behind large-scale flood and drought events, both from a historical perspective and under present climate change, is of high relevance to society and adaptation strategies. Users sought advice on the general feasibility and potential implementation with ML in view of available funding opportunities.
  • Approach: We assessed available data sources and evaluated a range of ML techniques, including boosted trees (see the sketch after this list) and VAEs, according to their suitability, risk vs. gain and potential implementation costs in view of the envisioned project work.
  • Results: A suitable strategy was proposed and could be included in proposal applications.
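
As an illustration of the boosted-tree option evaluated above, a minimal scikit-learn sketch on synthetic stand-in data (the feature names and the target index are invented) could look like this:

```python
# Hedged sketch of a boosted-tree baseline relating hydro-meteorological
# drivers to a flood/drought indicator (scikit-learn assumed).
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in features: e.g. precipitation, soil moisture, temperature anomalies.
X = rng.normal(size=(5000, 3))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=5000)  # synthetic index

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = HistGradientBoostingRegressor(max_iter=200)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```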

GERICS: Large-scale groundwater level predictions with long short-term memory (LSTM) modelling

  • Challenge: Future changes in groundwater availability can have a dramatic impact on local water supply for agriculture and human well-being, particularly in dry areas affected by climatic changes. Being able to better predict future groundwater levels at seasonal scale under possibly changing climatic conditions can provide insights valuable for regional decision-making and improve our understanding of the water cycle dynamics.
  • Approach: Time series data from a global groundwater database and additional context data, such as catchment topography and local precipitation, will be used to train artificial neural networks. In particular, LSTM-based ANNs will be used, as they model time dependencies well (a minimal sketch follows this list). The goal is to predict future groundwater level changes and to become able to distinguish between climatically induced changes and anthropogenic influences. The consultants provide support to the user concerning the technical setup, performance improvements, model architecture and methodological questions.
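
A hedged sketch of an LSTM that combines dynamic forcing with static catchment attributes, in the spirit of the approach above (PyTorch assumed; features, sizes and names are illustrative only):

```python
# Sketch of an LSTM combining a dynamic forcing series (e.g. precipitation)
# with static catchment attributes to predict groundwater level changes.
import torch
import torch.nn as nn

class GroundwaterLSTM(nn.Module):
    def __init__(self, n_dynamic: int = 2, n_static: int = 4, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_dynamic, hidden, batch_first=True)
        self.head = nn.Linear(hidden + n_static, 1)   # next-season level change

    def forward(self, dynamic: torch.Tensor, static: torch.Tensor) -> torch.Tensor:
        # dynamic: (batch, time, features), e.g. precipitation, temperature
        # static:  (batch, features), e.g. catchment topography descriptors
        out, _ = self.lstm(dynamic)
        combined = torch.cat([out[:, -1, :], static], dim=1)
        return self.head(combined)

model = GroundwaterLSTM()
forcing = torch.randn(16, 365, 2)      # one year of daily forcing per well
catchment = torch.randn(16, 4)         # static catchment attributes
prediction = model(forcing, catchment)
```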

GEOMAR: A Machine Learning approach to reconstruct fine-scale sea surface height maps

  • Challenge: Acquiring precise measurements of sea surface height (SSH) at global scale, high spatial resolution and close to real time is a grand challenge for ocean monitoring, and acquiring such data is the goal of several past and upcoming Earth observation missions. In this voucher, GEOMAR requested the estimation of precise SSH data from satellite and synthetic data with ML methods.
  • Approach: Given the available data, the consultants evaluated which ML methods may be best suited, particularly for first promising experiments, and what the overall workflow should look like.
  • Results: A preliminary data processing and model workflow and the general feasibility of such a solution were discussed; the availability of synthetic data with high spatial and temporal coverage is a promising factor for success. Suitable model candidates include GANs and VAEs (a minimal VAE sketch follows this list), as well as transfer learning, with the challenge that ground truth data will remain spatially sparse. Further simulation data needs to be produced and pre-processed by the requestors before experimentation with model architectures at scale can commence.
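
As a rough illustration of the VAE candidate mentioned above, a very small convolutional VAE for gridded SSH patches could be sketched as follows (PyTorch assumed; patch resolution, latent size and loss weighting are arbitrary):

```python
# Very small sketch of a convolutional VAE for gridded SSH patches.
import torch
import torch.nn as nn

class SshVAE(nn.Module):
    def __init__(self, latent: int = 16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.to_mu = nn.Linear(32 * 16 * 16, latent)
        self.to_logvar = nn.Linear(32 * 16 * 16, latent)
        self.dec = nn.Sequential(
            nn.Linear(latent, 32 * 16 * 16), nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

vae = SshVAE()
x = torch.randn(4, 1, 64, 64)              # stand-in SSH patches
recon, mu, logvar = vae(x)
# Loss: reconstruction error plus KL divergence of the latent distribution.
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(recon, x) + kl
```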

Feature importance ranking for marine samples

  • Challenge: In this voucher, the consultant team was asked by GEOMAR to help with defining a suitable data processing pipeline for a marine science use case: a spatial regression of the total organic carbon content of seafloor sediments from a comparatively small number of ground-truth measurement points. An initial ML prototype already existed to perform the regression, but GEOMAR needed further insight into what exactly the model was doing in terms of feature importance and data quality aspects.
  • Approach: The consultants provided advice on the methodological approach and performed feature importance rankings using Random Forests (see the sketch below).
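
A minimal sketch of such a feature importance ranking with scikit-learn (the feature names and synthetic data are invented stand-ins for the actual predictor maps):

```python
# Minimal sketch of a Random Forest feature importance ranking.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)
features = ["water_depth", "distance_to_coast", "primary_productivity", "grain_size"]
X = rng.normal(size=(300, len(features)))                 # sparse ground-truth points
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.2, size=300)  # synthetic target

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances (fast, but can favour high-cardinality features).
for name, imp in sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:>22}: {imp:.3f}")

# Permutation importance is usually the more robust ranking.
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
```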