
Weigel Team // Earth and Environment
Team leader
HELMHOLTZ AI CONSULTANTS @ HELMHOLTZ CENTER HEREON
EARTH AND ENVIRONMENT-FOCUSED AI CONSULTANTS
The Helmholtz AI consultants for Earth and environment are hosted at the Institute for Coastal Research of Helmholtz Center Hereon. The team serves users from all Helmholtz Association institutes to facilitate the application of artificial intelligence in Earth and environmental research. The team's services range from general guidance on ML methods to prototyping and deployment on HPC systems. They can be requested via the Helmholtz AI voucher system and usually take the form of collaborative sprint projects.
We support the full range of research questions related to Earth and environment, including:
- Using ML for Earth System modelling, e.g., building ESM-ML hybrids, ML-driven model parameterizations, model output post-processing with ML.
- Using ML for environmental research, climate services and adaptation, e.g., pattern recognition and generative AI for imagery and time series, ecosystem and hydrological modelling.
- Using ML in geology and seismology, e.g., for the analysis of measured or simulated seismic waveforms, long-term terrestrial modelling, hazard risk assessment and early warning systems.
- Practical use of ML, e.g., operationalization of ML models (MLOps), integration with cloud services, support for transfer projects.
We also develop and deliver specialized training courses in ML for domain scientists.
The team
- Tobias Weigel
- Caroline Arnold
- Danu Caus
- Harsh Grover
- Paul Keil
SELECTED ONGOING VOUCHER PROJECTS
Our most prominent current challenges.
-
- Challenge: Urban measurement networks acquire pollutant concentrations such as PM2.5 and NOx, but only at discrete station locations. Spatially continuous reanalysis datasets such as CAMS offer only coarse spatial resolution, which leaves open many questions relevant to urban planning, including the influence of industrial and marine infrastructure. In this project, the consultants were tasked with combining these datasets to produce accurate, high-resolution maps of urban pollutant concentrations.
- Approach: Several methods for such a pseudo-downscaling approach were evaluated, including boosted trees (XGBoost) and Gaussian Processes, using data from the city of Hamburg as a sample case (see the sketch below). First results obtained with tree-based modelling are promising, while GP-based modelling proved more complex. Further work will focus on validating model performance against independent measurement data and evaluating potential transferability to other cities.
- Results: First results were presented at the EGU General Assembly 2025. Work continues on improving the approaches, with the perspective of summarizing methods and results in a journal article and potentially seeking additional funding.
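To make the tree-based route concrete, here is a minimal sketch of the pseudo-downscaling idea with gradient-boosted trees; the file name, feature columns and hyperparameters are invented for illustration and do not reflect the actual project setup.

```python
# Minimal sketch: pseudo-downscaling of coarse reanalysis pollutant fields
# with gradient-boosted trees. File name, features and target are
# hypothetical placeholders.
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical training table: one row per station and time step.
df = pd.read_csv("hamburg_pm25_training.csv")  # assumed file
features = ["cams_pm25", "lat", "lon", "dist_to_road", "dist_to_harbour", "hour"]
X_train, X_val, y_train, y_val = train_test_split(
    df[features], df["station_pm25"], test_size=0.2, random_state=0
)

model = xgb.XGBRegressor(n_estimators=500, max_depth=6, learning_rate=0.05)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_val, model.predict(X_val)))

# The fitted model can then be applied to a dense grid of predictor values
# to produce a high-resolution pollutant map.
```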
-
- Challenge: This voucher provides conceptual, training and implementation support to a Helmholtz AI project in which proxy data records as well as paleo and runoff simulations are employed to model offshore groundwater resources. This rich yet heterogeneous collection of input data and models is challenging to integrate into a coherent ML-based modelling approach.
- Approach: We provide consulting to the project through frequent calls, implement example cases for training, and experiment with selected input data to give recommendations on suitable approaches for the complex ML pipeline the project requires. ML methods employed include CNNs, Bayesian Neural Networks and ensemble learning approaches.
-
- Challenge: A webpage-based planning and information tool for Baltic Sea stakeholders was developed in a previous project. Generative AI offers the potential to make this resource more interactive and to integrate additional data sources, increasing the quality of insights delivered to stakeholders.
- Approach: The consultants are tasked with evaluating methods that can be applied to the case, offering either an alternative to the webpage or enhancing it with AI features. The evaluation targets mechanisms such as retrieval-augmented generation (RAG) and MCP-based tooling (a retrieval sketch follows below). Specific implementation challenges lie in matching the existing tool workflows to the dialog-style interaction enabled by a chatbot and in ensuring that the underlying data sources, such as ArcGIS datasets, and processing capabilities are used adequately. The consultants are to design a proof of concept that may serve as a keystone for further funding.
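As a flavour of what a RAG-style enhancement involves, here is a minimal sketch of the retrieval step, using TF-IDF similarity as a stand-in for an embedding model; the documents and query are invented, and in a full pipeline the retrieved context would be passed to an LLM.

```python
# Minimal sketch of the retrieval step in a RAG pipeline. TF-IDF stands in
# for an embedding model; documents and query are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Nutrient concentrations in the western Baltic Sea, 2010-2020.",
    "Guidelines for offshore wind planning in coastal waters.",
    "Eutrophication indicators derived from satellite chlorophyll data.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)

query = "Which datasets describe eutrophication in the Baltic Sea?"
scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]

# Take the top-2 documents as context for the language model prompt.
top = scores.argsort()[::-1][:2]
context = "\n".join(docs[i] for i in top)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```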
-
- Challenge: Hydrological forecasting that relies on numerical modelling is computationally expensive. AI-based surrogate modelling offers the opportunity to provide forecasts at much shorter reaction times, which is relevant for the agricultural sector. In this project, the consultants are tasked with implementing a fast surrogate model able to provide hydrological forecasts for Germany.
- Approach: Possible approaches have recently been published by other groups, and the consultants have evaluated these against the case at hand. A concrete implementation project is currently ongoing, leveraging 3D forced spatiotemporal RNNs (see the sketch below). Exemplary implementation challenges include dataset sizes that require advanced parallel I/O and training approaches, the choice of metrics, and patching strategies.
- Results: First results show that the AI-based solution fulfils the basic requirements, with very promising potential regarding quality, coverage and transferability to other regions across Europe.
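The voucher builds on 3D forced spatiotemporal RNNs; as an illustrative building block, here is a minimal convolutional LSTM cell rolled over a sequence of forcing fields. Dimensions and variables are placeholders, not the project's actual architecture.

```python
# Minimal sketch of a convolutional LSTM cell of the kind used in
# spatiotemporal surrogate models; dimensions and forcing variables are
# illustrative only.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # One convolution produces all four gate pre-activations at once.
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        gates = self.conv(torch.cat([x, h], dim=1))
        i, f, g, o = torch.chunk(gates, 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# Roll the cell over a sequence of meteorological forcing fields.
B, T, C, H, W = 2, 10, 4, 64, 64      # batch, time, channels, height, width
cell = ConvLSTMCell(in_ch=C, hid_ch=16)
x = torch.randn(B, T, C, H, W)        # e.g. precipitation, temperature, ...
h = torch.zeros(B, 16, H, W)
c = torch.zeros_like(h)
for t in range(T):
    h, c = cell(x[:, t], (h, c))
runoff = nn.Conv2d(16, 1, 1)(h)       # map hidden state to a forecast field
print(runoff.shape)                   # torch.Size([2, 1, 64, 64])
```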
-
- Challenge: The detection of marine mammals such as whales is important for both scientific study and marine protection. 360° infrared cameras mounted on research vessels or marine infrastructure are data sources that potentially allow automated detection. However, current procedures still rely on a significant amount of manual effort, are prone to misdetections, and do not work well in adverse conditions.
- Approach: Given an existing data processing and analysis pipeline that has been in operation for many years, the consultants are tasked with enhancing it with modern computer vision methods, with the goal of improving detection accuracy and response time within the technical limits of offshore or ship-based computing capabilities and practical requirements. The consultants have evaluated several well-established ML solutions and tuned them on the available data (see the sketch below). As the data features several challenging properties, such as limited sample size, effects of distance to the camera, and a variety of measurement and environmental conditions, extensive experimentation is required.
- Results: The investigation revealed several methods that can fulfill the initial requirements, with important trade-offs to consider. Overall, the problem remains limited by the available data, yet a solution of acceptable quality that significantly improves over the state of the art has been achieved. A publication is in preparation.
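For illustration, here is a minimal sketch of adapting a pretrained object detector to a custom detection class; the voucher does not name a specific architecture, so Faster R-CNN from torchvision serves purely as an example, with a synthetic infrared frame and label.

```python
# Minimal sketch of adapting a pretrained object detector to a custom
# detection task (architecture, classes and data are illustrative
# assumptions, not the voucher's actual solution).
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a detector pretrained on COCO and replace its classification head
# with one for two classes: background and "whale".
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# Dummy training step on a synthetic infrared frame with one box label.
model.train()
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 180.0, 160.0]]),
            "labels": torch.tensor([1])}]
losses = model(images, targets)    # dict of detection losses
sum(losses.values()).backward()
print({k: float(v) for k, v in losses.items()})
```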
SELECTED COMPLETED VOUCHER PROJECTS
Project highlights of our previous work.
-
- Challenge: Machine learning models trained on different datasets, even ones with only slight variations, may perform inconsistently, complicating the comparison and validation of results. Drift detectors are tools designed to identify significant changes in the statistical properties of data, changes which can strongly affect model predictions. Detecting them is crucial for maintaining model accuracy and reliability, especially in dynamic environments. Enhancing drift detectors with robust monitoring and alerting mechanisms helps address data shifts promptly, supporting consistent model performance and improving the reliability of results over time.
- Approach: In this voucher, we were tasked with implementing drift monitoring as a service for the European Open Science Cloud (EOSC): a comprehensive solution for tracking and visualising drift in machine learning models (a detection sketch follows below). The service offers secure authentication via MyToken, supports both concept and data drift detection, and provides a web-based dashboard for intuitive result interpretation. The project also included developing a CI/CD pipeline for continuous integration and deployment of updates, while an alerting system proactively notifies users of potential drift issues. This all-in-one approach makes the service easy to integrate into existing ML pipelines and helps maintain model performance throughout the lifecycle, from development to production deployment, ensuring models remain accurate and reliable over time.
- Results: A proof of concept was developed that runs a containerized drift detection pipeline on sample data, stores results in a backend via FastAPI, and displays them in a Streamlit UI, with support for remote execution via GitHub Actions. The voucher outcomes were presented at Helmholtz AI and EGI conferences in 2024 and 2025 and were included in the AI4EOSC web services.
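As an illustration of the data drift detection such a service performs, here is a minimal sketch assuming a simple per-feature two-sample Kolmogorov-Smirnov test; the window sizes and alert threshold are invented for the example and are not the service's actual implementation.

```python
# Minimal sketch of data drift detection: compare a reference window with
# a live window per feature using a Kolmogorov-Smirnov test. Windows and
# threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(5000, 3))   # training-time data
live = rng.normal(0.4, 1.0, size=(500, 3))         # shifted incoming data

for j in range(reference.shape[1]):
    stat, p = ks_2samp(reference[:, j], live[:, j])
    if p < 0.01:  # drift alert threshold (illustrative)
        print(f"feature {j}: drift suspected (KS={stat:.3f}, p={p:.1e})")
```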
-
- Challenge: Atmospheric observation stations provide long-term but spatially sparse time series data. The goal of this voucher is to employ ML methods to reconstruct spatially continuous, gridded fields from temperature observations, complemented by relevant proxy inputs and simulations. Doing so will provide critically needed input data for subsequent scientific analysis, forecasting and simulation experiments.
- Approach: Gaussian Processes (GPs) lend themselves to this setting as they work with limited amounts of data and provide uncertainty quantification (see the sketch below). We implemented a scalable spatio-temporal GP model, trainable on GPUs, and experimented with additional techniques to reduce model complexity and increase data throughput.
- Results: The GP model was implemented and scaled up using GPUs. Intensive work went into ensuring that the results are correct in both time and space, which is far from trivial. The workflow can also produce variance plots for every time index and for different pixel masking ratios and mask types. Insights into the method will be included in a publication centered on the application use case.
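A minimal sketch of the underlying idea: GP regression on (latitude, longitude, time) inputs producing a gridded mean field with uncertainty. The sketch uses scikit-learn on synthetic data rather than the GPU-scaled implementation developed in the voucher.

```python
# Minimal sketch of spatio-temporal GP regression on (lat, lon, time)
# inputs; data are synthetic placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform([45.0, 5.0, 0.0], [55.0, 15.0, 365.0], size=(200, 3))
y = np.sin(X[:, 2] / 58.0) + 0.1 * rng.standard_normal(200)  # pseudo temps

# Separate length scales for space and time, plus a noise term.
kernel = RBF(length_scale=[2.0, 2.0, 30.0]) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Predict a gridded field (with uncertainty) for one time index.
lat, lon = np.meshgrid(np.linspace(45, 55, 20), np.linspace(5, 15, 20))
grid = np.column_stack([lat.ravel(), lon.ravel(), np.full(lat.size, 180.0)])
mean, std = gp.predict(grid, return_std=True)
print(mean.shape, std.mean())
```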
-
- Challenge: To better understand threatened populations of bowhead whales, monitoring techniques include regular passive acoustic monitoring. Manually detecting bowhead whale vocalizations in the recorded spectrograms is time-consuming and requires expert knowledge, so AI-based algorithms could greatly reduce the analysis workload. Challenges in the data include a low signal-to-noise ratio, substantial data volumes and variations between recording devices.
- Approach: We prototyped a CNN-based deep learning model and trained it on sample spectrogram data, giving the scientists a model baseline for further experimentation and satisfying their training needs (see the sketch below). We also co-developed further optimizations to improve the accuracy of the model, including hyperparameter tuning, different preprocessing strategies and denoising.
- Results: The model was refined from a prototype into one applicable to a larger and more diverse dataset, requiring I/O optimization using zarr and preprocessing better aligned with differing audio signal lengths. In addition, several tuning experiments were conducted on spectrogram parameters, model parameters and early-stopping criteria. The resulting optimal model is a 7-layer CNN with around 700k trainable parameters that achieved a sensitivity above 80% and a false-positive rate below 1.5% on audio signals from all recorders. A publication is in preparation.
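For illustration, a minimal spectrogram CNN of the same flavour; the layer sizes below are placeholders, not the tuned 7-layer architecture reported above.

```python
# Minimal sketch of a CNN spectrogram classifier for call detection;
# architecture and data are illustrative placeholders.
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)  # call / no-call logit

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = SpectrogramCNN()
batch = torch.randn(8, 1, 128, 256)   # (batch, channel, freq bins, time)
logits = model(batch)
loss = nn.BCEWithLogitsLoss()(logits.squeeze(1), torch.zeros(8))
print(logits.shape, float(loss))
```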
-
- Challenge: Surface energy fluxes in the Arctic are known to be unrealistic in widely used reanalysis datasets such as ERA5. AWI has access to many high-quality measurements from long-term field campaigns that could be used to correct the fluxes. The idea is therefore to build a data-driven bias-correction module for ERA5 surface fluxes.
- Approach: The consultants implemented an MLP-based model for sensible heat fluxes that showed first promising results (see the sketch below). Based on these insights, we continued with models for other fluxes, tuned all models to improve accuracy, investigated explainability and experimented with probabilistic ML approaches for uncertainty quantification.
- Results: An optimized version of the model significantly reduces the total surface budget error, and it was used to create an improved surface variable dataset covering the whole Arctic region at hourly temporal and 0.25-degree spatial resolution. A paper is in preparation; in addition, the dataset has manifold potential downstream uses, and the interaction with the consultants sparked several new AI projects.
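A minimal sketch of the bias-correction idea: learn the observation-minus-reanalysis residual from reanalysis predictors, then add the predicted bias back onto the reanalysis flux. The predictors, data and model size here are synthetic placeholders, not AWI's actual setup.

```python
# Minimal sketch of MLP bias correction: learn the difference between
# observed and reanalysis sensible heat flux from reanalysis predictors.
# Predictors and data are synthetic placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((4000, 6))   # e.g. ERA5 flux, T2m, wind, cloud cover
bias = 5.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(4000)  # obs - ERA5

X_tr, X_te, y_tr, y_te = train_test_split(X, bias, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
mlp.fit(X_tr, y_tr)
print("R^2 on held-out data:", mlp.score(X_te, y_te))

# Corrected flux = ERA5 flux + predicted bias.
```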
-
- Challenge: Including any kind of ML-based component in the ICON Earth System Model requires a technical bridge between the ESM (written in Fortran) and the ML code (written in Python). Moreover, ICON typically runs on CPUs, while ML code runs much faster on GPUs. The tooling and execution environments required for such setups need to be examined in a simple prototype enabling comparison between different approaches, platforms, and a baseline version of the ICON model.
- Approach: Within this complex, long-term topic of technical integration, this voucher concentrated on prototyping a selection of bridges between ESM and ML code, followed by benchmarking. The goal of the voucher was to implement several bridges and acquire reliable performance measurements for different meaningful execution constellations (a sketch of one bridging pattern follows below).
- Results: A set of bridges has been implemented and benchmarks have been conducted at a single site with single- and multi-node setups. Benchmark results indicate that the performance benefits of direct, local execution outweigh the advantages of GPU-based execution. The following article summarizes our findings:
- Arnold, Sharma, Weigel and Greenberg (2024): “Efficient and stable coupling of the SuperdropNet deep-learning-based cloud microphysics (v.0.1.0) with the ICON climate and weather model (v2.6.5)”. Geoscientific Model Development
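To illustrate the kind of bridging the voucher prototyped, here is a minimal sketch of one possible pattern: a Python process serving ML predictions to the Fortran ESM over named pipes. The pipe paths, batch size and stand-in model are assumptions for illustration; the benchmarked bridges described in the article above differ in detail.

```python
# Minimal sketch of one bridging pattern: a Python process answers ML
# inference requests from a Fortran ESM over named pipes. Pipe paths,
# batch size and the stand-in model are illustrative assumptions.
import os
import numpy as np

N = 1024                          # grid cells exchanged per call (assumed)
REQUEST_PIPE = "/tmp/icon_to_ml"  # written by the Fortran side
REPLY_PIPE = "/tmp/ml_to_icon"    # read by the Fortran side

def stand_in_model(x: np.ndarray) -> np.ndarray:
    """Placeholder for a trained network (e.g. a parameterization)."""
    return 0.5 * x

for path in (REQUEST_PIPE, REPLY_PIPE):
    if not os.path.exists(path):
        os.mkfifo(path)

# Blocks until the Fortran side opens the pipes from its end.
with open(REQUEST_PIPE, "rb") as req, open(REPLY_PIPE, "wb") as rep:
    while True:
        raw = req.read(N * 8)     # one batch of float64 values
        if not raw:               # writer closed the pipe: stop serving
            break
        x = np.frombuffer(raw, dtype=np.float64)
        rep.write(stand_in_model(x).tobytes())
        rep.flush()
```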
-
- Challenge: The recession of the Dead Sea shoreline in Jordan causes the emergence of sinkholes and other geological features over time, which pose a threat to local infrastructure and property. The underlying mechanisms need to be better understood, and the ability to predict the formation and evolution of terrestrial structures would help address the associated geological hazards. Given that high-resolution data acquisition is relatively expensive, a system is needed that leverages more readily available, heterogeneous data inputs with various scales and resolutions.
- Approach: Autoencoder-based architectures are combined with computer vision techniques to perform both semantic and instance segmentation at acceptable complexity on high-resolution aerial data. Subsequently, we applied transfer learning from drone to satellite scenarios using several strategies, addressing the intermediate challenges that stem from this procedure.
- Results: A data processing pipeline and model able to perform instance segmentation has been developed with acceptable performance for empirical use-cases. The following article provides more details:
- Alrabayah, Caus, et al. (2024): “Deep-Learning-Based Automatic Sinkhole Recognition: Application to the Eastern Dead Sea”. Remote Sensing, AI4NH special issue
-
- Challenge: While the previous voucher on GNSS-R data concluded with good accuracy in determining wind speeds from the noisy signals, a further challenge is to also determine rainfall over oceans from the data. In principle, this is possible because rainfall, just like wind, roughens the ocean surface. However, the effect is much less pronounced, the combined occurrence of wind and rainfall is particularly difficult to predict, and side effects from disturbances at nearby locations need to be corrected.
- Approach: The consultant team helped to further improve the ML model training and architecture, incorporating additional parameters and applying the model to additional data in order to tackle the challenges of training a network sophisticated enough to determine both wind and rainfall.
- Results: The model was extended and updated to cover additionally available data, and strategies were tested to account for rainfall, deal with gaps in the data and improve performance on extreme values. We experimented with further extensions of the model that work well in such cases and can predict rainfall at acceptable accuracy.
-
- Challenge: Human-induced seismicity due to geothermal energy production poses an environmental risk. Understanding the mechanisms underlying induced seismicity, such as the relationship between fluid pressure and the resulting seismic activity, can help reduce the risks and make them more transparent and addressable.
- Approach: Several data sources were acquired, including lab data from comparatively controlled environments. These can be upscaled to the scope required in practical applications, where incorporating long-term observational data from geothermal fields in addition to the lab data is of key importance. In terms of ML techniques, LSTM models and Transformers can be applied to cover temporal aspects, but spatial aspects may also be relevant. Importantly, understanding the underlying causes and mechanisms requires a workflow deliberately designed for explainability.
- Results: Several varieties of ML models and data pre-processing strategies were implemented, trained and evaluated. Subsequent analysis of feature importance gave domain-specific insight into potential driving factors. The results informed a publication that was compiled following the implementation work:
- Karimpouli, Caus, Grover et al. (2023): “Explainable machine learning for labquake prediction using catalog-driven features”. Earth and Planetary Science Letters
-
- Challenge: This voucher concerns a request from the GFZ German Research Centre for Geosciences to detect marine litter in high-resolution satellite image data.
- Approach: The consultant team provided guidance on the methodological approach, addressed possible issues with data processing and biases, and discussed different strategies for training, improving and tuning the network (a tuning sketch follows below).
- Results: The model's software architecture was improved by migrating it to PyTorch Lightning and tuning its performance. Several approaches were tested and implemented to improve accuracy and cope with the limited number of labels. Automated hyperparameter tuning was set up and used to further optimize the model, which now achieves acceptable accuracy within the scope of the underlying project workflow. Further improvements may be reached by experimenting with semi-supervised learning techniques, increasing the number of labels and leveraging additional channel data.
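A minimal sketch of automated hyperparameter tuning; the voucher does not name the tool used, so Optuna serves here as an assumed example, with a toy objective standing in for a full training-and-validation run.

```python
# Minimal sketch of automated hyperparameter tuning with Optuna (an
# assumed tool; the voucher does not name one). The toy objective stands
# in for training the model and returning a validation score.
import optuna

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    depth = trial.suggest_int("depth", 2, 8)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    # Real setup: train with these values, return the validation loss.
    return (lr - 1e-3) ** 2 + 0.01 * depth + 0.1 * dropout

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```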
-
- Challenge: Modelling the dispersion of atmospheric aerosols in state-of-the-art models such as ICON-ART significantly increases the computational cost of running such models, for example in operational weather prediction. Using ML models to emulate tracer transport could reduce these costs substantially, yet the effect on model output quality under different scenarios needs to be evaluated.
- Approach: A prototype ML model based on CNNs had been developed and was to be further improved as part of this voucher. The initial model setup caused spatial artefacts and low accuracy at higher resolutions; the architecture needed to be improved, and further gains could be achieved by tuning its hyperparameters.
- Results: The ML model under development was further improved in terms of accuracy and the reduction of spatial artefacts. Future work could progress beyond the scope of the motivating thesis, subject to further discussion.
-
- Challenge: The Karlsruhe Institute of Technology (KIT) requested the AI consultants' support in investigating the use of ANNs to emulate the results of an operationally used atmospheric chemistry modelling component (EMAC). The goal is to replace the computationally costly, PDE-solving model with an ANN component that is more efficient while providing results of comparable accuracy.
- Approach: The consultants designed and implemented a full workflow, including data processing and running a suitable ANN model at large scale (multiple GPU nodes). Time-intensive work involved uncovering and mitigating biases and quality concerns in the training data, experimenting with different ANN architectures and performing hyperparameter tuning on them.
- Results: Initial results indicate that an ML model capable of performing the required regression on a subset of all chemical variables is feasible, but significantly more training data, which is now available, is required to reach regression quality comparable with EMAC and to cover all relevant target variables under extreme conditions. Improving the scalability of the approach will need subsequent follow-up work, as will the long-term integration into an operational model run setup.
-
- Challenge: GFZ requested support to determine wind speed over oceans via regression from GNSS reflectometry data (CYGNSS mission data from 2018), starting from an early ANN prototype and improving it significantly in terms of accuracy and scalability beyond single GPUs.
- Approach: We helped to find and address biases and noise in the data and developed a suitable data processing pipeline that can also be re-executed for future data. The work on the data went hand in hand with ML model definition and implementation, improving the regression through iterative experimentation (focusing on CNNs), hyperparameter tuning, and better use of multiple GPU nodes.
- Results: The voucher concluded with results that show that an ML-based approach can beat the established (non-ML) methods in terms of prediction accuracy, with remaining uncertainty in severe weather conditions.
-
- Challenge: In this voucher for Hereon, we investigate the potential of ML methods to detect and possibly predict rogue waves from sea surface height time series measurements.
- Approach: The provided time series data was explored and statistically analyzed to get a handle on the postulated signal and the underlying noise. Several models were tested, including data processing with FFTs, different chunking strategies, and reformulated prediction targets to steer the underlying ML model towards a sufficiently precise question (see the sketch below). In terms of network architectures, mostly LSTMs were employed, but some early experiments with transformer models were also conducted.
- Results: Detection of rogue waves with an ML model has been shown to work in principle. For the prediction of rogue waves, several experiments were conducted and different methods and data processing strategies were tested. Additional data and further experimentation may be required to arrive at a model with practically acceptable accuracy and reliability under changing data input conditions.
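A minimal sketch of the FFT-and-chunking preprocessing idea on a synthetic sea-surface-height series; the sampling rate, chunk length and the crest-based rogue criterion are illustrative assumptions.

```python
# Minimal sketch of FFT-based preprocessing for rogue-wave detection:
# chunk a sea-surface-height series and compute per-chunk spectral and
# crest features. Series, sampling rate and thresholds are illustrative.
import numpy as np

rng = np.random.default_rng(0)
fs = 2.0                                   # sampling rate in Hz (assumed)
eta = rng.standard_normal(20_000)          # synthetic surface elevation
eta[12_345] += 8.0                         # inject an outlier "rogue" crest

chunk = 512
n = len(eta) // chunk
chunks = eta[: n * chunk].reshape(n, chunk)

spectra = np.abs(np.fft.rfft(chunks, axis=1)) ** 2
freqs = np.fft.rfftfreq(chunk, d=1 / fs)
peak_freq = freqs[spectra.argmax(axis=1)]  # dominant frequency per chunk

# Crest criterion: crest height above ~1.25x the significant wave height.
hs_proxy = 4 * chunks.std(axis=1)          # Hs estimate from variance
crest_ratio = chunks.max(axis=1) / hs_proxy
print("suspicious chunks:", np.where(crest_ratio > 1.25)[0])
```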
-
- Challenge: UFZ requested support to tackle grand challenges in quantifying uncertainty, from observational data products through ML models to final analysis, distinguishing different forms and sources of uncertainty.
- Approach: While originally planned as a networking exercise with the goal of gaining traction via at least one workshop, it became clear early on that a workshop could only be a second step, following attention-raising through a dedicated, potentially provocative, high-level discussion paper. We drafted a suitable manuscript with the goal of raising awareness for these challenges, potentially beyond the geoscience and ML communities.
- Results: Needs for further networking and workshops were discussed but put on hold, to be revisited based on feedback once the publication is out. The publication was submitted at the end of the voucher.
-
- Challenge: In order to make efficient use of ML models in practical Earth System model runs, challenges in loading and streaming big climate datasets to ML applications need to be addressed. This voucher, requested by Hereon, investigated the stages at which bottlenecks may occur in an operational model pipeline: loading data from disk to CPU memory, transferring it to GPU memory, and the final computation on GPUs in a cluster/HPC setup.
- Approach: After investigating the balance between these optimization goals, the consultant team focused on optimizing the I/O throughput from disk to GPU memory, based on the community-defined WeatherBench benchmark, which is representative of a wide range of practical cases (see the benchmarking sketch below). Work on the voucher included adapting the WeatherBench code to the HPC system, performing the necessary data transformations, defining the full software stack, implementing it on the HPC system, performing a wide range of benchmarks and defining subsequent optimization strategies, all in consultation with the users.
- Results: The voucher concluded with a fully described and implemented pipeline for running the WeatherBench scenarios at scale, scientifically valid benchmark results, and the outlook of a publication based on the results.
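A minimal sketch of the kind of throughput measurement involved, assuming a memory-mapped NumPy file as a stand-in for the WeatherBench data and a PyTorch DataLoader as the consumer; the real benchmarks cover far larger data and the full disk-to-GPU path.

```python
# Minimal sketch of measuring data-loading throughput for ML training.
# The synthetic .npy file stands in for a large gridded climate dataset.
import time
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset

class GriddedDataset(Dataset):
    """Serves (channel, lat, lon) samples from a memory-mapped .npy file."""
    def __init__(self, path):
        self.data = np.lib.format.open_memmap(path, mode="r")
    def __len__(self):
        return self.data.shape[0]
    def __getitem__(self, i):
        return torch.from_numpy(np.array(self.data[i]))

if __name__ == "__main__":
    # Write a small synthetic stand-in dataset (time, channel, lat, lon).
    arr = np.lib.format.open_memmap(
        "fields.npy", mode="w+", dtype=np.float32, shape=(256, 8, 128, 256))
    arr[:] = 0.0
    arr.flush()

    ds = GriddedDataset("fields.npy")
    for workers in (0, 2, 4):
        loader = DataLoader(ds, batch_size=16, num_workers=workers)
        t0 = time.perf_counter()
        nbytes = sum(batch.numel() * 4 for batch in loader)
        dt = time.perf_counter() - t0
        print(f"num_workers={workers}: {nbytes / dt / 1e6:.1f} MB/s")
```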
-
- Challenge: Understanding the driving factors behind large-scale flood and drought events, both from a historical perspective and under present climate change, is of high relevance to society and adaptation strategies. The users sought advice on the general feasibility of an ML implementation and its potential shape in view of available funding opportunities.
- Approach: We assessed available data sources and evaluated a range of ML techniques, including boosted trees and VAEs, according to their suitability, risk vs. gain and potential implementation costs in view of the envisioned project work.
- Results: A suitable strategy was proposed and could be included in proposal applications.
-
- Challenge: Future changes in groundwater availability can have a dramatic impact on local water supply for agriculture and human well-being, particularly in dry areas affected by climatic changes. Being able to better predict future groundwater levels at seasonal scale under possibly changing climatic conditions can provide insights valuable for regional decision-making and improve our understanding of the water cycle dynamics.
- Approach: Time series data from a global groundwater database and additional context data, such as catchment topography and local precipitation, are used to train artificial neural networks. In particular, LSTM-based ANNs are used as they model time dependencies well (see the sketch below). The goal is to predict future groundwater level changes and to become able to distinguish between climatically induced changes and anthropogenic influences. The consultants support the user with the technical setup, performance improvements, model architecture and methodological questions.
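A minimal sketch of such an LSTM setup: past groundwater levels plus a context feature in, the next level change out. Dimensions and data are synthetic placeholders.

```python
# Minimal sketch of an LSTM for groundwater-level prediction: sequences of
# past levels and precipitation in, next-step level out. Data and sizes
# are synthetic placeholders.
import torch
import torch.nn as nn

class GroundwaterLSTM(nn.Module):
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)          # (batch, time, hidden)
        return self.head(out[:, -1])   # predict from the last time step

model = GroundwaterLSTM()
x = torch.randn(16, 52, 2)             # 16 wells, 52 weeks, 2 features
y = torch.randn(16, 1)                 # next groundwater-level anomaly
loss = nn.MSELoss()(model(x), y)
loss.backward()
print(float(loss))
```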
-
- Challenge: Acquiring precise measurements of sea surface height (SSH) at global scale, at high spatial resolution and close to real time is a grand challenge for ocean monitoring, and acquiring such data is the goal of several past and upcoming Earth observation missions. In this voucher for GEOMAR, the estimation of precise SSH data from satellite and synthetic data with ML methods was requested.
- Approach: Given the available data, the task was to evaluate which ML methods may be best suited, particularly for first promising experiments, and what the overall workflow should look like.
- Results: A preliminary data processing and model workflow, as well as the general feasibility of such a solution, were discussed; the availability of synthetic data with high spatial and temporal coverage is a promising factor for success. Suitable candidates include GANs and VAEs, as well as transfer learning, with the challenge that ground-truth data will continue to be spatially sparse. Further simulation data needs to be produced and pre-processed by the requestors before experimentation with model architectures at scale can commence.
-
- Challenge: In this voucher, the consultant team was asked by GEOMAR to help define a suitable data processing pipeline for a marine science use case: a spatial regression of the total organic carbon content of seafloor sediments from a comparatively small number of ground-truth measurement points. An initial ML prototype to perform the regression already existed, but GEOMAR needed further insight into what exactly the model was doing in terms of feature importance and data quality.
- Approach: The consultants provided counsel on the methodological approach and performed feature importance rankings using Random Forests (see the sketch below).
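A minimal sketch of a Random Forest feature-importance ranking of the kind performed here; the predictor names and data are synthetic placeholders.

```python
# Minimal sketch of a Random Forest feature-importance ranking; predictor
# names and data are synthetic placeholders, not the GEOMAR dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
names = ["depth", "grain_size", "distance_to_coast", "chlorophyll"]
X = rng.standard_normal((300, len(names)))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.3 * rng.standard_normal(300)  # TOC proxy

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
for name, imp in sorted(zip(names, rf.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name:>20s}: {imp:.3f}")
```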