Helmholtz AI project call showcase: Machine-learning based synthetic data generation for rapid physics modeling (SynRap)

Machine learning is based on learning from the information available: the algorithm is given data, finds patterns, and can then predict how new scenarios will unfold. But what if there isn't enough data to begin with?


How can AI help create machine learning models when data is scarce? Read in this week’s Helmholtz AI project showcase how researchers at the Deutsches Elektronen-Synchrotron (DESY) and the Helmholtz-Zentrum Dresden-Rossendorf (HZDR) are working jointly on this challenge.


Could you introduce yourself, giving your affiliation, area of work, and of course, the project title?

I am Dr. Isabell Melzer-Pellmann, a particle physicist and the group leader of the CMS group at the Deutsches Elektronen-Synchrotron DESY. Along with my co-investigator Dr. Dirk Krücker, our group looks for physical phenomena beyond the Standard Model of particle physics. We were joined by Benno Käch last year, who currently works on synthetic data generation as a PhD student.

In the SynRap project we are collaborating with Attila Cangi, the acting department lead of Matter under Extreme Conditions at the newly founded Center for Advanced Systems Understanding (CASUS), Helmholtz-Zentrum Dresden-Rossendorf. His team develops simulation methods for describing matter under extreme conditions using machine-learning techniques. These are used to advance our understanding of astrophysical objects, in the discovery of novel materials, and to enable new technologies like nuclear fusion.

Our two research fields, high-energy physics and materials science, pose similar methodological challenges in machine learning. That is why we teamed up for the project “Machine-learning based synthetic data generation for rapid physics modeling” (SynRap), in which we aim to overcome these limitations.


In simple words, what specifically is your project about? And, how and why do you think it is a high-risk, high-gain endeavor?

In order to create a useful machine learning model, we first need to train it with the available data. The quality of this data set determines how accurately our model can predict real-life scenarios. However, high-fidelity training data for neural networks is often sparse and very costly to generate. This is especially true in our case, where we work with high-energy physics: experimental data needs to be complemented by synthetic data sets, and producing new synthetic data requires highly complex computational simulations.

The SynRap team investigates how accurate synthetic data can be generated using a specific type of machine learning model called a surrogate model. These models do not need as much data to learn: they identify key data points within the given samples and work from those. In a second step, our findings will be used to generate the required synthetic training data sets more cheaply and quickly, and thus to accelerate the training of neural networks.
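To make the surrogate idea concrete, here is a deliberately simplified sketch (our own illustration, not the project's actual method): a cheap model is fitted to a handful of samples from a stand-in "expensive simulation", and is then queried densely to produce synthetic data. The toy simulator, the polynomial fit, and all numbers below are illustrative assumptions; SynRap uses neural networks on far more complex simulations.

```python
import numpy as np

# Stand-in for an expensive physics simulation (illustrative only;
# the real simulations behind SynRap are vastly more complex).
def expensive_simulation(x):
    return np.sin(3.0 * x) * np.exp(-0.3 * x)

# Step 1: run the costly simulation at only a handful of key points.
x_train = np.linspace(0.0, 4.0, 9)
y_train = expensive_simulation(x_train)

# Step 2: fit a cheap surrogate to those samples (here a simple
# polynomial; the project trains neural-network surrogates instead).
surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=6))

# Step 3: generate synthetic data densely and cheaply from the surrogate.
x_dense = np.linspace(0.0, 4.0, 200)
y_synthetic = surrogate(x_dense)

error = np.max(np.abs(y_synthetic - expensive_simulation(x_dense)))
print(f"surrogate produced {y_synthetic.size} synthetic points")
```

Once trained, every surrogate query costs a fraction of a full simulation run, which is what makes large synthetic data sets affordable.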

Methodologically, this project is at the forefront of active research. If successful, it will significantly boost computational efficiency, enabling large-scale simulations currently unreachable with standard methods.

Synthetic data is essential for complementing experimental data in high-energy physics. Generating accurate synthetic data sets is, however, computationally heavy. The use of neural networks could considerably accelerate the production of synthetic data, enabling scientists to test more hypotheses on the Standard Model of elementary particle physics and to go beyond it.

A singular feature of the SynRap collaboration is its level of abstraction. The team will develop a unified framework to tackle common challenges in two seemingly different research areas – high-energy physics (HEP) and high-energy-density (HED) phenomena. While the scales and physics in the two fields are very different, the methodological challenges are similar when viewed from a complex systems research perspective.


How important has the Helmholtz AI funding and platform been to carry out this project?

Helmholtz AI has been essential for this project. It provides us with the financial means to support one PhD student and one postdoctoral researcher to work on this project. But more importantly, the Helmholtz AI platform facilitates a very fruitful collaboration between the research groups at DESY and HZDR. Success in this high-risk, high-gain project relies on bringing together our complementary expertise from different research areas.


Any other comments you wish to add? 

We are delighted to share some concrete examples from our day-to-day research activities. The figure below illustrates synthetic data generation for a simplified HED application. Here, we use a specific type of neural network – a generative adversarial network – to produce high-resolution synthetic data based on low-resolution input.

The probability distribution of the electrons in space, i.e., the electronic structure (left), is the low-resolution input to a generative model (center), which produces a high-resolution output of the electronic structure (right). In this example, we consider atomic configuration snapshots of aluminum at ambient mass density and room temperature.
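The core of such a generator can be illustrated very schematically. The minimal NumPy sketch below (our own simplification, not the project's model) upsamples a toy low-resolution 2D density grid and passes it through a single convolution; in a real generative adversarial network, the convolution weights would be learned adversarially so that the added high-frequency detail matches real high-resolution data, whereas here they are randomly initialized.

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2D grid."""
    return grid.repeat(2, axis=0).repeat(2, axis=1)

def conv2d_same(grid, kernel):
    """Naive 'same'-padded 2D convolution (no library dependencies)."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(grid, pad, mode="edge")
    out = np.zeros_like(grid)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

# Toy low-resolution "electronic structure": a smooth density blob.
x = np.linspace(-1.0, 1.0, 16)
low_res = np.exp(-(x[:, None] ** 2 + x[None, :] ** 2) / 0.2)

# Generator-style forward pass: upsample, then convolve. The kernel
# weights stand in for layers a trained GAN generator would learn.
kernel = rng.normal(0.0, 0.1, size=(3, 3))
kernel[1, 1] += 1.0  # bias towards passing the upsampled signal through
high_res = conv2d_same(upsample2x(low_res), kernel)

print(low_res.shape, "->", high_res.shape)
```

The adversarial training loop, in which a discriminator network pushes the generator towards physically realistic high-resolution outputs, is omitted here for brevity.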