Creating synthetic transcriptomic profiles using generative adversarial networks (GANs).
What is PRO-GENE-GEN? Read this week’s Helmholtz AI project showcase, a collaboration between the German Research Center for Neurodegenerative Diseases (DZNE) and the CISPA Helmholtz Center for Information Security (CISPA).
Figure: Structure of the PRO-GENE-GEN project. Patient or study data is used to train a generator, which can then provide privacy-preserving synthetic cohorts for sharing. Figure created with BioRender.com.
Could you introduce yourself, giving your affiliation, area of work, and of course, the project title?
We are Matthias Becker, Career Development Fellow for Modular HPC and AI at the DZNE, and Mario Fritz, faculty at CISPA. We co-lead the Helmholtz AI project PRO-GENE-GEN: Virtual cohort data for personalised medicine.
In simple words, what specifically is your project about? And how and why do you think it is a high-risk, high-gain endeavour?
In PRO-GENE-GEN, we want to create transcriptomic profiles for synthetic cohorts. Patient data, and transcriptomic data in particular, is very sensitive, and the rules for sharing it are strict. Yet science is based on collaboration, and collaboration requires data sharing. In this project, we investigate how differentially private generative adversarial networks (GANs) can be used to generate synthetic transcriptomic profiles that mimic the study data without exposing patient-specific information. Having such a method would provide high gains for the community and improve collaboration. But the transcriptomic data space is vast and study sizes are small, so training a GAN, in particular in a differentially private manner, is highly challenging.
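To make "differentially private training" a little more concrete, here is a minimal sketch of one widely used building block: clipping each sample's gradient and adding Gaussian noise before averaging, as in DP-SGD. This is an illustration under stated assumptions, not the project's actual method. The function name `dp_noisy_gradient`, the parameters `clip_norm` and `noise_multiplier`, and the toy gradients are all hypothetical.

```python
import math
import random


def dp_noisy_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient (hypothetical helper, DP-SGD style).

    Each sample's gradient is rescaled so its L2 norm is at most clip_norm,
    the clipped gradients are summed, Gaussian noise with standard deviation
    noise_multiplier * clip_norm is added per coordinate, and the result is
    divided by the number of samples.
    """
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale factor is at most 1, so no single sample dominates the sum.
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            total[i] += g * scale
    sigma = noise_multiplier * clip_norm
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


# Toy example: 8 made-up per-sample gradients in 5 dimensions.
rng = random.Random(0)
toy_grads = [[rng.gauss(0.0, 10.0) for _ in range(5)] for _ in range(8)]
noisy = dp_noisy_gradient(toy_grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In a GAN setting, a step like this would typically be applied to the part of the model that sees real patient data. The sketch omits the privacy accounting that turns the noise level into a formal privacy guarantee. With small cohorts, the noise is large relative to the averaged signal, which is one reason such training is hard.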
How important have the Helmholtz AI funding and platform been for carrying out this project?
Helmholtz AI has made this project possible, as other funding bodies are often very risk-averse. The approach of always linking at least two Helmholtz centers helps to establish interdisciplinary projects.