Helmholtz AI consultants @ Helmholtz Munich

Health-focused AI consultants

The Helmholtz AI central unit is also the local unit for Health, and includes a team of health-focused consultants. They are key actors in achieving the Helmholtz AI goal of empowering scientists to use AI in their research. To that end, they advise and support research teams in using machine learning and deep learning. The consultants master a broad range of methods and tools, and offer help at all stages of the data analysis pipeline, from project conceptualisation to actual implementation. They provide reusable code and technical reports, and strive to enable their scientific collaborators to leverage the methods themselves, for example by offering pair programming and code review sessions. They also play a key role in disseminating knowledge, by contributing open-source software to the community and offering training adapted to the needs of the Health research community.

Questions or ideas? Feel free to reach out to us! consultant-helmholtz.ai@helmholtz-muenchen.de

GitHub: https://github.com/HelmholtzAI-Consultants-Munich

Team members

Marie Piraud

Helmholtz AI consultant team leader @ Helmholtz Munich

Focus areas:

  • Medical computer vision, deep learning
  • Biomedical data analytics, survival analysis
  • Dynamical and statistical modelling, mixed-effect models
  • Multi-modal data integration

Lisa Barros de Andrade e Sousa

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Analysis and Integration of Omics Data
  • Machine Learning
  • Explainable AI

Highlighted projects:

  • Understanding gene silencing dynamics using explainable AI
  • Prediction of miRNA expression from epigenetic factors
  • Agent-based modelling of population dynamics

 

Christina Bukas

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Medical Image Processing
  • Conditional GANs (cGANs)
  • Image inpainting

Highlighted projects:

  • Acquisition Quality Assessment of Echocardiograms
  • Estimation of vertebral bodies prior to fracture damage using generative models
  • Rest tremor detection in Parkinson’s Disease with the help of a smartphone

Donatella Cea

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Computational Modelling, Physics and Statistics
  • Machine Learning
  • Teaching and Science Communication 

Elisabeth Georgii

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Multi-omics data fusion
  • Supervised and unsupervised learning
  • Algorithmic data mining

Highlighted projects:

  • Identifying molecular networks of plant stress responses
  • Modeling drug sensitivity of cancer cell lines

Florian Kofler

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Biomedical image analysis
  • Machine learning
  • Perception
  • Vision
  • AI


Isra Mekki

Machine Learning Engineer for AI applications @ Helmholtz Munich

Focus areas:

  • MLOps
  • Deep Learning
  • Software Engineering

Erinc Merdivan

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Natural Language Processing
  • Deep Learning
  • Deep Reinforcement Learning

Highlighted projects:

  • Enzyme function prediction using binding sites represented as point clouds
  • CRISPRi guide efficiency prediction

Helena Pelin

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Machine Learning and Statistics
  • Analysis of Omics and Healthcare Data
  • Clustering Analysis

Harshavardhan Subramanian

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Medical Image Processing
  • Natural Language Processing
  • Supervised and Unsupervised Learning

Mahyar Valizadeh

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Interpretable/Explainable AI
  • Stochastic Inference
  • High Performance Computing (HPC)

GitHub: https://github.com/mahvili

Gerome Vivar

Helmholtz AI consultant @ Helmholtz Munich

Focus areas:

  • Multi-modal Machine Learning
  • Geometric Deep Learning
  • Clinical Decision Support Systems

Selected ongoing voucher projects

Subgroup identification in spinocerebellar ataxias 

  • Challenge: Spinocerebellar ataxias (SCAs) are rare, autosomal dominantly inherited neurological diseases with onset in adult age. The most common SCAs, SCA1, 2, 3 and 6, together account for more than half of all affected families worldwide. Clinical hallmarks are progressive loss of balance and coordination, accompanied by slurred speech. Patients affected by SCA suffer substantial restrictions of mobility and communicative skills. Predicting the disease progression from genetic features, demographic information and the current status of neurological symptoms paves the way toward potential stratification markers and is important for anticipating optimal windows regarding the start of preventive treatments.
  • Approach: Using a multi-cohort data set with clinical time courses of different established neurological scales, comprising a total of 39 single items, we trained predictive models with regularized Cox regression and survival forests. For each of the most common SCAs, we extracted relevant features and characterized disease progression with respect to the multitude of neurological symptoms. The loss of the ability to walk freely is a transition of high clinical impact and was analyzed in detail to support future monitoring and decision making.
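The penalized Cox regression at the core of this approach maximizes the partial likelihood of observed event times. A minimal pure-Python sketch of the penalized objective, optimized by crude finite-difference gradient descent (the toy cohort and L2 penalty weight are illustrative, not the project's data or settings):

```python
import math

def cox_neg_log_partial_likelihood(beta, x, time, event, l2=0.1):
    """Negative Cox partial log-likelihood with an L2 (ridge) penalty.

    x: feature vectors; time: event/censoring times;
    event: 1 if the event was observed, 0 if censored.
    """
    nll = 0.0
    for xi, ti, ei in zip(x, time, event):
        if not ei:
            continue  # censored subjects contribute only to risk sets
        eta_i = sum(b * v for b, v in zip(beta, xi))
        # risk set: everyone still under observation at time ti
        log_denom = math.log(sum(
            math.exp(sum(b * v for b, v in zip(beta, xj)))
            for xj, tj in zip(x, time) if tj >= ti))
        nll -= eta_i - log_denom
    return nll + l2 * sum(b * b for b in beta)

# toy cohort: 2 features, 5 subjects (illustrative only)
x = [[1.0, 0.0], [0.5, 1.0], [0.0, 1.0], [1.5, 0.5], [0.2, 0.8]]
time = [2.0, 3.0, 5.0, 1.0, 4.0]
event = [1, 1, 0, 1, 1]

beta, eps, lr = [0.0, 0.0], 1e-5, 0.05
for _ in range(100):  # finite-difference gradient descent on the objective
    grad = []
    for k in range(len(beta)):
        bp = list(beta)
        bp[k] += eps
        grad.append((cox_neg_log_partial_likelihood(bp, x, time, event)
                     - cox_neg_log_partial_likelihood(beta, x, time, event)) / eps)
    beta = [b - lr * g for b, g in zip(beta, grad)]
```

In practice, libraries such as scikit-survival provide regularized Cox models and random survival forests with proper tie handling and efficient optimization.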

 

Validation strategies of dynamical whole-brain models

  • Challenge: Whole-brain models of brain activity offer a unique opportunity to assess new therapies and diagnostics in a virtual setting. They need to be fitted closely to empirical data so that they realistically represent the brain dynamics of patients and can be used for clinical purposes. However, fitting and validating time series data (based on fMRI or EEG activity) is challenging, as the time series incorporate stochastic processes with a large amount of noise (caused by movement, blood flow, measurement artefacts, etc.).
  • Approach: Following the Bayesian stochastic variational inference model of Hashemi et al. [1] for synthetic data, we developed a model simulating the empirical data, which improves the current validation and fitting strategy for dynamic whole-brain models, based on data from five subjects provided by our collaborators at DZNE. Future work will involve optimizing the code for speed, achieving higher accuracy, and validating the results in a larger cohort of healthy controls and patients.

[1] Hashemi, M., Vattikonda, A., Sip, V., Guye, M., Bartolomei, F., Woodman, M. M., and Jirsa, V. K. (2020). The Bayesian virtual epileptic patient: A probabilistic framework designed to infer the spatial map of epileptogenicity in a personalized large-scale brain model of epilepsy spread. NeuroImage, 217:116839.

 

Systematic evaluation of cell-type deconvolution pipelines 

  • Challenge: DNA methylation analysis by sequencing is becoming increasingly popular, yielding methylomes at single-base-pair and single-molecule resolution. It has tremendous potential for cell-type heterogeneity analysis using intrinsic read-level information. Although diverse deconvolution methods have been developed to infer cell-type composition from bulk sequencing-based methylomes, a systematic evaluation had not yet been performed.
  • Approach: Together with our collaboration partners from the German Cancer Research Center, we thoroughly benchmark six previously published methods: Bayesian epiallele detection, DXM, PRISM, csmFinder+coMethy, ClubCpG and MethylPurify, together with two array-based methods, MeDeCom and Houseman, as a comparison group. We found that array-based methods—both reference-based and reference-free—generally outperformed sequencing-based methods, despite the absence of read-level information. This implies that the current sequencing-based methods still struggle with correctly identifying cell-type-specific signals and eliminating confounding methylation patterns, which needs to be handled in future studies.
  • This project was published in: Jeong, Y., Barros de Andrade e Sousa, L., Thalmeier, D., Toth, R., Ganslmeier, M., Breuer, K., ... & Lutsik, P. (2022). Systematic evaluation of cell-type deconvolution pipelines for sequencing-based bulk DNA methylomes. Briefings in Bioinformatics. (Link: https://doi.org/10.1093/bib/bbac248)
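To illustrate the reference-based side of the comparison: Houseman-style deconvolution essentially solves a constrained least-squares problem, recovering cell-type proportions from a bulk methylation profile given per-cell-type reference profiles. A minimal non-negative least-squares sketch via projected gradient descent (synthetic data throughout; the sum-to-one constraint is omitted for brevity):

```python
import numpy as np

def deconvolve(bulk, reference, steps=3000):
    """Estimate non-negative cell-type proportions p minimizing
    ||reference @ p - bulk||^2 by projected gradient descent."""
    lr = 1.0 / (2 * np.linalg.norm(reference, 2) ** 2)  # stable step size
    p = np.full(reference.shape[1], 1.0 / reference.shape[1])
    for _ in range(steps):
        grad = 2 * reference.T @ (reference @ p - bulk)
        p = np.clip(p - lr * grad, 0.0, None)  # project onto p >= 0
    return p

rng = np.random.default_rng(0)
reference = rng.random((50, 3))        # methylation of 50 CpGs in 3 cell types
true_p = np.array([0.5, 0.3, 0.2])     # ground-truth mixture proportions
bulk = reference @ true_p              # noise-free synthetic bulk profile
estimate = deconvolve(bulk, reference)
```

Real pipelines add the simplex constraint, noise models, and (for sequencing data) read-level information, which is exactly where the benchmarked methods differ.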

 

Automatic scoring of vitiligo in dermatological 3D full body scans using swarm learning 

  • Challenge: Vitiligo is a skin disease that can be diagnosed and monitored through patient history and visual inspection. However, visual inspection is limited when it comes to monitoring treatment response and quantifying disease progression. Clinical images are routinely taken, but can sometimes be difficult to compare due to different technical circumstances. A relatively new tool in dermatology is the 3D full-body scanner, which can image almost the whole skin surface in a highly standardized way and with high quality. Dealing with these full-body scans, however, presents other challenges, including data protection, since anonymization is not feasible.
  • Approach: We are building an automated approach for real-world hospital settings where data cannot be shared between hospitals due to patient confidentiality and privacy. Using full-body scans from the Department of Dermatology and Allergy at the Klinikum Rechts der Isar in Munich, we will first develop an automated method that can detect and segment vitiligo using deep learning. We will then use the Swarm Learning framework developed by our collaborators at DZNE [1] for decentralized training of the final model, to enable real-world collaboration among hospitals. As a first use case, we will train the algorithm with full-body scans from the Dermatology Department of the university hospital Erlangen.

[1] github.com/HewlettPackard/swarm-learning

 

Quantum chemically refined database of experimental protein-ligand complexes 

  • Challenge: Databases of protein–ligand complex structures play an important role in machine learning research supporting drug discovery. However, available ligand structures are not quantum chemically refined, resulting in ligands with inaccurate 3D structures, protonation states and charges. The new database includes ligands with accurate 3D structures, hence increasing the data quality for 3D deep learning models in drug discovery.
  • Approach: We are preparing a public repository consisting of the database and data loaders for quantum chemically refined protein–ligand complexes and their physicochemical properties, generated by our partners at Helmholtz Munich. We will now benchmark different AI models (3D CNNs, equivariant neural networks, 2D and 3D graph NNs) on it, to promote the enhanced database among the AI community.

Selected completed voucher projects

Automatic feature estimation from transthoracic echocardiography 

  • Challenge: Echocardiography is a rapid and cost-effective imaging technique that assesses cardiac function and structure. Despite its popularity in cardiovascular medicine and clinical research, image-derived phenotypic measurements are manually performed, requiring expert knowledge and training.
  • Approach: We developed Echo2Pheno, an automatic statistical learning workflow for analyzing and interpreting high-throughput transthoracic murine echocardiographic images in the presence of genetic knockouts. The pipeline, consisting of two neural networks for image analysis and a hypothesis-testing module for assessing phenotypic differences between populations, has accurately confirmed known cardiovascular genotype–phenotype relationships and been used to discover novel genes that cause altered cardiovascular phenotypes. Echo2Pheno provides an important step toward automatic end-to-end learning for linking echocardiographic readouts to cardiovascular phenotypes of interest.

Enzyme function prediction

  • Challenge: Enzymes are a subset of proteins which are important biochemical catalysts regulating many biological functions, such as aiding in chemical transport or cleaving molecules. The enzyme function is determined by its chemical function, which is described by a local chemical environment: the enzyme binding site. In rational drug design this binding-site information is crucial: with it, new structures can be designed to activate or inhibit the function of biomolecules. Although there are many methods to classify enzymes using sequence information, few methods take 3D structure and atom positions into account. Another challenge is that enzyme functions are organized in a hierarchical tree structure, which requires a method that classifies enzyme function well at different levels of the hierarchy.
  • Approach: We implemented classical 3D Convolutional Neural Networks (CNNs) with rotation augmentation. As a more advanced approach, we also implemented 3D graph NNs, which are rotation invariant and use distance and/or angle information between atoms of the enzyme pocket. To investigate model predictions, we implemented explainability methods for graphs.
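The rotation augmentation mentioned above can be sketched with NumPy: random 90-degree rotations of the voxelized enzyme pocket vary the input while leaving the function label unchanged (the voxelization step and the network itself are omitted here):

```python
import random
import numpy as np

def random_rot90_3d(volume, rnd=random):
    """Rotate a 3D voxel grid by a random multiple of 90 degrees
    about a randomly chosen pair of axes (a label-preserving augmentation)."""
    axes = rnd.choice([(0, 1), (0, 2), (1, 2)])
    k = rnd.randrange(4)
    return np.rot90(volume, k=k, axes=axes)

pocket = np.arange(4 ** 3, dtype=float).reshape(4, 4, 4)  # dummy voxel grid
augmented = random_rot90_3d(pocket)
```

Ninety-degree rotations are exact and cheap; arbitrary-angle rotations additionally require interpolation (e.g. scipy.ndimage.rotate).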

Identification of infected cells from unlabelled microscopy images 

  • Challenge: Augmented microscopy techniques can employ the power of deep neural networks to predict fluorescent labels from transmitted light images. These networks need to be trained on large datasets of fluorescent images, but once trained are able to predict various cellular structures, such as the cell nucleus or membrane, from images of unstained cells. So far, no method exists that can reconstruct fluorescent labels of infection markers, and scientists are not even certain whether this is possible, since infected cells in brightfield images are not distinguishable from healthy ones.
  • Approach: We created a proof of concept that neural networks are able to reconstruct infection labels from brightfield microscopy images. Our work is based on state-of-the-art research that demonstrates the capability of CNNs to reconstruct other fluorescent markers, such as the DAPI marker that highlights the cell nucleus. Our method implements a U-Net and requires a single 2D brightfield image as input. We attempt to learn infection channels by having the network focus on image regions that are heavily infected. By applying different machine learning methods and visualization techniques, we are able to see that our deep learning model can indeed identify cell infection.
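The idea of having the network "focus on heavily infected regions" can be implemented as a pixel-weighted reconstruction loss. The weighting scheme below is a hypothetical sketch for illustration, not the exact loss used in the project:

```python
import numpy as np

def weighted_mse(pred, target, infection_mask, w_infected=5.0):
    """Mean squared error that up-weights pixels inside infected regions
    (w_infected is a hypothetical weighting factor)."""
    weights = np.where(infection_mask, w_infected, 1.0)
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))

target = np.zeros((2, 2))
mask = np.array([[True, False], [False, False]])
# the same unit error costs more inside the infected region than outside
err_inside = weighted_mse(np.array([[1.0, 0.0], [0.0, 0.0]]), target, mask)
err_outside = weighted_mse(np.array([[0.0, 1.0], [0.0, 0.0]]), target, mask)
```

In training, such a loss would be applied to the U-Net output against the fluorescent infection channel; the mask can come from thresholding the fluorescence signal itself.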

Wavelet-based Event Separation

  • Challenge: Researchers at Helmholtz Munich are interested in finding new ways to analyse high-resolution spatio-temporal data of brain activity. Depending on experimental conditions, brain activity can range from seemingly chaotic dynamics to slow travelling-wave phenomena. The goal of this project was to disentangle the complex dynamics and isolate phenomena in the brain, in order to better understand the dynamics under the different experimental conditions. One of the challenges is that the timescales of the phenomena in the chaotic regime are unknown.
  • Approach: We used a continuous wavelet analysis coupled with a cluster analysis to divide the complex multi-scale time series into several modes and separate the relevant phenomena in the data.
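A minimal version of the wavelet step: convolve the signal with scaled (here real-valued) Morlet wavelets and inspect the energy per scale. The scales, the wavelet parameter w0, and the toy signal are illustrative, not the project's settings:

```python
import numpy as np

def morlet_cwt(signal, scales, w0=6.0):
    """Continuous wavelet transform with a real-valued Morlet wavelet,
    implemented as direct convolution (one row of coefficients per scale)."""
    n = len(signal)
    t = np.arange(-(n // 2), n // 2)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        wavelet = np.exp(-0.5 * (t / s) ** 2) * np.cos(w0 * t / s) / np.sqrt(s)
        out[i] = np.convolve(signal, wavelet, mode="same")
    return out

sig = np.sin(2 * np.pi * np.arange(256) / 16)  # pure oscillation, period 16
coeffs = morlet_cwt(sig, scales=[4.0, 15.0])   # scale ~15 matches period 16
```

The per-scale energy profiles obtained this way can then be clustered (e.g. with k-means) to separate modes, which is the second step of the approach described above.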

Automatic Cell Counting in cell migration experiments

  • Challenge: Cell migration is central to many physiological and pathological processes, such as embryonic development, wound repair, and tumor metastasis. The Boyden chamber assay is the most widely accepted cell migration technique for the characterization of cell motility. Cell motility is quantified by counting the cell numbers in microscopic images. Such images normally contain many cells, so counting manually is time-consuming, laborious, and error-prone.
  • Approach: We provided an automatic cell-counting algorithm that counts crystal-violet-stained cells in 2D microscopic images. In addition, a graphical user interface was implemented for further manual correction of the automatic results. Our solution speeds up the analysis of cell migration experiments by a factor of 10.
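At its core, classical cell counting reduces to thresholding plus connected-component labelling. A simplified sketch (the threshold and minimum blob size are illustrative; the real pipeline additionally separates cells from chamber pores):

```python
import numpy as np

def count_cells(image, threshold=0.5, min_size=3):
    """Count stained blobs: binarize, then 4-connected component labelling.
    Components smaller than min_size (e.g. specks) are discarded."""
    mask = np.asarray(image) > threshold
    seen = np.zeros(mask.shape, dtype=bool)
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if seen[seed]:
            continue
        stack, size = [seed], 0
        while stack:  # iterative flood fill over one component
            r, c = stack.pop()
            if not (0 <= r < mask.shape[0] and 0 <= c < mask.shape[1]):
                continue
            if not mask[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            size += 1
            stack += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
        if size >= min_size:
            count += 1
    return count

img = np.zeros((10, 10))
img[1:3, 1:3] = 1.0   # cell 1 (2x2 pixels)
img[6:8, 5:7] = 1.0   # cell 2 (2x2 pixels)
img[0, 9] = 1.0       # single-pixel speck, filtered out by min_size
```

Production code would use optimized labelling (e.g. scipy.ndimage.label or OpenCV) rather than a hand-rolled flood fill.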

CRISPRi guide efficiency prediction in bacteria

  • Challenge: CRISPR interference (CRISPRi) has become a prevalent technique in bacteria for studying the function of individual genes, regulating pathways for metabolic engineering, and performing genome-wide genetic screens. However, design tools for guide selection remain to be developed for CRISPRi despite their availability and common use for other CRISPR technologies. The goal of this project was the development of a model that accurately predicts guide depletion in publicly available CRISPRi essentiality screens in Escherichia coli, using a variety of sequence and thermodynamic features.
  • Approach: The efficiency of guide RNAs can be measured with genome-wide essentiality screens. However, the efficiency calculated from these screens can only serve as a proxy for guide efficiency because it contains confounding gene effects. To correct the guide efficiency for those gene effects, we used a median subtraction approach. In a second step, we used the corrected guide efficiency to develop a model that can predict the efficiency of unseen guides from different sequence and thermodynamic features. Here, we compared a 1D convolutional neural network with a recurrent neural network to investigate whether the sequential information in the guide RNA sequence is more informative than the positional information, i.e. which nucleotide is found at a certain position in the guide sequence. We conclude that the positional information is more informative. Furthermore, the trained 1D CNN model accurately predicts the efficiency of 750 guides specific to nine purine genes essential in minimal media in Escherichia coli.
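The median-subtraction correction can be sketched directly: subtract each gene's median log-fold change from that gene's guides, so that the remaining variation reflects guide effects rather than gene effects. Gene names and values below are illustrative:

```python
import statistics
from collections import defaultdict

def correct_guide_efficiency(records):
    """records: (gene, guide_id, log_fold_change) tuples.
    Returns the same records with each gene's median LFC subtracted,
    removing the confounding gene effect (median-subtraction approach)."""
    by_gene = defaultdict(list)
    for gene, _, lfc in records:
        by_gene[gene].append(lfc)
    medians = {g: statistics.median(v) for g, v in by_gene.items()}
    return [(g, guide, lfc - medians[g]) for g, guide, lfc in records]

screen = [("purA", "g1", -4.0), ("purA", "g2", -2.0), ("purA", "g3", -3.0),
          ("purB", "g4", -1.0), ("purB", "g5", -0.2), ("purB", "g6", -0.6)]
corrected = correct_guide_efficiency(screen)
```

After correction, every gene's guides are centred at zero, so a strongly depleted guide of a weakly essential gene is no longer conflated with an average guide of a strongly essential gene.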

Software and resources

QUICKSETUP-AI

Through Quicksetup-ai, we propose a flexible template as a quick setup for deep learning projects in research. The objective is to let researchers focus on their work, while enforcing software engineering best practices and reproducibility standards. The template combines established and widely used tools and libraries to provide a clean, simple and reusable baseline with a wide range of features, including experiment tracking, automatic documentation generation, configuration management, testing, data version control and hyperparameter tuning.

https://github.com/HelmholtzAI-Consultants-Munich/Quicksetup-ai

 

FOREST-GUIDED CLUSTERING

Standard explainability methods for Random Forest (RF) models, like permutation feature importance, are commonly used to pinpoint the individual contribution of features to the model performance but often miss the role of correlated features or feature interactions in the model’s decision-making process. The Forest-Guided Clustering algorithm computes feature importance based on subgroups of instances that follow similar decision paths within the RF model, thus focusing on pattern-driven rather than performance-driven importance. By doing so, our method avoids the misleading interpretation of correlated features, allows the detection of feature interactions and gives a sense for the generalizability of identified patterns.

https://github.com/HelmholtzAI-Consultants-Munich/fg-clustering
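The notion of "instances that follow similar decision paths" can be made concrete through the classical Random Forest proximity: the fraction of trees in which two samples land in the same leaf. A NumPy sketch of this building block (leaf assignments as produced, e.g., by scikit-learn's `RandomForestClassifier.apply`; the values here are hand-written toy data):

```python
import numpy as np

def rf_proximity(leaf_ids):
    """Proximity matrix from per-tree leaf assignments.

    leaf_ids: (n_samples, n_trees) integer array; entry [i, t] is the leaf
    that sample i reaches in tree t. proximity[i, j] is the fraction of
    trees in which samples i and j share a leaf.
    """
    return (leaf_ids[:, None, :] == leaf_ids[None, :, :]).mean(axis=2)

# toy leaf assignments for 3 samples in a 2-tree forest
leaves = np.array([[1, 1],
                   [1, 2],
                   [3, 2]])
prox = rf_proximity(leaves)
```

1 - proximity is a distance that can be clustered (e.g. with k-medoids); forest-guided clustering builds its subgroup-based feature importance on this kind of path similarity.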

 

OLIGO DESIGNER TOOLSUITE

Oligonucleotides (abbrev. oligos) are short, synthetic strands of DNA or RNA that have many application areas, ranging from research to disease diagnosis or therapeutics, and need to be designed individually based on the intended application and experimental design. We developed the Oligo Designer Toolsuite, a collection of modules that provide all basic functionalities for custom oligo design pipelines within a flexible Python framework. All modules have a standardized I/O format and can be combined individually depending on the required processing steps, such as the generation of custom-length oligo sequences, the filtering of oligo sequences based on thermodynamic properties, and the selection of an optimal set of oligos.
The implemented oligo design pipeline for padlock probes was published alongside the following paper: Kuemmerle, L., Luecken, M., Firsova, A., Barros de Andrade e Sousa, L., Straßer, L., Heumos, L., ... & Theis, F. J. (2022). Probe set selection for targeted spatial transcriptomics. bioRxiv (Link: https://www.biorxiv.org/content/10.1101/2022.08.16.504115v1)

https://github.com/HelmholtzAI-Consultants-Munich/oligo-designer-toolsuite

 

Automatic Cell Counter

This automatic cell counter counts crystal-violet-stained cells in 2D microscopic images of the Boyden chamber assay. This simple pipeline, based on classical computer vision algorithms, distinguishes cells from the chamber pores, which have a similar color spectrum. An easy-to-use graphical user interface is provided for further manual correction of the automatic results when needed. This software speeds up the analysis of cell migration experiments by a factor of 10.

https://github.com/HelmholtzAI-Consultants-Munich/Automatic-Cell-Counter

 

PySDDR

PySDDR combines the interpretability of a statistical model with the predictive power of deep neural networks in an easy-to-use Python package. It is the Python implementation of the Semi-Structured Deep Distributional Regression (SDDR) framework, which enhances Generalized Additive Models (GAMs) with neural networks. This extends GAMs to model high-dimensional nonlinear patterns in the data and to be applied to multimodal data (e.g. a combination of tabular and image data). The framework is written in PyTorch and accepts any number of neural networks, of any type (FC, CNN, LSTM, ...).

https://github.com/HelmholtzAI-Consultants-Munich/PySDDR
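The "GAM half" of SDDR is regression on a spline basis; the neural-network half adds a learned additive term on top. A minimal NumPy sketch of the structured part only (degree-1 truncated-power basis, unpenalized least squares; knot placement and data are illustrative):

```python
import numpy as np

def spline_design(x, knots):
    """Degree-1 truncated-power spline basis for one smooth GAM term:
    an intercept, a linear term, and one hinge function per knot."""
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)  # noisy smooth signal
X = spline_design(x, knots=np.linspace(0.1, 0.9, 8))
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fit = X @ coef  # piecewise-linear estimate of the smooth effect
```

PySDDR goes beyond this sketch by using penalized smooths inside a distributional regression and letting additional neural networks contribute further additive terms to the predictor.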