PySDDR combines the interpretability of a statistical model with the power of deep neural networks in an easy-to-use Python package.
Typically, statistical models and machine learning models are on opposite ends of the data analysis spectrum:
Statistical models, at one end of the spectrum, are readily interpretable, but lack the expressive power to deal with complex data such as images. Deep learning techniques, at the other end of the spectrum, can model almost any complex relationship, but the resulting models are hard-to-interpret black boxes. A new technique that combines the strengths of both worlds is semi-structured deep distributional regression (SDDR).
SDDR enhances the expressive power of standard statistical modeling with deep learning while maintaining interpretability. It is based on generalized additive models (GAMs), a widely used statistical tool for modeling how an observed variable depends on a set of features. ‘SDDR extends GAMs with the modeling power of deep neural networks’, says Lisa Barros de Andrade e Sousa, Helmholtz AI consultant at Helmholtz Munich.
In its effort to democratize AI, the Helmholtz AI consultant team at Helmholtz Munich has joined forces with Dr. David Rügamer (LMU), the researcher who invented the SDDR technique, and developed the Python package PySDDR to make this innovative new tool available to the Python community.
‘A good example of a PySDDR application would be predicting the price of a vacation rental apartment’, says Christina Bukas, also a member of the Helmholtz AI consultant team. ‘With PySDDR, the probability distribution of the price of the apartment can be modeled not only from tabular features like the size of the apartment, its location or the quality category of the bathroom, but also from photos of the apartment.’
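The idea behind this example can be illustrated with a minimal sketch. It does not use the PySDDR API. It is plain NumPy, and every name, weight and data value in it is hypothetical. It only shows the general principle of a semi-structured predictor: an interpretable additive term (here, a linear effect of apartment size) is summed with the output of a neural network (here, a tiny network on made-up image features), and the result parameterizes a distribution rather than a single point prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def structured_part(size_sqm, weights):
    """Interpretable additive term: intercept plus a linear effect of size."""
    return weights["intercept"] + weights["size"] * size_sqm

def deep_part(image_features, hidden_w, out_w):
    """Tiny one-hidden-layer network standing in for a deep image model."""
    hidden = np.maximum(0.0, image_features @ hidden_w)  # ReLU activation
    return hidden @ out_w

def additive_predictor(size_sqm, image_features, weights, hidden_w, out_w):
    """Semi-structured predictor: structured part plus deep part."""
    return structured_part(size_sqm, weights) + deep_part(image_features, hidden_w, out_w)

# Hypothetical toy data: 5 apartments, 8 precomputed image features each.
n, d, h = 5, 8, 4
size_sqm = rng.uniform(20, 120, size=n)
image_features = rng.normal(size=(n, d))

# Hypothetical, untrained parameters (a real model would learn these).
weights = {"intercept": 3.0, "size": 0.01}
hidden_w = rng.normal(scale=0.1, size=(d, h))
out_w = rng.normal(scale=0.1, size=h)

# Two parameters of a normal distribution for the log-price:
# the mean combines both parts; the scale uses an exp link to stay positive.
mean_log_price = additive_predictor(size_sqm, image_features, weights, hidden_w, out_w)
scale_log_price = np.exp(np.full(n, -1.0))
```

Because the structured term stays a separate summand, its coefficient (here `weights["size"]`) can still be read as the effect of apartment size, which is the interpretability the article refers to.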
PySDDR, along with the R implementation of SDDR developed at LMU, allows any scientist, statistician or data scientist to use this exciting new technique and tap into the power of the union of statistics and machine learning.
- GitHub: https://github.com/HelmholtzAI-Consultants-Munich/PySDDR
- Paper preprint: https://arxiv.org/abs/2104.02705