Scientists Bastian Rieck and Julius von Rohrscheidt from Helmholtz Munich have developed a novel approach to address challenges arising from violations of the “manifold hypothesis” - an essential assumption in data science.
Recent research presented at the International Conference on Machine Learning (ICML), a leading conference in machine learning, sheds light on the importance of topological and geometrical methods in the field of machine learning.
Scientists Bastian Rieck and Julius von Rohrscheidt from Helmholz Munich have developed a novel approach to address challenges arising from violations of the “manifold hypothesis” - an essential assumption in data science. The manifold hypothesis assumes that datasets can be understood as samples from spaces called manifolds, which resemble familiar Euclidean spaces locally. However, real-world datasets often deviate from this assumption, resulting in anomalies and irregularities. This poses a significant challenge for many machine learning techniques commonly used to analyse high-dimensional data. The authors’ research introduces a new method for testing the manifold hypothesis by comparing local neighborhoods of data points to Euclidean model spaces that adhere to the manifold hypothesis. Leveraging tools from computational topology - a field that explores the shape and structure of data - the researchers have achieved remarkable results in detecting points in the data that violate this “manifoldness” assumption.
The study demonstrates the method’s success in finding irregularities in both synthetic and real-world datasets. Notably, the proposed score captures the inherent geometric complexity of greyscale images and significantly correlates with a machine learning classifier’s accuracy in assigning labels. These findings show the critical role of topological and geometrical considerations in machine learning. By incorporating tools from computational topology, scientists can better understand and address the challenges posed by complex datasets that deviate from traditional assumptions. The proposed research highlights the interdisciplinary nature of challenges in data science and machine learning, empowering researchers to make more informed decisions when analysing intricate datasets.
For more information about this work, please see https://arxiv.org/abs/2210.00069
Code can be found on the following page: https://github.com/aidos-lab/TARDIS
Disclaimer: Header image was created with the assistance of DALL·E 2.