Topological Data Analysis and Deep Learning

Now comes the question of how to do it. Some enhance their inference by augmenting the size of their dataset, others its quality; others use brand-new models, and others invent their own techniques. No need to cite it, but the rise of deep learning is probably the best example. Today, I will aim for the last option: innovating in the description of your data. Let me introduce you to one such approach: Topological Data Analysis. Abbreviated TDA, it is a recent field that emerged from various research in applied topology and computational geometry. It aims to provide well-founded mathematical, statistical and algorithmic methods to exploit the topological and underlying geometric structures in data. Generally visualized on three-dimensional data, TDA can also be useful in other cases, such as time series. Interested? :D

As I want this article to be practical, I will refer you to an article I wrote previously, which presents a few theoretical concepts. Feel free to spend a few minutes immersing yourself in the many topics TDA has to offer. This article is an excerpt from the work I conducted in the AI laboratories of Fujitsu in Tokyo, in partnership with the Datashape team from INRIA [the French research institute]. Tears are running down my cheeks because I, unfortunately, cannot share the entire work, but you should have everything you need to get it done with that Github and that paper.

What are arrhythmias?

There is no surprise here, but you should know that heart attacks and strokes are among the five leading causes of death in the US. As this concerns everyone out there, it is no wonder that companies like Apple are targeting the sector as we speak by developing their own smart monitors. It turns out that your heart is probably the most awesome muscle in your whole body: it works 24/7 without interruption, and does so in a very rhythmic way. However, it sometimes fails to keep the pace, whether because of alcohol, sudden love, intense exercise or horror movies. Some of those failures may turn out to be lethal. Arrhythmias are one type of failure: an umbrella term for a group of conditions describing irregular heartbeats, in terms of shape or frequency. Detecting those events and monitoring their frequency may be of huge help in supervising your health and making sure you get access to the right health interventions when needed. That, however, requires smart monitoring.

Get your hands dirty!

Machine learning! That sounds like the way to smart monitoring! But it requires more than a fancy model. Here is the good news: people have been working hard to facilitate research by providing a family of open-source datasets. They are available on the PhysioNet platform and named after the conditions they describe: MIT-BIH Normal Sinus Rhythm Database, MIT-BIH Arrhythmia Database, MIT-BIH Supraventricular Arrhythmia Database, MIT-BIH Malignant Ventricular Arrhythmia Database, and MIT-BIH Long-Term Database. Those databases are made of single-channel ECGs, each sampled at 360 Hz. Each record was independently annotated by two or more cardiologists, and their disagreements were resolved to obtain the reference annotations for each beat. We may already be grateful!
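
If you want to play along, the wfdb Python package can pull the records and their reference annotations straight from PhysioNet. Here is a minimal sketch, using record "100" of the MIT-BIH Arrhythmia Database as an example and keeping a single channel:

```python
# Minimal sketch: load one MIT-BIH Arrhythmia record and its beat annotations
# from PhysioNet with the wfdb package. Record "100" is just an example ID.
import wfdb

record = wfdb.rdrecord("100", pn_dir="mitdb")
annotation = wfdb.rdann("100", "atr", pn_dir="mitdb")

signal = record.p_signal[:, 0]   # keep a single ECG channel
fs = record.fs                   # sampling frequency in Hz
r_peaks = annotation.sample      # sample index of every annotated beat
labels = annotation.symbol       # beat symbols, e.g. "N", "V", "A", ...

print(f"{len(signal)} samples at {fs} Hz, {len(r_peaks)} annotated beats")
```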

The annotations also have the advantage of relieving us of the heartbeat-detection problem, which isn't much of a barrier anyway: baseline-drift removal combined with a wavelet transform, or a 1D-CNN, both work well as a solution.
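
With the annotated R-peak positions in hand, beat extraction boils down to slicing a fixed window around each peak. The window lengths below are illustrative choices, not the exact setting from the original work:

```python
import numpy as np

def segment_beats(signal, r_peaks, fs, before=0.25, after=0.45):
    """Cut a fixed window around each annotated R peak (durations in seconds).

    Returns the beat windows and the indices of the peaks that were kept."""
    left, right = int(before * fs), int(after * fs)
    beats, kept = [], []
    for i, r in enumerate(r_peaks):
        if r - left >= 0 and r + right < len(signal):
            beats.append(signal[r - left : r + right])
            kept.append(i)
    return np.asarray(beats), np.asarray(kept)

beats, kept = segment_beats(signal, r_peaks, fs)
beat_labels = np.asarray(labels)[kept]   # labels aligned with the beat windows
```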

How to describe heartbeats?

ECGs being one-dimensional time series in our case, how do we describe both their shape and their temporal relationships? That is a very general problem, transferable to many domains. Here is where topology will help!

[Figure: PQRST intervals within a single heartbeat, and RR intervals between consecutive R peaks]

Let’s begin with the temporal information we want from those heartbeats. At the scale of a single heartbeat, the retrieved intervals are the ones depicted on the left [the PQRST events]. That information is already highly linked to the shape. At the scale of the ECG itself, we also need to retrieve the RR intervals, the delays between consecutive R peaks, which quantify the general rhythm [and its abnormality].
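
In code, the RR intervals come almost for free from the annotated R-peak positions. The per-beat descriptors below are an illustrative choice, not the full feature set from the paper:

```python
import numpy as np

# RR intervals in seconds: the delay between consecutive annotated R peaks.
rr = np.diff(r_peaks) / fs

# Illustrative per-beat rhythm descriptors: the RR interval preceding and
# following each beat, plus a local running mean to expose rhythm changes.
pre_rr = rr[:-1]
post_rr = rr[1:]
local_mean = np.convolve(rr, np.ones(8) / 8, mode="same")
rhythm_features = np.stack([pre_rr, post_rr, local_mean[:-1]], axis=1)
```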

From there, you may already be thinking about building your models using the features exposed above [add some FFT / wavelets / chaos theory / … into the equation]. The results obtained from such features [combined with SVMs, boosted trees or neural networks] are good but not satisfactory enough. Time information is not the problem: it is all about the shape of those heartbeats. They reflect a very complex mechanism in the heart itself, and individual differences imply huge variability from the very beginning. We thus need a model able to capture the patterns without overfitting to the average pattern of the single pool of individuals we have. That is certainly where the beauty of Topological Data Analysis reaches its peak, at least regarding time series.

Among the main challenges faced in generalizing arrhythmia classification, we find individual differences, and specifically bradycardia and tachycardia. TDA, and more precisely persistent homology theory, powerfully characterizes the shape of ECG signals in a compact way, avoiding complex geometric feature engineering. Thanks to the fundamental stability properties of persistent homology, the TDA features appear to be very robust to deformations of the patterns of interest in the ECG signal, especially expansion and contraction along the time axis. This makes them particularly useful for overcoming individual differences and the potential issues raised by bradycardia and tachycardia.

[Figure: process visualization, from the ECG to the barcode diagram to the Betti curve]

Here comes the theory of persistent homology [presented in this article]. It allows us to uniquely represent the persistent homology of a filtered simplicial complex [in our case, the ECG signal turned into a graph] by a persistence barcode. A barcode diagram represents each persistence generator as a horizontal line, beginning at the first filtration level where it appears and ending at the filtration level where it disappears. In terms of filtration, we have two possibilities for such a graph: top-to-bottom or bottom-to-top. The top-to-bottom strategy is visually decomposed in the graph above, showing the barcode construction from the 1D ECG signal. Unfortunately, barcode diagrams are not exploitable by machine learning straight away, because their dimensions are not uniform. To get them into a useful form, we transform the diagrams into Betti curves.
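
For a 1D signal, these filtrations can be computed with off-the-shelf TDA libraries. Below is a minimal sketch using GUDHI's cubical complex, where the bottom-to-top strategy is the sublevel-set filtration of the signal and the top-to-bottom one is obtained by negating it; GUDHI is just one possible library choice here, not necessarily the one used in the original work:

```python
# Sublevel-set (bottom-to-top) and superlevel-set (top-to-bottom) persistence
# of a single heartbeat, computed with GUDHI's cubical complex.
import numpy as np
import gudhi

def persistence_barcode(beat, top_to_bottom=False):
    """Return the 0-dimensional persistence pairs (birth, death) of a 1D signal.

    top_to_bottom=True filters the signal from its maxima downwards,
    which is obtained by negating it."""
    values = np.asarray(beat, dtype=float)
    if top_to_bottom:
        values = -values
    complex_ = gudhi.CubicalComplex(top_dimensional_cells=values)
    complex_.compute_persistence()
    return complex_.persistence_intervals_in_dimension(0)

bars_up = persistence_barcode(beats[0])                        # bottom-to-top
bars_down = persistence_barcode(beats[0], top_to_bottom=True)  # top-to-bottom
```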

Betti Curve: consider a persistence barcode and discretize the filtration (radius) axis. Each bar of the barcode is turned into an indicator function taking the value 1 on the interval it spans and 0 anywhere else. The sum of those functions over the discretized axis defines the Betti Curve.
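
A small NumPy sketch of that definition, with the grid size exposed as a parameter (the helper is mine, written for illustration, not taken from a TDA library):

```python
import numpy as np

def betti_curve(bars, n_points=100, value_range=None):
    """Count, for each value on a fixed grid, how many bars are alive there."""
    bars = np.asarray(bars, dtype=float)
    if value_range is None:
        finite = bars[np.isfinite(bars)]
        value_range = (finite.min(), finite.max())
    grid = np.linspace(value_range[0], value_range[1], n_points)
    births, deaths = bars[:, 0], bars[:, 1]   # the essential bar has death = inf
    curve = ((births[None, :] <= grid[:, None]) &
             (grid[:, None] < deaths[None, :])).sum(axis=1)
    return grid, curve

grid, curve = betti_curve(bars_up, n_points=100)
```

In practice you want `value_range` fixed globally (for instance from the amplitude range of the whole dataset), so that every heartbeat is vectorized on the same grid.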

For each heartbeat, we now have two Betti Curves, whose sizes are uniform and controlled. I personally go for a 100-point vectorization, so that previously trained models can be reapplied to similar problems that take similarly distributed curves as input.

Bring Deep Learning into the Equation

Now comes the question of how to exploit those Betti Curves. Aside from being an exotic data representation, the curves enclose complex information about shape in a one-dimensional signal. The hunt for the best model to exploit them is on!

Exploiting one-dimensional signals is no headache. The only difference is that here, the dimension is not time but space. Among the choices you have, you may go for kNN with Euclidean or Dynamic Time Warping metrics, for boosted trees straight away, or for CNNs, probably the best option in these circumstances [thanks to the multiple scales CNNs have access to, which is probably even more convenient for spatial information than it is for time].
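
As a quick, hedged baseline before any deep learning, here is what a plain kNN on the Betti Curves could look like, reusing the helpers sketched above (a DTW metric, for instance via tslearn, would be a drop-in alternative to the Euclidean one):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# One feature row per heartbeat: its two 100-point Betti curves concatenated,
# both computed on a shared value range so the grids are comparable.
amp = (beats.min(), beats.max())
X = np.array([
    np.concatenate([
        betti_curve(persistence_barcode(b), 100, amp)[1],
        betti_curve(persistence_barcode(b, top_to_bottom=True),
                    100, (-amp[1], -amp[0]))[1],
    ])
    for b in beats
])
y = beat_labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("kNN accuracy on Betti curves:", knn.score(X_te, y_te))
```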

The model I designed at the time is probably overkill for the point I want to make, but it clearly illustrates the modularity I believe most deep learning problems should be built on. You can either go for the stacking strategy, as Stanford did with their paper [34 layers], or for a more horizontal, multi-modular approach, mixing human intuition [feature engineering] with statistical considerations out of our reach [deep learning]. That is exactly what the model depicted hereunder is all about: augmenting the scope of the information you deal with.
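
To make the multi-modular idea concrete, here is an illustrative two-branch sketch in Keras: one branch digests the Betti Curves, the other digests hand-crafted rhythm features, and both are merged before classification. Layer sizes and depths are placeholders, not the architecture from the original work:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Branch 1: the two Betti curves, treated as a 2-channel 1D signal.
betti_in = layers.Input(shape=(100, 2), name="betti_curves")
x = layers.Conv1D(32, 5, padding="same", activation="relu")(betti_in)
x = layers.MaxPooling1D(2)(x)
x = layers.Conv1D(64, 5, padding="same", activation="relu")(x)
x = layers.GlobalAveragePooling1D()(x)

# Branch 2: hand-crafted temporal features (e.g. the RR-based descriptors above).
rhythm_in = layers.Input(shape=(3,), name="rhythm_features")
r = layers.Dense(16, activation="relu")(rhythm_in)

# Merge both views before the final classification head.
merged = layers.Concatenate()([x, r])
merged = layers.Dense(64, activation="relu")(merged)
out = layers.Dense(5, activation="softmax", name="beat_class")(merged)

model = tf.keras.Model(inputs=[betti_in, rhythm_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```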

How impactful is TDA on generalization?

You will now be wondering whether TDA actually brought the performance enhancement I was talking about at the beginning of this article. Well, for that part, we spent a huge amount of time running experiments :]

The first step we took was to assess the impact of TDA on our deep learning architecture. Two problems were highlighted: binary classification [~ arrhythmia detection], consisting of spotting arrhythmic beats among normal beats, and multi-class classification, consisting of attributing the right label to each abnormal heartbeat. We worked within a cross-validation framework, keeping some patients for training, some for validation and others for testing. Each ID represents one three-way split [training 60%, validation 10% and testing 30% of the 240 patients available] and a model trained from scratch.
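
The key point of such a split is that beats from one patient must never leak across training, validation and test. Here is a sketch of that idea with scikit-learn's GroupShuffleSplit, using the 60/10/30 proportions mentioned above (the helper is illustrative, not the exact protocol we ran):

```python
from sklearn.model_selection import GroupShuffleSplit

def patient_split(X, y, patient_ids, seed=0):
    """Split beat-level arrays so that no patient appears in two subsets.

    X, y and patient_ids are NumPy arrays with one entry per heartbeat."""
    # First carve out ~30% of the patients for testing...
    outer = GroupShuffleSplit(n_splits=1, test_size=0.30, random_state=seed)
    trainval_idx, test_idx = next(outer.split(X, y, groups=patient_ids))
    # ...then ~10% of all patients (one seventh of the remainder) for validation.
    inner = GroupShuffleSplit(n_splits=1, test_size=1 / 7, random_state=seed)
    tr, va = next(inner.split(X[trainval_idx], y[trainval_idx],
                              groups=patient_ids[trainval_idx]))
    return trainval_idx[tr], trainval_idx[va], test_idx
```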

The first conclusion was interesting: even though adding TDA did help generalization, the problem it impacted most was the actual multi-class classification. One way of interpreting this result is to think of TDA as a highly specialized measure of topology and shape. The first problem is mainly about detecting abnormal beats, which comes back to anomaly detection, whereas the second relies more on topological information, as shape is a huge differentiator between abnormal heartbeats.

With this step set aside, we dived into comparisons. I attach the results of our general architecture hereunder, but won't go over all the details here. [I actually dislike how inconsistently scoring systems have been designed across papers, and given how fast metrics and considerations around generalization robustness have evolved, it would not even make sense anymore to limit our assessments to those.]

Conclusions are overrated

I hope this introduction to applied Topological Data Analysis triggered your interest in the underlying theory and in the unlimited range of problems it can actually be used for. Using TDA makes the most sense when shape is intuitively involved in the problem. Whether through persistence diagrams, barcode diagrams, persistence landscapes, persistence silhouettes or Betti curves, TDA can be made part of machine learning pipelines. Unfortunately, whatever your level of innovation, the biggest limiting factor is and will be the data. Let's begin thinking about centralized, high-quality sources of data, and then we will certainly change the stakes!

Interested in keeping in touch? :D Twitter | LinkedIn | Github | Medium
