Is topology needed for analysis?

Topology studies properties of spaces that are invariant under any continuous deformation. It is sometimes called "rubber-sheet geometry" because the objects can be stretched and contracted like rubber, but cannot be broken. For example, a square can be deformed into a circle without breaking it, but a figure 8 cannot. Hence a square is topologically equivalent to a circle, but different from a figure 8.

Here are some examples of typical questions in topology: How many holes are there in an object? How can you define the holes in a torus or sphere? What is the boundary of an object? Is a space connected? Does every continuous function from the space to itself have a fixed point?

Topology is a relatively new branch of mathematics; most of the research in topology has been done since 1900. The following are some of the subfields of topology.

  1. General Topology or Point Set Topology. General topology normally considers local properties of spaces, and is closely related to analysis. It generalizes the concept of continuity to define topological spaces, in which limits of sequences can be considered. Sometimes distances can be defined in these spaces, in which case they are called metric spaces; sometimes no concept of distance makes sense.
  2. Combinatorial Topology. Combinatorial topology considers the global properties of spaces, built up from a network of vertices, edges, and faces. This is the oldest branch of topology, and dates back to Euler. It has been shown that topologically equivalent spaces have the same value of a numerical invariant, which we now call the Euler characteristic. This is the number V − E + F, where V, E, and F are the numbers of vertices, edges, and faces of an object (a small computation is sketched just after this list). For example, a tetrahedron and a cube are topologically equivalent to a sphere, and any “triangulation” of a sphere has an Euler characteristic of 2.
  3. Algebraic Topology. Algebraic topology also considers the global properties of spaces, and uses algebraic objects such as groups and rings to answer topological questions. Algebraic topology converts a topological problem into an algebraic problem that is hopefully easier to solve. For example, a group called a homology group can be associated to each space, and the torus and the Klein bottle can be distinguished from each other because they have different homology groups.

Algebraic topology sometimes uses the combinatorial structure of a space to calculate the various groups associated to that space.

  4. Differential Topology. Differential topology considers spaces with some kind of smoothness associated to each point. In this case, the square and the circle would not be smoothly (or differentiably) equivalent to each other. Differential topology is useful for studying properties of vector fields, such as magnetic or electric fields.
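The Euler characteristic mentioned in item 2 is straightforward to compute once the vertex, edge, and face counts of a triangulation are known. Below is a minimal illustrative sketch (the counts for the tetrahedron and cube are standard; the function name is ours):

```python
def euler_characteristic(vertices, edges, faces):
    """Euler characteristic V - E + F of a polyhedral surface."""
    return vertices - edges + faces

# Tetrahedron: 4 vertices, 6 edges, 4 triangular faces.
print(euler_characteristic(4, 6, 4))   # 2
# Cube: 8 vertices, 12 edges, 6 square faces.
print(euler_characteristic(8, 12, 6))  # 2
```

Both values equal 2, as expected for polyhedral decompositions of the sphere.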

Topology is used in many branches of mathematics, such as differential equations, dynamical systems, knot theory, and Riemann surfaces in complex analysis. It is also used in string theory in physics, and for describing the space-time structure of the universe.

In applied mathematics, topological data analysis (TDA) is an approach to the analysis of datasets using techniques from topology. Extraction of information from datasets that are high-dimensional, incomplete and noisy is generally challenging. TDA provides a general framework to analyze such data in a manner that is insensitive to the particular metric chosen and provides dimensionality reduction and robustness to noise. Beyond this, it inherits functoriality, a fundamental concept of modern mathematics, from its topological nature, which allows it to adapt to new mathematical tools.

The initial motivation is to study the shape of data. TDA has combined algebraic topology and other tools from pure mathematics to allow mathematically rigorous study of "shape". The main tool is persistent homology, an adaptation of homology to point cloud data. Persistent homology has been applied to many types of data across many fields. Moreover, its mathematical foundation is also of theoretical importance. The unique features of TDA make it a promising bridge between topology and geometry.

TDA is premised on the idea that the shape of data sets contains relevant information. Real high-dimensional data is typically sparse, and tends to have relevant low-dimensional features. One task of TDA is to provide a precise characterization of this fact. For example, the trajectory of a simple predator–prey system governed by the Lotka–Volterra equations[1] forms a closed loop in state space. TDA provides tools to detect and quantify such recurrent motion.[2]

Many algorithms for data analysis, including those used in TDA, require setting various parameters. Without prior domain knowledge, the correct collection of parameters for a data set is difficult to choose. The main insight of persistent homology is to use the information obtained from all parameter values by encoding this huge amount of information into an understandable and easy-to-represent form. With TDA, there is a mathematical interpretation when the information is a homology group. In general, the assumption is that features that persist for a wide range of parameters are "true" features. Features persisting for only a narrow range of parameters are presumed to be noise, although the theoretical justification for this is unclear.[3]

Early history

Precursors to the full concept of persistent homology appeared gradually over time.[4] In 1990, Patrizio Frosini introduced the size function, which is equivalent to the 0th persistent homology.[5] Nearly a decade later, Vanessa Robins studied the images of homomorphisms induced by inclusion.[6] Finally, shortly thereafter, Edelsbrunner et al. introduced the concept of persistent homology together with an efficient algorithm and its visualization as a persistence diagram.[7] Carlsson et al. reformulated the initial definition and gave an equivalent visualization method called persistence barcodes,[8] interpreting persistence in the language of commutative algebra.[9]

In algebraic topology, persistent homology emerged through the work of Sergey Barannikov on Morse theory. The set of critical values of a smooth Morse function was canonically partitioned into "birth–death" pairs and filtered complexes were classified; their invariants, equivalent to the persistence diagram and persistence barcodes, together with an efficient algorithm for their calculation, were described under the name of canonical forms by Barannikov in 1994.[10][11]

Concepts

Some widely used concepts are introduced below. Note that some definitions may vary from author to author.

A point cloud is often defined as a finite set of points in some Euclidean space, but may be taken to be any finite metric space.

The Čech complex of a point cloud is the nerve of the cover of balls of a fixed radius around each point in the cloud.
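Computing a Čech complex exactly requires deciding whether collections of balls share a common point, which takes some care. As a hedged, purely illustrative sketch, the following builds the closely related Vietoris–Rips complex (a simplex is included when all of its vertices are pairwise close), which is the approximation most often used in practice; all names and parameters are ours:

```python
import itertools
import math

def rips_complex(points, radius, max_dim=2):
    """Vietoris-Rips complex: include a simplex when every pair of its
    vertices is within distance 2*radius."""
    n = len(points)
    simplices = [(i,) for i in range(n)]             # vertices
    for k in range(2, max_dim + 2):                  # edges, triangles, ...
        for combo in itertools.combinations(range(n), k):
            if all(math.dist(points[i], points[j]) <= 2 * radius
                   for i, j in itertools.combinations(combo, 2)):
                simplices.append(combo)
    return simplices

# Four points on a circle: at this radius the complex is a hollow square,
# i.e. it has a single one-dimensional hole.
cloud = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
print(rips_complex(cloud, radius=0.75))
```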

A persistence module $\mathbb{U}$ indexed by $\mathbb{Z}$ is a vector space $U_t$ for each $t \in \mathbb{Z}$, together with a linear map $u_t^s : U_s \to U_t$ whenever $s \le t$, such that $u_t^t = 1$ for all $t$ and $u_t^s u_s^r = u_t^r$ whenever $r \le s \le t$.[12] An equivalent definition is a functor from $\mathbb{Z}$, considered as a partially ordered set, to the category of vector spaces.

The persistent homology group $PH$ of a point cloud is the persistence module defined as $PH_k(X) = \prod_r H_k(X_r)$, where $X_r$ is the Čech complex of radius $r$ of the point cloud $X$ and $H_k$ is the homology group.

A persistence barcode is a multiset of intervals in $\mathbb{R}$, and a persistence diagram is a multiset of points in $\Delta := \{(u, v) \in \mathbb{R}^2 \mid u, v \ge 0,\, u \le v\}$.

The Wasserstein distance between two persistence diagrams $X$ and $Y$ is defined as

$$W_p[L_q](X, Y) := \inf_{\varphi : X \to Y} \left[ \sum_{x \in X} \left( \Vert x - \varphi(x) \Vert_q \right)^p \right]^{1/p}$$

where $1 \le p, q \le \infty$ and $\varphi$ ranges over bijections between $X$ and $Y$. See figure 3.1 in Munch[13] for an illustration.

The bottleneck distance between $X$ and $Y$ is

$$W_\infty[L_q](X, Y) := \inf_{\varphi : X \to Y} \sup_{x \in X} \Vert x - \varphi(x) \Vert_q .$$

This is the special case of the Wasserstein distance obtained by letting $p = \infty$.
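As a hedged illustration of these definitions (a naive computation for tiny diagrams, not an efficient algorithm), the bottleneck distance can be computed by padding each diagram with the diagonal projections of the other diagram's points and minimizing the largest matching cost over all bijections; all names here are ours:

```python
import itertools

def linf(p, q):
    """L-infinity distance between two diagram points (birth, death)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def diag(p):
    """Orthogonal projection of a diagram point onto the diagonal u = v."""
    m = (p[0] + p[1]) / 2.0
    return (m, m)

def cost(p, q):
    """Matching cost; matching one diagonal point to another is free."""
    if p[0] == p[1] and q[0] == q[1]:
        return 0.0
    return linf(p, q)

def bottleneck(X, Y):
    """Naive bottleneck distance between small persistence diagrams X and Y:
    points may be matched to each other or to the diagonal."""
    A = list(X) + [diag(q) for q in Y]   # pad X with projections of Y's points
    B = list(Y) + [diag(p) for p in X]   # pad Y with projections of X's points
    return min(
        max(cost(A[i], B[perm[i]]) for i in range(len(A)))
        for perm in itertools.permutations(range(len(B)))
    )

# Two one-point diagrams: the distance is max(|0.0 - 0.1|, |2.0 - 1.8|) = 0.2.
print(bottleneck([(0.0, 2.0)], [(0.1, 1.8)]))
```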

Basic property

Structure theorem

The first classification theorem for persistent homology appeared in 1994[10] via Barannikov's canonical forms. The classification theorem interpreting persistence in the language of commutative algebra appeared in 2005:[9] for a finitely generated persistence module $C$ with field $F$ coefficients,

$$H(C; F) \simeq \bigoplus_i x^{t_i} \cdot F[x] \;\oplus\; \left( \bigoplus_j x^{r_j} \cdot \big( F[x] / (x^{s_j} \cdot F[x]) \big) \right).$$

Intuitively, the free parts correspond to the homology generators that appear at filtration level $t_i$ and never disappear, while the torsion parts correspond to those that appear at filtration level $r_j$ and last for $s_j$ steps of the filtration (or, equivalently, disappear at filtration level $s_j + r_j$).[10]
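As a concrete (illustrative) reading of this decomposition, a hypothetical module of the form $F[x] \oplus x^{1} \cdot (F[x]/(x^{2} \cdot F[x]))$ has a barcode consisting of one infinite interval $[0, \infty)$, a class born at filtration level 0 that never dies, and one finite interval $[1, 3)$, a class born at level 1 that dies at level $1 + 2 = 3$.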

Persistent homology is visualized through a barcode or persistence diagram. The barcode has its roots in abstract mathematics: the category of finite filtered complexes over a field is semi-simple, and any filtered complex is isomorphic to its canonical form, a direct sum of one- and two-dimensional simple filtered complexes.

Stability

Stability is desirable because it provides robustness against noise. If $X$ is any space homeomorphic to a simplicial complex, and $f, g : X \to \mathbb{R}$ are continuous tame[14] functions, then the persistence vector spaces $\{H_k(f^{-1}([0, r]))\}$ and $\{H_k(g^{-1}([0, r]))\}$ are finitely presented, and $W_\infty(D(f), D(g)) \le \lVert f - g \rVert_\infty$, where $W_\infty$ refers to the bottleneck distance[15] and $D$ is the map taking a continuous tame function to the persistence diagram of its $k$-th homology.

Workflow

The basic workflow in TDA is:[16]

point cloud $\to$ nested complexes $\to$ persistence module $\to$ barcode or diagram
  1. If $X$ is a point cloud, replace $X$ with a nested family of simplicial complexes $X_r$ (such as the Čech or Vietoris–Rips complex). This process converts the point cloud into a filtration of simplicial complexes. Taking the homology of each complex in this filtration gives a persistence module

     $H_i(X_{r_0}) \to H_i(X_{r_1}) \to H_i(X_{r_2}) \to \cdots$

  2. Apply the structure theorem to obtain a parameterized version of the Betti numbers: the persistence diagram or, equivalently, the barcode.

(Figure: a usual use of persistence in TDA.[17])
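A minimal end-to-end sketch of this workflow is given below, assuming the GUDHI Python package is available; the point cloud, parameter values, and printed format are illustrative only, so consult the GUDHI documentation for the authoritative API:

```python
import math
import gudhi  # assumes the GUDHI Python bindings are installed

# Step 0: a small point cloud sampled from a circle.
cloud = [(math.cos(2 * math.pi * k / 20), math.sin(2 * math.pi * k / 20))
         for k in range(20)]

# Step 1: a nested family of simplicial complexes (Vietoris-Rips filtration).
rips = gudhi.RipsComplex(points=cloud, max_edge_length=2.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)

# Step 2: persistence module -> persistence diagram / barcode.
diagram = simplex_tree.persistence()

# Each entry is (dimension, (birth, death)); a long 1-dimensional bar
# reflects the single loop of the circle.
for dim, (birth, death) in diagram:
    print(dim, birth, death)
```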

The first algorithm over all fields for persistent homology in the algebraic-topology setting was described by Barannikov[10] through reduction to canonical form by upper-triangular matrices. The first algorithm for persistent homology over $F_2$ was given by Edelsbrunner et al.[7] Zomorodian and Carlsson gave the first practical algorithm to compute persistent homology over all fields.[9] Edelsbrunner and Harer's book gives general guidance on computational topology.[18]

One issue that arises in computation is the choice of complex. The Čech complex and the Vietoris–Rips complex are the most natural at first glance; however, their size grows rapidly with the number of data points. The Vietoris–Rips complex is preferred over the Čech complex because its definition is simpler and the Čech complex requires extra effort to define in a general finite metric space. Efficient ways to lower the computational cost of homology have been studied. For example, the α-complex and the witness complex are used to reduce the dimension and size of complexes.[19]

Recently, discrete Morse theory has shown promise for computational homology because it can reduce a given simplicial complex to a much smaller cellular complex that is homotopy equivalent to the original one.[20] This reduction can in fact be performed as the complex is constructed, by using matroid theory, leading to further performance increases.[21] Another recent algorithm saves time by ignoring the homology classes with low persistence.[22]

Various software packages are available, such as javaPlex, Dionysus, Perseus, PHAT, DIPHA, GUDHI, Ripser, and TDAstats. A comparison of these tools is given by Otter et al.[23] Giotto-tda is a Python package dedicated to integrating TDA into the machine learning workflow by means of a scikit-learn API. The R package TDA can calculate recently introduced concepts such as the persistence landscape and the kernel distance estimator.[24] The Topology ToolKit is specialized for continuous data defined on manifolds of low dimension (1, 2 or 3), as typically found in scientific visualization. Another R package, TDAstats, uses the Ripser library to calculate persistent homology.[25]
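For instance, a hedged sketch of a typical call to the ripser.py package (assuming it is installed; the data and options shown are illustrative):

```python
import numpy as np
from ripser import ripser  # assumes the ripser.py package is installed

# 100 noisy points on a circle, as an (n_points, 2) array.
theta = np.random.uniform(0, 2 * np.pi, 100)
data = np.column_stack([np.cos(theta), np.sin(theta)])
data += 0.05 * np.random.randn(*data.shape)

result = ripser(data, maxdim=1)   # persistent H0 and H1
h0, h1 = result["dgms"]           # diagrams as arrays of (birth, death) pairs
print(h1)                         # expect one prominent 1-dimensional feature
```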

High-dimensional data is impossible to visualize directly. Many methods have been invented to extract a low-dimensional structure from the data set, such as principal component analysis and multidimensional scaling.[26] However, it is important to note that the problem itself is ill-posed, since many different topological features can be found in the same data set. Thus, the study of visualization of high-dimensional spaces is of central importance to TDA, although it does not necessarily involve the use of persistent homology. However, recent attempts have been made to use persistent homology in data visualization.[27]

Carlsson et al. have proposed a general method called MAPPER.[28] It inherits the idea of Serre that a covering preserves homotopy.[29] A generalized formulation of MAPPER is as follows:

Let $X$ and $Z$ be topological spaces and let $f : X \to Z$ be a continuous map. Let $\mathbb{U} = \{U_\alpha\}_{\alpha \in A}$ be a finite open covering of $Z$. The output of MAPPER is the nerve of the pullback cover, $M(\mathbb{U}, f) := N(f^{-1}(\mathbb{U}))$, where each preimage is split into its connected components.[27] This is a very general concept, of which the Reeb graph[30] and merge trees are special cases.

This is not quite the original definition.[28] Carlsson et al. choose $Z$ to be $\mathbb{R}$ or $\mathbb{R}^2$ and cover it with open sets such that at most two intersect.[3] This restriction means that the output is in the form of a complex network. Because the topology of a finite point cloud is trivial, clustering methods (such as single linkage) are used to produce the analogue of connected sets in the preimage $f^{-1}(U)$ when MAPPER is applied to actual data.
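A hedged, self-contained sketch of this practical version of MAPPER follows (filter = first coordinate, an overlapping interval cover, and a crude single-linkage-style clustering by a fixed distance threshold); every name and parameter is illustrative, and this is not Carlsson et al.'s implementation:

```python
import itertools
import math

def mapper_graph(points, n_intervals=4, overlap=0.3, link_eps=0.5):
    """Toy MAPPER: cover the range of the filter f(x) = x[0] with overlapping
    intervals, cluster each preimage, and return the nerve of the clusters."""
    f = [p[0] for p in points]
    lo, hi = min(f), max(f)
    length = (hi - lo) / n_intervals

    # Overlapping interval cover of the filter range.
    intervals = [(lo + i * length - overlap * length,
                  lo + (i + 1) * length + overlap * length)
                 for i in range(n_intervals)]

    # Cluster each preimage: connected components of the graph whose edges
    # join points closer than link_eps (a crude stand-in for single linkage).
    clusters = []
    for a, b in intervals:
        unvisited = {i for i, v in enumerate(f) if a <= v <= b}
        while unvisited:
            comp, stack = set(), [unvisited.pop()]
            while stack:
                i = stack.pop()
                comp.add(i)
                near = [j for j in unvisited
                        if math.dist(points[i], points[j]) < link_eps]
                for j in near:
                    unvisited.remove(j)
                    stack.append(j)
            clusters.append(frozenset(comp))

    # Nerve: one node per cluster, an edge whenever two clusters share a point.
    edges = [(i, j) for i, j in itertools.combinations(range(len(clusters)), 2)
             if clusters[i] & clusters[j]]
    return clusters, edges

# Points on a circle: the resulting MAPPER graph is itself a cycle.
circle = [(math.cos(2 * math.pi * k / 30), math.sin(2 * math.pi * k / 30))
          for k in range(30)]
print(mapper_graph(circle))
```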

Mathematically speaking, MAPPER is a variation of the Reeb graph. If $M(\mathbb{U}, f)$ is at most one-dimensional, then for each $i \ge 0$,

$H_i(X) \simeq H_0(N(\mathbb{U}); \hat{F}_i) \oplus H_1(N(\mathbb{U}); \hat{F}_{i-1})$.[31]

The added flexibility also has disadvantages. One problem is instability, in that some change in the choice of the cover can lead to a major change in the output of the algorithm.[32] Work has been done to overcome this problem.[27]

Three successful applications of MAPPER can be found in Carlsson et al.[33] A comment on the applications in this paper by J. Curry is that "a common feature of interest in applications is the presence of flares or tendrils."[34]

A free implementation of MAPPER, written by Daniel Müllner and Aravindakshan Babu, is available online. MAPPER also forms the basis of Ayasdi's AI platform.

Multidimensional persistence

Multidimensional persistence is important to TDA. The concept arises in both theory and practice. The first investigation of multidimensional persistence came early in the development of TDA.[35] Carlsson and Zomorodian introduced the theory of multidimensional persistence in [36] and, in collaboration with Singh,[37] introduced the use of tools from symbolic algebra (Gröbner basis methods) to compute MPH modules. Their definition presents multidimensional persistence with $n$ parameters as a $\mathbb{Z}^n$-graded module over a polynomial ring in $n$ variables. Tools from commutative and homological algebra are applied to the study of multidimensional persistence in the work of Harrington, Otter, Schenck, and Tillman.[38] The first application to appear in the literature is a method for shape comparison, similar to the invention of TDA.[39]

The definition of an $n$-dimensional persistence module in $\mathbb{R}^n$ is:[34]

  • a vector space $V_s$ is assigned to each point $s = (s_1, \ldots, s_n)$;
  • a map $\rho_s^t : V_s \to V_t$ is assigned whenever $s \le t$ (that is, $s_i \le t_i$ for $i = 1, \ldots, n$);
  • the maps satisfy $\rho_r^t = \rho_s^t \circ \rho_r^s$ for all $r \le s \le t$.

It is worth noting that there is some controversy over the definition of multidimensional persistence.[34]

One of the advantages of one-dimensional persistence is its representability by a diagram or barcode. However, discrete complete invariants of multidimensional persistence modules do not exist.[40] The main reason for this is that the structure of the collection of indecomposables is extremely complicated, by Gabriel's theorem in the theory of quiver representations,[41] although a finitely generated $n$-dimensional persistence module can be uniquely decomposed into a direct sum of indecomposables by the Krull–Schmidt theorem.[42]

Nonetheless, many results have been established. Carlsson and Zomorodian introduced the rank invariant $\rho_M(u, v)$, defined as $\rho_M(u, v) = \mathrm{rank}(x^{v-u} : M_u \to M_v)$, in which $M$ is a finitely generated $n$-graded module. In one dimension, it is equivalent to the barcode. In the literature, the rank invariant is often referred to as the persistent Betti numbers (PBNs).[18] In many theoretical works, authors use a more restricted definition, an analogue of sublevel-set persistence. Specifically, the persistence Betti numbers of a function $f : X \to \mathbb{R}^k$ are given by the function $\beta_f : \Delta^+ \to \mathbb{N}$, taking each $(u, v) \in \Delta^+$ to $\beta_f(u, v) := \mathrm{rank}(H(X(f \le u)) \to H(X(f \le v)))$, where $\Delta^+ := \{(u, v) \in \mathbb{R} \times \mathbb{R} : u \le v\}$ and $X(f \le u) := \{x \in X : f(x) \le u\}$.
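As a hedged toy illustration (not from the source), a small 2-parameter persistence module can be stored as dimensions and one-step structure maps on a grid, and the rank invariant $\rho_M(u, v)$ is then the rank of the composite map from grade $u$ to grade $v$:

```python
import numpy as np

# A toy 2-parameter persistence module on the grid {0,1} x {0,1}.
# dims[(i, j)] is dim M_(i,j); maps[(src, dst)] is the structure map for a
# one-step move to the right or upward (the square commutes by construction).
dims = {(0, 0): 1, (1, 0): 2, (0, 1): 1, (1, 1): 1}
maps = {
    ((0, 0), (1, 0)): np.array([[1.0], [0.0]]),  # right
    ((0, 0), (0, 1)): np.array([[1.0]]),         # up
    ((1, 0), (1, 1)): np.array([[1.0, 0.0]]),    # up
    ((0, 1), (1, 1)): np.array([[1.0]]),         # right
}

def structure_map(u, v):
    """Compose one-step maps along a monotone path from u to v."""
    m = np.eye(dims[u])
    i, j = u
    while i < v[0]:
        m = maps[((i, j), (i + 1, j))] @ m
        i += 1
    while j < v[1]:
        m = maps[((i, j), (i, j + 1))] @ m
        j += 1
    return m

def rank_invariant(u, v):
    """rho_M(u, v): rank of the map M_u -> M_v, for u <= v."""
    return np.linalg.matrix_rank(structure_map(u, v))

print(rank_invariant((0, 0), (1, 1)))  # 1: the class born at (0,0) survives
print(rank_invariant((1, 0), (1, 1)))  # 1: only one of the two classes at (1,0) survives
```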

Some basic properties include monotonicity and diagonal jump.[43] Persistent Betti numbers are finite if $X$ is a compact and locally contractible subspace of $\mathbb{R}^n$.[44]

Using a foliation method, the $k$-dimensional PBNs can be decomposed into a family of one-dimensional PBNs by dimensionality reduction.[45] This method has also led to a proof that multidimensional PBNs are stable.[46] The discontinuities of PBNs occur only at points $(u, v)$ (with $u \le v$) where either $u$ is a discontinuity point of $\rho_M(\star, v)$ or $v$ is a discontinuity point of $\rho(u, \star)$, under the assumption that $f \in C^0(X, \mathbb{R}^k)$ and $X$ is a compact, triangulable topological space.[47]

The persistence space, a generalization of the persistence diagram, is defined as the multiset of all points with multiplicity larger than 0, together with the diagonal.[48] It provides a stable and complete representation of PBNs. Ongoing work by Carlsson et al. is attempting to give a geometric interpretation of persistent homology, which might provide insights on how to combine machine learning theory with topological data analysis.[49]

The first practical algorithm to compute multidimensional persistence was invented very early.[50] Since then, many other algorithms have been proposed, based on concepts such as discrete Morse theory[51] and finite-sample estimation.[52]

Other persistences

The standard paradigm in TDA is often referred to as sublevel persistence. Apart from multidimensional persistence, much work has been done to extend this special case.

Zigzag persistence

The nonzero maps in a persistence module are restricted by the preorder relation in the category. However, mathematicians have found that a uniform direction is not essential to many results. "The philosophical point is that the decomposition theory of graph representations is somewhat independent of the orientation of the graph edges".[53] Zigzag persistence is important to the theoretical side. The examples given in Carlsson's review paper to illustrate the importance of functoriality all share some of its features.[3]

Extended persistence and levelset persistence

Some attempts have been made to loosen the strict restrictions on the function.[54] Please refer to the Categorification and cosheaves and Impact on mathematics sections for more information.

It is natural to extend persistent homology to other basic concepts in algebraic topology, such as cohomology and relative homology/cohomology.[55] An interesting application is the computation of circular coordinates for a data set via the first persistent cohomology group.[56]

Circular persistence

Normal persistent homology studies real-valued functions. A circle-valued map might be useful: "persistence theory for circle-valued maps promises to play the role for some vector fields as does the standard persistence theory for scalar fields", as commented by Dan Burghelea et al.[57] The main difference is that Jordan cells (very similar in format to the Jordan blocks in linear algebra), which would be trivial in the real-valued case, can be nontrivial for circle-valued functions; combined with barcodes, they give the invariants of a tame map, under moderate conditions.[57]

Two techniques they use are Morse–Novikov theory[58] and graph representation theory.[59] More recent results can be found in D. Burghelea et al.[60] For example, the tameness requirement can be replaced by the much weaker condition of continuity.

Persistence with torsion

The proof of the structure theorem relies on the base domain being a field, so not many attempts have been made at persistent homology with torsion. Frosini defined a pseudometric on this specific module and proved its stability.[61] One of its novelties is that it does not depend on a classification theory to define the metric.[62]

Categorification and cosheaves

One advantage of category theory is its ability to lift concrete results to a higher level, showing relationships between seemingly unconnected objects. Bubenik et al.[63] offer a short introduction to category theory suited to TDA.

Category theory is the language of modern algebra, and has been widely used in the study of algebraic geometry and topology. It has been noted that "the key observation of [9] is that the persistence diagram produced by [7] depends only on the algebraic structure carried by this diagram."[64] The use of category theory in TDA has proved to be fruitful.[63][64]

Following the notation of Bubenik et al.,[64] the indexing category $P$ is any preordered set (not necessarily $\mathbb{N}$ or $\mathbb{R}$), the target category $D$ is any category (instead of the commonly used $\mathrm{Vect}_{\mathbb{F}}$), and functors $P \to D$ are called generalized persistence modules in $D$, over $P$.

One advantage of using category theory in TDA is a clearer understanding of concepts and the discovery of new relationships between proofs. Take two examples for illustration. The understanding of the correspondence between interleaving and matching is of huge importance, since matching was the method used in the beginning (modified from Morse theory). A summary of this work can be found in Vin de Silva et al.[65] Many theorems can be proved much more easily in a more intuitive setting.[62] Another example is the relationship between the construction of different complexes from point clouds. It has long been noticed that the Čech and Vietoris–Rips complexes are related; specifically, $V_r(X) \subset C_{\sqrt{2}r}(X) \subset V_{2r}(X)$.[66] The essential relationship between the Čech and Rips complexes can be seen much more clearly in categorical language.[65]

The language of category theory also helps cast results in terms recognizable to the broader mathematical community. Bottleneck distance is widely used in TDA because of the results on stability with respect to the bottleneck distance.[12][15] In fact, the interleaving distance is the terminal object in a poset category of stable metrics on multidimensional persistence modules in a prime field.[62][67]

Sheaves, a central concept in modern algebraic geometry, are intrinsically related to category theory. Roughly speaking, sheaves are the mathematical tool for understanding how local information determines global information. Justin Curry regards level-set persistence as the study of fibers of continuous functions. The objects that he studies are very similar to those produced by MAPPER, but with sheaf theory as the theoretical foundation.[34] Although no breakthrough in the theory of TDA has yet used sheaf theory, it is promising, since there are many beautiful theorems in algebraic geometry relating to sheaf theory. For example, a natural theoretical question is whether different filtration methods result in the same output.[68]

Stability

Stability is of central importance to data analysis, since real data carry noise. By using category theory, Bubenik et al. have distinguished between soft and hard stability theorems, and proved that the soft cases are formal.[64] Specifically, the general workflow of TDA is

data $\xrightarrow{F}$ topological persistence module $\xrightarrow{H}$ algebraic persistence module $\xrightarrow{J}$ discrete invariant

The soft stability theorem asserts that $HF$ is Lipschitz continuous, and the hard stability theorem asserts that $J$ is Lipschitz continuous.

The bottleneck distance is widely used in TDA. The isometry theorem asserts that the interleaving distance $d_I$ is equal to the bottleneck distance.[62] Bubenik et al. have abstracted the definition to that between functors $F, G : P \to D$ when $P$ is equipped with a sublinear projection or superlinear family, in which case it still remains a pseudometric.[64] Given the notable properties of the interleaving distance,[69] we introduce here the general definition of the interleaving distance (instead of the one first introduced):[12] Let $\Gamma, K \in \mathrm{Trans}_P$ (a translation, i.e. a monotone function from $P$ to $P$ satisfying $x \le \Gamma(x)$ for all $x \in P$). A $(\Gamma, K)$-interleaving between $F$ and $G$ consists of natural transformations $\varphi : F \Rightarrow G\Gamma$ and $\psi : G \Rightarrow FK$, such that $(\psi\Gamma)\varphi = F\eta_{K\Gamma}$ and $(\varphi K)\psi = G\eta_{\Gamma K}$.

The two main results are[64]

  • Let $P$ be a preordered set with a sublinear projection or superlinear family. Let $H : D \to E$ be a functor between arbitrary categories $D, E$. Then for any two functors $F, G : P \to D$, we have $d_I(HF, HG) \le d_I(F, G)$.
  • Let $P$ be a poset of a metric space $Y$, and let $X$ be a topological space. Let $f, g : X \to Y$ (not necessarily continuous) be functions, and let $F, G$ be the corresponding persistence diagrams. Then $d_I(F, G) \le d_\infty(f, g) := \sup_{x \in X} d_Y(f(x), g(x))$.

These two results summarize many results on stability of different models of persistence.

For the stability theorem of multidimensional persistence, please refer to the subsection on multidimensional persistence above.

Structure theorem

The structure theorem is of central importance to TDA; as commented by G. Carlsson, "what makes homology useful as a discriminator between topological spaces is the fact that there is a classification theorem for finitely generated abelian groups"[3] (see the fundamental theorem of finitely generated abelian groups).

The main argument used in the proof of the original structure theorem is the standard structure theorem for finitely generated modules over a principal ideal domain.[9] However, this argument fails if the indexing set is $(\mathbb{R}, \le)$.[3]

In general, not every persistence module can be decomposed into intervals.[70] Many attempts have been made at relaxing the restrictions of the original structure theorem. The case of pointwise finite-dimensional persistence modules indexed by a locally finite subset of $\mathbb{R}$ is solved based on the work of Webb.[71] The most notable result is due to Crawley-Boevey, who solved the case of $\mathbb{R}$: Crawley-Boevey's theorem states that any pointwise finite-dimensional persistence module is a direct sum of interval modules.[72]

To understand the statement of his theorem, some concepts need to be introduced. An interval in $(\mathbb{R}, \le)$ is defined as a subset $I \subset \mathbb{R}$ having the property that if $r, t \in I$ and there is an $s \in \mathbb{R}$ such that $r \le s \le t$, then $s \in I$ as well. An interval module $k_I$ assigns to each element $s \in I$ the vector space $k$ and assigns the zero vector space to elements of $\mathbb{R} \setminus I$. All maps $\rho_s^t$ are the zero map unless $s, t \in I$ and $s \le t$, in which case $\rho_s^t$ is the identity map.[34] Interval modules are indecomposable.[73]
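As a hedged illustrative sketch of this definition (the names are ours), an interval module over the field $k = \mathbb{Q}$ can be modeled as a rule assigning dimensions and structure maps:

```python
from fractions import Fraction  # stand-in for the field k = Q

def interval_module(contains):
    """Interval module k_I: dimension 1 on I and 0 elsewhere; the map
    rho_s^t is the identity when s and t both lie in I, and zero otherwise."""
    def dim(s):
        return 1 if contains(s) else 0

    def rho(s, t):
        assert s <= t
        if contains(s) and contains(t):
            return lambda x: x            # identity on k
        return lambda x: Fraction(0)      # zero map
    return dim, rho

# The interval module for I = [1, 3).
dim, rho = interval_module(lambda s: 1 <= s < 3)
print(dim(0), dim(2), dim(3))      # 0 1 0
print(rho(1, 2)(Fraction(5)))      # 5: identity inside the interval
print(rho(2, 4)(Fraction(5)))      # 0: zero map once the interval has ended
```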

Although the result of Crawley-Boevey is a very powerful theorem, it still does not extend to the q-tame case.[70] A persistence module is q-tame if the rank of $\rho_s^t$ is finite for all $s < t$.
