Concurrence Topology: A Tool for Describing High-Order Statistical Dependence in Data
Steven P. Ellis (mostly joint work with Arno Klein)
6/9/14
Abstract
Data analytic methods possessing the following three features are desirable: (1) The method describes “high-order dependence” among variables. (2) It does so with few preconceptions. And (3) it can handle at least dozens, maybe hundreds of variables. However, if approached in a naive fashion, data analysis having these three features triggers a “combinatorial explosion”: The output from the analysis can include thousands, maybe millions of numbers. Few methods exist possessing all three features yet which avoid the combinatorial explosion. Ellis has devised a data analytic method he calls “Concurrence Topology (CT)” which does so.
Abstract, continued
CT takes an apparently radically new approach to this problem. It starts by translating data into a “filtration”, a series of “shapes”. The shapes in the series are called “frames”. A filtration is like a building, and the frames are like the floors of the building. But while the floors of a building are two-dimensional, the frames of a filtration can have dimension much higher than two. A filtration can have holes that are like elevator shafts in a building. Such holes indicate relatively weak or negative association among the variables. CT uses computational algebraic topology to describe the pattern of holes. Normally, there are no more than a few dozen holes, so CT avoids the combinatorial explosion. Often one can identify small groups of variables that are closely associated with a given hole. This process facilitates interpretation of the hole.
Abstract, continued
A limitation of CT is that, so far, it only works with binary data. But quantitative data can always be binarized. Ellis wrote software in R (available upon request) implementing CT. A paper, written by Arno Klein and Ellis, introducing CT and demonstrating it on fMRI data has been accepted by a topology journal.
◮ Free R code exists that implements the procedures described
in this talk.
◮ Reference: S.P. Ellis, A. Klein (2014) “Describing high-order
statistical dependence using ‘concurrence topology’, with application to functional MRI brain data,” Homology, Homotopy, and Applications, 16, 245–264.
CONCERNED WITH DATA ANALYSIS CHARACTERIZED BY THREE FEATURES
INGREDIENT 1: HIGH-ORDER DEPENDENCE
◮ A statistic that can be computed from a multivariate sample by looking at only k variables at a time, but which cannot be obtained by looking at fewer than k variables at a time, reflects “kth-order dependence” among the variables.
◮ “High-order dependence” means dependence of order at least
3.
Examples:
◮ The list of means of 10 variables reflects “first order
dependence”.
◮ A correlation matrix of 10 variables reflects second order
dependence.
◮ A simple network model reflects second order dependence.
◮ Factor analysis reflects second order dependence.
Regression
◮ Least squares estimates of the coefficients in the regression model Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + error reflect second order dependence.
◮ The least squares estimate of the interaction β12 in the
regression model Y = β0 + β1X1 + β2X2 +β12X1 : X2 + error reflects third order dependence.
◮ Interactions in regression models can be important.
◮ This suggests that looking at dependence of order higher than 2 might be useful in general.
Three data sets identical (statistically) up to 2nd order, but not at third order.
[Table: three data sets I, II, III, each with binary variables x, y, z.]
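The property this slide illustrates, data sets agreeing through second-order statistics yet differing at third order, can be reproduced with a minimal hypothetical XOR construction (not the slides' actual data): take x and y to be fair coins and z = x XOR y. Means and all pairwise moments then match the fully independent case, but the triple moment does not.

```python
from itertools import product

# Hypothetical XOR construction (not the slides' data): x and y are fair
# coins and z = x XOR y. The four equally likely outcomes:
rows = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]

def mean(vals):
    return sum(vals) / len(vals)

# First order: every variable has mean 1/2.
means = [mean([r[i] for r in rows]) for i in range(3)]

# Second order: every pairwise moment E[ab] equals E[a]E[b] = 1/4,
# so all correlations are zero.
pair_moments = {(i, j): mean([r[i] * r[j] for r in rows])
                for i in range(3) for j in range(i + 1, 3)}

# Third order: E[xyz] = 0, whereas full independence would give 1/8.
triple_moment = mean([r[0] * r[1] * r[2] for r in rows])
print(means, pair_moments, triple_moment)
```

Any method that stops at correlations sees these three variables as pairwise independent; only a third-order statistic detects the constraint tying z to x and y.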
SUMMARIZING
◮ Typically, there are very many ways variables can be
dependent.
◮ Any summary of dependence cannot capture every sort of
dependence.
◮ Correlation is an example of such a summary.
INGREDIENT 2: “AGNOSTIC” STATISTICS
◮ Typically, formulating a regression model involves choices.
◮ Which variable should be the response (dependent) variable?
◮ Which variables should be the predictors (independent variables)?
◮ Which variables should be included in interactions?
◮ Ditto for path analysis.
◮ If you have prior knowledge to guide you, then regression modeling (or path analysis) is a powerful way to learn from data.
◮ A more data-driven approach is “agnostic analysis”:
◮ Treating all variables the same a priori.
◮ Example: Factor analysis is a second order agnostic analysis
method.
Group variables
◮ I’m mostly interested in “unsupervised” methods.
◮ But if there is a variable that specifies classes or groups that the data come from, then one might not want to treat it like any old variable.
◮ Output of unsupervised methods can be used as part of input
to supervised methods.
◮ Give examples later.
INGREDIENT 3: “LARGE” NUMBER OF VARIABLES
◮ In this talk “large number” means “dozens”, maybe a
hundred or so.
“COMBINATORIAL EXPLOSION”
◮ The three features constitute an “explosive mixture”.
◮ Prima facie, agnostically describing kth-order dependence in a data set means examining all combinations of k variables at a time.
◮ If there are many variables and k > 2, the number of
combinations can be huge.
◮ Sometimes the collection of all these combinations can be
regarded as a “haystack” in which we’re searching for “needles”.
“COMBINATORIAL EXPLOSION:” EXAMPLE
Analysis of seventh-order dependence among the regions of the brain “default mode network” in an fMRI data set.
◮ 32 variables.
◮ Naive agnostic seventh order analysis of 32 variables means looking at C(32, 7) = 3,365,856 combinations of 7 variables.
◮ E.g., ≥ 3,365,856 terms in a “log linear model”.
◮ Data contained only 6,144 numbers.
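The counts quoted on this slide can be checked directly, e.g. with Python's `math.comb`:

```python
from math import comb

# C(32, 7): combinations of 7 regions out of 32 (this slide).
print(comb(32, 7))   # 3,365,856
# C(32, 6): combinations of 6 regions out of 32 (used for the
# dimension-4 default-mode-network example later in the talk).
print(comb(32, 6))   # 906,192
# The data set itself: 32 regions x 192 time points.
print(32 * 192)      # 6,144 numbers
```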
“COMBINATORIAL EXPLOSION,” continued
◮ Computing and interpreting many many combinations is very
challenging.
◮ With many combinations, looking at individual combinations of k variables is not helpful:
◮ Difficult to interpret a torrent of numbers.
◮ When there are many groups of k variables, behavior of individual groups is unlikely to be reproducible.
◮ (Multiple comparisons)
SOME METHODS THAT AGNOSTICALLY CAPTURE HIGH ORDER DEPENDENCE IN MANY VARIABLES
“Unsupervised” methods:
◮ There seem to be few established unsupervised methods that
capture high order dependence.
◮ Independent Components Analysis
◮ Tensor based methods:
◮ “Parallel factor analysis”
◮ “Tucker 3”
◮ Only go up to third order dependence?
“Supervised” methods
◮ Many machine learning classification methods tap into high order dependence.
Experimental methods.
CONCURRENCE TOPOLOGY (CT)
◮ Apparently new “unsupervised” method for high-order
agnostic analysis of dependence among dozens (hundreds?) of variables.
◮ CT is radically different from methods mentioned above.
◮ Since there are few methods there’s no need to choose among them: “Use all of them.”
◮ So comparing methods to see which is best is not urgent.
◮ The germ of the idea for CT came from a theoretical
neuroscience talk I heard by the mathematician Carina Curto.
CONCURRENCE TOPOLOGY (CT), continued
◮ CT is often able to extract a moderate number of high order
statistics from a combinatorial explosion.
◮ CT detects certain forms of negative or weak association
among the variables.
TOPOLOGY
TOPOLOGY, continued
◮ Topology is the study of qualitative aspects of shapes.
◮ Quantitative aspects of shapes such as length, angle, area, volume, and curvature are only loosely connected to topology.
◮ Famously, topology can’t tell the difference between a donut
and a coffee cup.
◮ Topology does pay attention to holes in shapes (like the hole in a donut or in the handle of a coffee cup).
◮ Topology ignores details.
◮ That’s good: There’s a combinatorial explosion of details. We
have to ignore practically all of them.
◮ That’s bad: Sometimes the details are important.
◮ “Needles” are details.
◮ But often we can recover details from a CT analysis.
ANALOGY FOR CT
◮ Consider this hypothetical histogram.
Persistence in a histogram
ANALOGY, continued
◮ Y axis is “count” or “frequency”.
◮ It’s discrete: 1, 2, 3, . . . .
◮ Cut the histogram at various heights.
◮ Suffices to do it at whole number heights.
◮ “Frequency levels”
◮ Dark line segments show intersections of horizontal lines with the histogram.
◮ As horizontal line moves downward, sometimes a gap appears
in the intersection.
◮ A gap is “born”.
◮ At a lower level the gap might be filled in.
◮ The gap “dies”.
◮ Difference in 2 heights is the “lifespan” of the gap.
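The histogram-cutting idea above can be sketched in a few lines of Python (a toy histogram, not the slides' figure): scan cut levels from the top down, record the runs of bins below the cut that lie between bins at or above it, and note the levels at which each gap appears.

```python
from collections import defaultdict

def gaps_at(counts, level):
    """Maximal runs of bins below the cut, flanked by bins at/above it."""
    present = [i for i, c in enumerate(counts) if c >= level]
    if not present:
        return []
    gaps, run = [], []
    for i in range(present[0], present[-1] + 1):
        if counts[i] >= level:
            if run:
                gaps.append(tuple(run))
                run = []
        else:
            run.append(i)
    return gaps

counts = [5, 1, 4, 2, 6]                  # toy histogram with three peaks
appearances = defaultdict(list)
for level in range(max(counts), 0, -1):   # move the cut downward
    for g in gaps_at(counts, level):
        appearances[g].append(level)

# Birth = highest cut at which a gap appears; below the lowest level at
# which it appears, it has been filled in (or has split into smaller gaps).
for g, levels in appearances.items():
    print(g, "born at level", max(levels), "last seen at level", min(levels))
```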
PERSISTENCE
◮ The phenomenon of birth and death of gaps is “persistence”.
◮ Can plot it (“persistence plot”).
Persistence plot in dimension 0 for histogram
CONCURRENCE TOPOLOGY RUDIMENTS
◮ In CT a group of dichotomous, i.e. “0-1”, variables is
represented as a series of shapes (“filtration”).
◮ A filtration can be thought of as a “building”.
◮ The various shapes in the filtration are like “floors” in the building.
◮ But the floors of a building are 2-D, while the shapes in a
filtration can be high-D.
◮ The building is made out of “bricks” (simplices), one brick per “observation”:
◮ a time point in fMRI data
◮ a subject in psychological scale data
◮ Each observation (subject or time point) contributes one
“brick”.
◮ A “brick” represents a “concurrence”: The variables that are
“1” in the observation.
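A minimal sketch of this construction on toy data, assuming (as the frequency-level picture suggests) that the frame at level f is the complex generated by the concurrences occurring at least f times:

```python
from collections import Counter
from itertools import combinations

# Toy binary data: one row per observation, one column per variable.
data = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (0, 1, 1, 0),
    (1, 0, 0, 1),
]

# Each observation contributes one "brick": the set of variables that are 1.
concurrences = Counter(
    frozenset(j for j, v in enumerate(row) if v) for row in data
)

def frame(level):
    """The "floor" at a frequency level: all faces of every concurrence
    that occurs at least `level` times."""
    faces = set()
    for simplex, count in concurrences.items():
        if count >= level:
            for k in range(1, len(simplex) + 1):
                faces.update(map(frozenset, combinations(simplex, k)))
    return faces

for level in (2, 1):
    print("frequency level", level, "->", sorted(map(sorted, frame(level))))
```

As the level drops from 2 to 1, more bricks enter and the floor grows, exactly the descending-cut picture from the histogram analogy.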
Concurrence plot for sub84371
HOLES
◮ Holes in the filtration indicate relative weakness or negativity
in joint distribution of variables.
◮ Holes in filtration are like “stairwells” or “atriums” in the
building.
◮ They can span several “floors”.
◮ The number of “floors” spanned by a hole is the “lifespan” or
“persistence” of the hole.
◮ In concurrence topology one learns about data by studying the
pattern of holes in the data’s filtration.
CANNOT ANALYZE HOLES BY VISUAL INSPECTION
◮ Dimension of filtration is usually too high.
◮ Use computational topology to study pattern of holes.
◮ Branch of topology that studies holes in shapes is “homology
theory”.
◮ Technical word for “hole” is “homology class”.
DIMENSION OF HOLES
◮ Homology classes come in different “dimensions”: 0, 1, 2, . . .
◮ Examples:
◮ Gap between Earth and Moon (a 0-dimensional class).
◮ Bagel (its hole is 1-dimensional).
◮ Basketball (the enclosed cavity is 2-dimensional).
◮ Higher dimensional holes.
◮ Later we’ll look at homology in fMRI data in dimensions 0 through 5.
“PERSISTENT HOMOLOGY”
◮ Finding “stairwells” and their spans means computing the
“persistent homology” of the filtration (“building”).
◮ Have done this for up to 74 variables (fMRI data).
◮ May be possible for a few hundred, but not thousands, of variables.
◮ A hole of dimension d has to do with statistical dependence of order at least d + 2.
SYNTHETIC EXAMPLE
◮ Test pattern for software.
[Figure: the test filtration shown at frequency levels 1–8.]
Test filtration ’yh52’
PERSISTENT PLOT FOR SYNTHETIC EXAMPLE
Persistence plot for the filtration ’yh52’ in dimension 1
CT GOES BEYOND NETWORK OR GRAPHICAL MODELS
◮ A simple network or graphical model connects pairs of nodes
by lines.
◮ A solid triangle connects three nodes.
◮ In real data examples, a filtration might include “triangles” (“simplices”) connecting 60 or more nodes.
◮ Not useful to try to interpret CT as a generalized network
method.
SECRET (?) OF CT’S SUCCESS
◮ CT finds interesting structure in data without getting overwhelmed by a combinatorial explosion. How is it able to do this?
◮ A homology class (hole) is a global phenomenon that requires
the “cooperation” of all the variables.
◮ This makes holes in the filtration very rare.
◮ “Very rare” compared, not to the “population” of all data sets, but compared to the size of the combinatorial explosion.
◮ So data sets with holes appear to be rather common.
◮ But the number of holes one gets is manageable: maybe a dozen or so.
EXAMPLE: DMN IN DIMENSION 4
◮ fMRI data, default mode network (32 regions).
◮ Arno Klein and I looked at homology in dimension 4 (corresponds to 6th- or higher-order dependence).
◮ Combinatorial explosion: there are C(32, 6) = 906,192 ways to choose 6 regions out of 32.
◮ Median number of holes in dimension 4 is 2, max is 18.
◮ This represents a tremendous reduction compared to the size of the combinatorial explosion.
◮ (Turns out that presence of homology in dimension 4 – 6th-
and higher-order dependence – discriminates an ADHD group from controls.)
fMRI
◮ “fMRI” stands for “functional magnetic resonance imaging”.
◮ It images the functioning of the brain of a living person.
◮ Contrasted to a “structural MRI”, which images the anatomy of the brain (at higher resolution).
◮ Active areas of the brain require more oxygen than do inactive ones.
◮ This generates a “Blood-Oxygen-Level Dependent” (BOLD) signal that an MR machine can detect.
◮ An fMRI image of the brain can be taken about once every 2
seconds.
◮ Spatial resolution is about 3 × 3 × 5 mm³.
◮ The presumption is that high BOLD values in a brain region indicate that the region is active.
◮ So activity of different parts of the brain over time can be
recorded.
“This exciting technology has revolutionized the scientific study of the mind.”
“FUNCTIONAL CONNECTIVITY”
◮ Means coordination of activity in different brain regions.
◮ Abnormal functional connectivity is believed to be important in Attention Deficit Hyperactivity Disorder (ADHD).
◮ True functional connectivity is probably reflected in observed
joint variation in BOLD among various brain regions.
◮ Ergo one can learn about functional connectivity by studying
the statistical dependence of BOLD among brain regions.
◮ Interplay of activity in two regions is like a telephone call.
◮ Mightn’t the brain make use of “conference calls” involving more than two regions?
◮ Makes sense to study joint variation of groups of, not just 2,
but 3, 4, etc. regions in fMRI data.
DATA SET
◮ Publicly available fMRI data of NYU provenance. (Arno found
it.)
◮ Resting state.
◮ 25 ADHD subjects.
◮ 41 healthy controls.
◮ Once processed by Arno, the data included, for every subject, BOLD values in 92 brain regions at 192 time points.
WE APPLIED CT TO EACH SUBJECT’S fMRI DATA SEPARATELY.
◮ Looked at homology up to dimension 5.
◮ Pertains to dependence (connectivity) of order seven or more.
◮ Like fitting a LS regression model with one or more sixth-order
interactions.
“TIME DOMAIN”
◮ Dichotomize BOLD in each region separately.
◮ At 80th percentile.
◮ Take dichotomized BOLD values in all regions in each time
point to be an “observation”.
◮ Time dependence is ignored in doing this.
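The time-domain binarization can be sketched as follows, assuming a simple empirical 80th-percentile cutoff per region (region names and values here are made up):

```python
# Dichotomize each region's BOLD series at its own 80th percentile,
# so roughly the top 20% of values in each region are coded 1 ("active").
def dichotomize(series, pct=0.8):
    cutoff = sorted(series)[int(pct * len(series))]  # crude empirical quantile
    return [1 if v >= cutoff else 0 for v in series]

# Hypothetical BOLD values for two regions at ten time points.
bold = {
    "regionA": [3.1, 2.7, 5.9, 3.0, 6.2, 2.8, 3.3, 5.8, 2.9, 3.2],
    "regionB": [1.0, 4.0, 4.1, 0.9, 1.1, 4.2, 1.2, 0.8, 4.3, 1.0],
}
binary = {r: dichotomize(s) for r, s in bold.items()}

# One "observation" per time point: the dichotomized values across regions.
observations = list(zip(*binary.values()))
print(binary)
```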
FOURIER DOMAIN
◮ Brings temporal dependence back into the picture.
[Figures: excerpt from the ’Left-Pallidum’ BOLD time series, plus components at freq = 0.63, 1.26, 1.88, 2.51, and 3.14.]
“PERIODOGRAM”
THIS OPERATION CAPTURES THE TEMPORAL STRUCTURE OF THE BOLD TIME SERIES.
SIMILAR PERIODOGRAMS SUGGEST FUNCTIONAL CONNECTIVITY
◮ Suppose the BOLD activity curves of regions A and B are the
same.
◮ EXCEPT they are shifted relative to each other in time.
◮ E.g., at all time points t, BOLD of region B at time t is the
same as that of A at time t − 2.
◮ Strong relationship would not be apparent in a time domain
analysis.
◮ In time domain we only look at simultaneous activity.
◮ But the periodograms of A and B would be exactly the same!
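This shift-invariance is easy to demonstrate: the discrete Fourier transform of a circularly shifted series differs only in phase, so the squared magnitudes (the periodogram) are unchanged. A self-contained sketch with a toy sinusoid and a naive DFT:

```python
import cmath, math

def periodogram(x):
    """Squared magnitude of the DFT (naive O(n^2) sum; fine for a sketch)."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(s) ** 2)
    return out

n = 64
a = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]   # "region A"
b = a[-2:] + a[:-2]                                         # "region B": A delayed by 2

pa, pb = periodogram(a), periodogram(b)
# The shift changes phase only; the periodograms agree.
print(max(abs(x - y) for x, y in zip(pa, pb)))
```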
IN FOURIER DOMAIN DEFINE CONCURRENCES WITHIN EACH FREQUENCY.
◮ In each region, classify each frequency as “active” or not according to whether that region’s periodogram exceeds a given threshold at that frequency.
◮ Take dichotomized periodograms at the same Fourier
frequency to be an “observation”.
◮ Define concurrence in Fourier domain just as in time domain,
but instead of time points, use frequencies.
◮ So in Fourier domain, A and B will be in exactly the same
concurrences, reflecting their tight association.
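A sketch of the Fourier-domain construction with made-up periodogram values: regions whose periodograms exceed the threshold at the same frequencies land in exactly the same concurrences.

```python
# Hypothetical periodogram values, one entry per Fourier frequency.
pgrams = {
    "A": [0.1, 9.0, 0.2, 7.5, 0.3],
    "B": [0.2, 8.5, 0.1, 7.0, 0.4],   # similar to A -> same concurrences
    "C": [6.0, 0.1, 0.2, 0.3, 9.5],
}
THRESHOLD = 1.0
active = {r: [1 if p > THRESHOLD else 0 for p in pg] for r, pg in pgrams.items()}

# One concurrence per frequency: the set of regions "active" there.
n_freqs = len(next(iter(pgrams.values())))
concurrences = [
    frozenset(r for r in pgrams if active[r][k]) for k in range(n_freqs)
]
print(concurrences)
```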
AVERAGE PERSISTENCE PLOTS
Whole Brain, Time Domain
[Average persistence plots (birth vs. death), whole brain, time domain: ADHD and control groups in dimensions 0, 1, and 2.]
Whole Brain, Fourier Domain
[Average persistence plots, whole brain, Fourier domain: ADHD and control groups in dimensions 0, 1, and 2.]
Default Mode Network, Time Domain
[Average persistence plots, default mode network, time domain: ADHD and control groups in dimensions 0 through 5.]
Default Mode Network, Fourier Domain
[Average persistence plots, default mode network, Fourier domain: ADHD and control groups in dimensions 0 through 5.]
SOME FINDINGS BASED ON PERSISTENCE PLOTS
◮ Differences between groups in whole brain, Fourier domain,
dimensions 1 and 2 (especially dimension 1).
◮ Differences between groups in DMN, time domain, dimensions
4 and 5 (especially dimension 4).
◮ 64.0% of ADHD subjects showed any homology in the time domain in the DMN in dimension 4, compared to 92.6% of controls.
“LOCALIZATION”
◮ Homology classes (“holes”) involve all the variables.
◮ Holes can often be “localized” by identifying groups of variables (“short cycles”) most closely associated with them.
◮ Short cycles can be examined to see if they’re interesting, i.e.,
if they’re “needles” in a combinatorial “haystack”.
[Figure: the test filtration ’yh52’ again, frequency levels 1–8.]
CAVEAT
◮ I make no claim that with CT one can find all the “needles”, or even the most important ones.
◮ I only suggest that CT might find some of them, in itself an
important contribution.
◮ CT apparently provides a view of high order dependence
unavailable using any other method.
◮ And vice versa: other methods provide views unavailable from CT.
LOCALIZATION: EXAMPLE
◮ In fMRI data we found interesting short cycles in dimension 1,
time domain, DMN.
◮ 3rd-order.
◮ A “short cycle” consists of a triple of regions.
◮ Each subject has a few hundred short cycles.
◮ Combinatorial explosion: 9,880 different possible triplets of
regions.
PERSISTENCE PLOT FOR ONE SUBJECT IN DIMENSION 1, TIME DOMAIN, DMN
[Persistence plot (birth vs. death) for one subject in dimension 1, time domain, DMN; one class is marked with an asterisk.]
SPECIAL SHORT CYCLES
◮ One short cycle in dimension 1 is found in 13 subjects.
ctx-lh-parsorbitalis + ctx-lh-rostralanteriorcingulate + ctx-rh-medialorbitofrontal
◮ This is statistically significant.
◮ Null hypothesis: all short cycles are equally likely to appear in a subject’s data.
◮ 16 short cycles associated with the same class distinguish
ADHD from controls.
PRODUCTS
◮ A persistence plot treats every persistent class (hole) in
isolation, whether between or within dimensions.
◮ In topology there are “products” that provide possible ways
that holes can be combined to produce other holes.
◮ If α and β are holes of dimensions p and q, resp., then what I
call the “join” of the two holes, if it exists, is a hole of dimension p + q + 1.
◮ Joining gives a relationship among up to three dimensions.
◮ The join of two holes may or may not be present in a shape.
◮ Presence or absence of joins might be another part of the homological signature of a particular phenomenon, like disease group.
INDEPENDENCE
◮ Surprisingly, joining is connected with statistical
independence, a fundamental notion in data analysis.
◮ Suppose one has nonoverlapping groups of variables in the left
hand and right hand.
◮ Suppose there’s negative association among the left hand variables that shows up as a hole α.
◮ Similarly, the right hand variables produce a hole β.
◮ So there’s dependence within each hand, but suppose the two groups of variables are independent of each other.
◮ Then over many observations of these variables, the join of α
and β will emerge!
◮ This fact generalizes to any number of groups of variables.
SIMULATION RESULTS
◮ Two assumptions:
- 1. Each group produces its own homology.
- 2. The two groups are independent of each other.