Concurrence Topology: A Tool for Describing High-Order Statistical Dependence in Data
Steven P. Ellis (mostly joint work with Arno Klein)
6/9/14
Abstract
Data analytic methods possessing the following three features are desirable: (1) The method describes “high-order dependence” among variables. (2) It does so with few preconceptions. And (3) it can handle at least dozens, maybe hundreds of variables. However, if approached in a naive fashion, data analysis having these three features triggers a “combinatorial explosion”: The output from the analysis can include thousands, maybe millions of numbers. Few methods exist possessing all three features yet which avoid the combinatorial explosion. Ellis has devised a data analytic method he calls “Concurrence Topology (CT)” which does so.
Abstract, continued
CT takes an apparently radically new approach to this problem. It starts by translating data into a “filtration”, a series of “shapes”. The shapes in the series are called “frames”. A filtration is like a building, and the frames are like the floors of the building. But while the floors of a building are two-dimensional, the frames of a filtration can have dimension much higher than two. A filtration can have holes that are like elevator shafts in a building. Such holes indicate relatively weak or negative association among the variables. CT uses computational algebraic topology to describe the pattern of holes. Normally, there are no more than a few dozen holes, so CT avoids the combinatorial explosion. Often one can identify small groups of variables that are closely associated with a given hole. This process facilitates interpretation of the hole.
Abstract, continued
A limitation of CT is that, so far, it only works with binary data. But quantitative data can always be binarized. Ellis wrote software in R (available upon request) implementing CT. A paper, written by Arno Klein and Ellis, introducing CT and demonstrating it on fMRI data has been accepted by a topology journal.
◮ Free R code exists that implements the procedures described
in this talk.
◮ Reference: S.P. Ellis, A. Klein (2014) “Describing high-order
statistical dependence using ‘concurrence topology’, with application to functional MRI brain data,” Homology, Homotopy, and Applications, 16, 245–264.
CONCERNED WITH DATA ANALYSIS CHARACTERIZED BY THREE FEATURES
INGREDIENT 1: HIGH-ORDER DEPENDENCE
◮ A statistic that can be computed from a multivariate sample by looking at only k variables at a time, but which cannot be obtained by looking at fewer than k variables at a time, reflects “kth-order dependence” among the variables.
◮ “High-order dependence” means dependence of order at least
3.
Examples:
◮ The list of means of 10 variables reflects “first order
dependence”.
◮ A correlation matrix of 10 variables reflects second order
dependence.
◮ A simple network model reflects second order dependence.
◮ Factor analysis reflects second order dependence.
Regression
◮ Least squares estimates of the coefficients in the regression model Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + error reflect second order dependence.
◮ The least squares estimate of the interaction β12 in the
regression model Y = β0 + β1X1 + β2X2 +β12X1 : X2 + error reflects third order dependence.
◮ Interactions in regression models can be important.
◮ This suggests that looking at dependence of order higher than 2 might be useful in general.
Three data sets identical (statistically) up to 2nd order, but not at third order.
[Table: three data sets I, II, III, each with binary variables x, y, z.]
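The property this slide illustrates, data sets agreeing through second-order statistics yet differing at third order, can be reproduced with a minimal hypothetical XOR construction (not the slides' actual data): take x and y to be fair coins and z = x XOR y. Means and all pairwise moments then match the fully independent case, but the triple moment does not.

```python
from itertools import product

# Hypothetical XOR construction (not the slides' data): x and y are fair
# coins and z = x XOR y. The four equally likely outcomes:
rows = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)]

def mean(vals):
    return sum(vals) / len(vals)

# First order: every variable has mean 1/2.
means = [mean([r[i] for r in rows]) for i in range(3)]

# Second order: every pairwise moment E[ab] equals E[a]E[b] = 1/4,
# so all correlations are zero.
pair_moments = {(i, j): mean([r[i] * r[j] for r in rows])
                for i in range(3) for j in range(i + 1, 3)}

# Third order: E[xyz] = 0, whereas full independence would give 1/8.
triple_moment = mean([r[0] * r[1] * r[2] for r in rows])
print(means, pair_moments, triple_moment)
```

Any method that stops at correlations sees these three variables as pairwise independent; only a third-order statistic detects the constraint tying z to x and y.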
SUMMARIZING
◮ Typically, there are very many ways variables can be
dependent.
◮ Any summary of dependence cannot capture every sort of
dependence.
◮ Correlation is an example of such a summary.
INGREDIENT 2: “AGNOSTIC” STATISTICS
◮ Typically, formulating a regression model involves choices.
◮ Which variable should be the response (dependent) variable?
◮ Which variables should be the predictors (independent variables)?
◮ Which variables should be included in interactions?
◮ Ditto for path analysis.
◮ If you have prior knowledge to guide you, then regression modeling (or path analysis) is a powerful way to learn from data.
◮ A more data-driven approach is “agnostic analysis”:
◮ Treating all variables the same a priori.
◮ Example: Factor analysis is a second order agnostic analysis
method.
Group variables
◮ I’m mostly interested in “unsupervised” methods.
◮ But if there is a variable that specifies classes or groups that the data come from, then one might not want to treat it like any old variable.
◮ Output of unsupervised methods can be used as part of input
to supervised methods.
◮ Give examples later.
INGREDIENT 3: “LARGE” NUMBER OF VARIABLES
◮ In this talk “large number” means “dozens”, maybe a
hundred or so.
“COMBINATORIAL EXPLOSION”
◮ The three features constitute an “explosive mixture”.
◮ Prima facie, agnostically describing kth-order dependence in a data set means examining all combinations of k variables at a time.
◮ If there are many variables and k > 2, the number of
combinations can be huge.
◮ Sometimes the collection of all these combinations can be
regarded as a “haystack” in which we’re searching for “needles”.
“COMBINATORIAL EXPLOSION:” EXAMPLE
Analysis of seventh-order dependence among the regions of the brain “default mode network” in an fMRI data set.
◮ 32 variables.
◮ Naive agnostic seventh order analysis of 32 variables means looking at C(32, 7) = 3,365,856 combinations of 7 variables.
◮ E.g., ≥ 3,365,856 terms in a “log linear model”.
◮ Data contained only 6,144 numbers.
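The counts quoted on this slide can be checked directly, e.g. with Python's `math.comb`:

```python
from math import comb

# C(32, 7): combinations of 7 regions out of 32 (this slide).
print(comb(32, 7))   # 3,365,856
# C(32, 6): combinations of 6 regions out of 32 (used for the
# dimension-4 default-mode-network example later in the talk).
print(comb(32, 6))   # 906,192
# The data set itself: 32 regions x 192 time points.
print(32 * 192)      # 6,144 numbers
```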
“COMBINATORIAL EXPLOSION,” continued
◮ Computing and interpreting many many combinations is very
challenging.
◮ With many combinations, looking at individual combinations of k variables is not helpful:
◮ Difficult to interpret a torrent of numbers.
◮ When there are many groups of k variables, behavior of individual groups is unlikely to be reproducible.
◮ (Multiple comparisons)
SOME METHODS THAT AGNOSTICALLY CAPTURE HIGH ORDER DEPENDENCE IN MANY VARIABLES
“Unsupervised” methods:
◮ There seem to be few established unsupervised methods that
capture high order dependence.
◮ Independent Components Analysis
◮ Tensor based methods:
◮ “Parallel factor analysis”
◮ “Tucker 3”
◮ Only go up to third order dependence?
“Supervised” methods
◮ Many machine learning classification methods tap into high order dependence.
Experimental methods.
CONCURRENCE TOPOLOGY (CT)
◮ Apparently new “unsupervised” method for high-order
agnostic analysis of dependence among dozens (hundreds?) of variables.
◮ CT is radically different from methods mentioned above.
◮ Since there are few methods there’s no need to choose among them: “Use all of them.”
◮ So comparing methods to see which is best is not urgent.
◮ The germ of the idea for CT came from a theoretical
neuroscience talk I heard by the mathematician Carina Curto.
CONCURRENCE TOPOLOGY (CT), continued
◮ CT is often able to extract a moderate number of high order
statistics from a combinatorial explosion.
◮ CT detects certain forms of negative or weak association
among the variables.
TOPOLOGY
TOPOLOGY, continued
◮ Topology is the study of qualitative aspects of shapes.
◮ Quantitative aspects of shapes such as length, angle, area, volume, and curvature are only loosely connected to topology.
◮ Famously, topology can’t tell the difference between a donut
and a coffee cup.
◮ Topology does pay attention to holes in shapes (like the hole in a donut or in the handle of a coffee cup).
◮ Topology ignores details.
◮ That’s good: There’s a combinatorial explosion of details. We
have to ignore practically all of them.
◮ That’s bad: Sometimes the details are important.
◮ “Needles” are details.
◮ But often we can recover details from a CT analysis.
ANALOGY FOR CT
◮ Consider this hypothetical histogram.
Persistence in a histogram
ANALOGY, continued
◮ Y axis is “count” or “frequency”.
◮ It’s discrete: 1, 2, 3, . . . .
◮ Cut the histogram at various heights.
◮ Suffices to do it at whole number heights.
◮ “Frequency levels”
◮ Dark line segments show intersections of horizontal lines with the histogram.
◮ As horizontal line moves downward, sometimes a gap appears
in the intersection.
◮ A gap is “born”.
◮ At a lower level the gap might be filled in.
◮ The gap “dies”.
◮ Difference in 2 heights is the “lifespan” of the gap.
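The histogram-cutting idea above can be sketched in a few lines of Python (a toy histogram, not the slides' figure): scan cut levels from the top down, record the runs of bins below the cut that lie between bins at or above it, and note the levels at which each gap appears.

```python
from collections import defaultdict

def gaps_at(counts, level):
    """Maximal runs of bins below the cut, flanked by bins at/above it."""
    present = [i for i, c in enumerate(counts) if c >= level]
    if not present:
        return []
    gaps, run = [], []
    for i in range(present[0], present[-1] + 1):
        if counts[i] >= level:
            if run:
                gaps.append(tuple(run))
                run = []
        else:
            run.append(i)
    return gaps

counts = [5, 1, 4, 2, 6]                  # toy histogram with three peaks
appearances = defaultdict(list)
for level in range(max(counts), 0, -1):   # move the cut downward
    for g in gaps_at(counts, level):
        appearances[g].append(level)

# Birth = highest cut at which a gap appears; below the lowest level at
# which it appears, it has been filled in (or has split into smaller gaps).
for g, levels in appearances.items():
    print(g, "born at level", max(levels), "last seen at level", min(levels))
```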
PERSISTENCE
◮ The phenomenon of birth and death of gaps is “persistence”.
◮ Can plot it (“persistence plot”).
Persistence plot in dimension 0 for histogram
CONCURRENCE TOPOLOGY RUDIMENTS
◮ In CT a group of dichotomous, i.e. “0-1”, variables is
represented as a series of shapes (“filtration”).
◮ A filtration can be thought of as a “building”.
◮ The various shapes in the filtration are like “floors” in the building.
◮ But the floors of a building are 2-D, while the shapes in a
filtration can be high-D.
◮ The building is made out of “bricks” (simplices), one brick per “observation”:
◮ a time point in fMRI data
◮ a subject in psychological scale data
◮ Each observation (subject or time point) contributes one
“brick”.
◮ A “brick” represents a “concurrence”: The variables that are
“1” in the observation.
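A minimal sketch of this construction on toy data, assuming (as the frequency-level picture suggests) that the frame at level f is the complex generated by the concurrences occurring at least f times:

```python
from collections import Counter
from itertools import combinations

# Toy binary data: one row per observation, one column per variable.
data = [
    (1, 1, 0, 0),
    (1, 1, 0, 0),
    (0, 1, 1, 0),
    (1, 0, 0, 1),
]

# Each observation contributes one "brick": the set of variables that are 1.
concurrences = Counter(
    frozenset(j for j, v in enumerate(row) if v) for row in data
)

def frame(level):
    """The "floor" at a frequency level: all faces of every concurrence
    that occurs at least `level` times."""
    faces = set()
    for simplex, count in concurrences.items():
        if count >= level:
            for k in range(1, len(simplex) + 1):
                faces.update(map(frozenset, combinations(simplex, k)))
    return faces

for level in (2, 1):
    print("frequency level", level, "->", sorted(map(sorted, frame(level))))
```

As the level drops from 2 to 1, more bricks enter and the floor grows, exactly the descending-cut picture from the histogram analogy.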
Concurrence plot for sub84371
HOLES
◮ Holes in the filtration indicate relative weakness or negativity
in joint distribution of variables.
◮ Holes in filtration are like “stairwells” or “atriums” in the
building.
◮ They can span several “floors”.
◮ The number of “floors” spanned by a hole is the “lifespan” or
“persistence” of the hole.
◮ In concurrence topology one learns about data by studying the
pattern of holes in the data’s filtration.
CANNOT ANALYZE HOLES BY VISUAL INSPECTION
◮ Dimension of filtration is usually too high.
◮ Use computational topology to study pattern of holes.
◮ Branch of topology that studies holes in shapes is “homology
theory”.
◮ Technical word for “hole” is “homology class”.
DIMENSION OF HOLES
◮ Homology classes come in different “dimensions”: 0, 1, 2, . . .
◮ Examples:
◮ Gap between Earth and Moon (a 0-dimensional class).
◮ Bagel (its hole is 1-dimensional).
◮ Basketball (the enclosed cavity is 2-dimensional).
◮ Higher dimensional holes.
◮ Later we’ll look at homology in fMRI data in dimensions 0 through 5.
“PERSISTENT HOMOLOGY”
◮ Finding “stairwells” and their spans means computing the
“persistent homology” of the filtration (“building”).
◮ Have done this for up to 74 variables (fMRI data).
◮ May be possible for a few hundred, but not thousands, of variables.
◮ A hole of dimension d has to do with statistical dependence of order at least d + 2.
SYNTHETIC EXAMPLE
◮ Test pattern for software.
[Figure: the test filtration shown at frequency levels 1–8.]
Test filtration ’yh52’
PERSISTENT PLOT FOR SYNTHETIC EXAMPLE
Persistence plot for the filtration ’yh52’ in dimension 1
CT GOES BEYOND NETWORK OR GRAPHICAL MODELS
◮ A simple network or graphical model connects pairs of nodes
by lines.
◮ A solid triangle connects three nodes.
◮ In real data examples, a filtration might include “triangles” (“simplices”) connecting 60 or more nodes.
◮ Not useful to try to interpret CT as a generalized network
method.
SECRET (?) OF CT’S SUCCESS
◮ CT finds interesting structure in data without getting overwhelmed by a combinatorial explosion. How is it able to do this?
◮ A homology class (hole) is a global phenomenon that requires
the “cooperation” of all the variables.
◮ This makes holes in the filtration very rare.
◮ “Very rare” compared, not to the “population” of all data sets, but compared to the size of the combinatorial explosion.
◮ So data sets with holes appear to be rather common.
◮ But the number of holes one gets is manageable: maybe a dozen or so.
EXAMPLE: DMN IN DIMENSION 4
◮ fMRI data, default mode network (32 regions).
◮ Arno Klein and I looked at homology in dimension 4 (corresponds to 6th- or higher-order dependence).
◮ Combinatorial explosion: there are C(32, 6) = 906,192 ways to choose 6 regions out of 32.
◮ Median number of holes in dimension 4 is 2, max is 18.
◮ This represents a tremendous reduction compared to the size of the combinatorial explosion.
◮ (Turns out that presence of homology in dimension 4 – 6th-
and higher-order dependence – discriminates an ADHD group from controls.)
fMRI
◮ “fMRI” stands for “functional magnetic resonance imaging”.
◮ It images the functioning of the brain of a living person.
◮ Contrasted to a “structural MRI”, which images the anatomy of the brain (at higher resolution).
◮ Active areas of the brain require more oxygen than do inactive ones.
◮ This generates a “Blood-Oxygen-Level Dependent” (BOLD) signal that an MR machine can detect.
◮ An fMRI image of the brain can be taken about once every 2
seconds.
◮ Spatial resolution is about 3 × 3 × 5 mm³.
◮ The presumption is that high BOLD values in a brain region indicate that the region is active.
◮ So activity of different parts of the brain over time can be
recorded.
“This exciting technology has revolutionized the scientific study of the mind.”
“FUNCTIONAL CONNECTIVITY”
◮ Means coordination of activity in different brain regions.
◮ Abnormal functional connectivity is believed to be important in Attention Deficit Hyperactivity Disorder (ADHD).
◮ True functional connectivity is probably reflected in observed
joint variation in BOLD among various brain regions.
◮ Ergo one can learn about functional connectivity by studying
the statistical dependence of BOLD among brain regions.
◮ Interplay of activity in two regions is like a telephone call.
◮ Mightn’t the brain make use of “conference calls” involving more than two regions?
◮ Makes sense to study joint variation of groups of, not just 2,
but 3, 4, etc. regions in fMRI data.
DATA SET
◮ Publicly available fMRI data of NYU provenance. (Arno found
it.)
◮ Resting state.
◮ 25 ADHD subjects.
◮ 41 healthy controls.
◮ Once processed by Arno, the data included, for every subject, BOLD values in 92 brain regions at 192 time points.
WE APPLIED CT TO EACH SUBJECT’S fMRI DATA SEPARATELY.
◮ Looked at homology up to dimension 5.
◮ Pertains to dependence (connectivity) of order seven or more.
◮ Like fitting a LS regression model with one or more sixth-order
interactions.
“TIME DOMAIN”
◮ Dichotomize BOLD in each region separately.
◮ At 80th percentile.
◮ Take dichotomized BOLD values in all regions in each time
point to be an “observation”.
◮ Time dependence is ignored in doing this.
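The time-domain binarization can be sketched as follows, assuming a simple empirical 80th-percentile cutoff per region (region names and values here are made up):

```python
# Dichotomize each region's BOLD series at its own 80th percentile,
# so roughly the top 20% of values in each region are coded 1 ("active").
def dichotomize(series, pct=0.8):
    cutoff = sorted(series)[int(pct * len(series))]  # crude empirical quantile
    return [1 if v >= cutoff else 0 for v in series]

# Hypothetical BOLD values for two regions at ten time points.
bold = {
    "regionA": [3.1, 2.7, 5.9, 3.0, 6.2, 2.8, 3.3, 5.8, 2.9, 3.2],
    "regionB": [1.0, 4.0, 4.1, 0.9, 1.1, 4.2, 1.2, 0.8, 4.3, 1.0],
}
binary = {r: dichotomize(s) for r, s in bold.items()}

# One "observation" per time point: the dichotomized values across regions.
observations = list(zip(*binary.values()))
print(binary)
```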
FOURIER DOMAIN
◮ Brings temporal dependence back into the picture.
[Figures: excerpt from the ’Left-Pallidum’ BOLD time series, plus components at freq = 0.63, 1.26, 1.88, 2.51, and 3.14.]
“PERIODOGRAM”
THIS OPERATION CAPTURES THE TEMPORAL STRUCTURE OF THE BOLD TIME SERIES.
SIMILAR PERIODOGRAMS SUGGEST FUNCTIONAL CONNECTIVITY
◮ Suppose the BOLD activity curves of regions A and B are the
same.
◮ EXCEPT they are shifted relative to each other in time.
◮ E.g., at all time points t, BOLD of region B at time t is the
same as that of A at time t − 2.
◮ Strong relationship would not be apparent in a time domain
analysis.
◮ In time domain we only look at simultaneous activity.
◮ But the periodograms of A and B would be exactly the same!
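This shift-invariance is easy to demonstrate: the discrete Fourier transform of a circularly shifted series differs only in phase, so the squared magnitudes (the periodogram) are unchanged. A self-contained sketch with a toy sinusoid and a naive DFT:

```python
import cmath, math

def periodogram(x):
    """Squared magnitude of the DFT (naive O(n^2) sum; fine for a sketch)."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(s) ** 2)
    return out

n = 64
a = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]   # "region A"
b = a[-2:] + a[:-2]                                         # "region B": A delayed by 2

pa, pb = periodogram(a), periodogram(b)
# The shift changes phase only; the periodograms agree.
print(max(abs(x - y) for x, y in zip(pa, pb)))
```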
IN FOURIER DOMAIN DEFINE CONCURRENCES WITHIN EACH FREQUENCY.
◮ In each region, classify each frequency as “active” or not according to whether that region’s periodogram exceeds a given threshold at that frequency.
◮ Take dichotomized periodograms at the same Fourier
frequency to be an “observation”.
◮ Define concurrence in Fourier domain just as in time domain,
but instead of time points, use frequencies.
◮ So in Fourier domain, A and B will be in exactly the same
concurrences, reflecting their tight association.
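A sketch of the Fourier-domain construction with made-up periodogram values: regions whose periodograms exceed the threshold at the same frequencies land in exactly the same concurrences.

```python
# Hypothetical periodogram values, one entry per Fourier frequency.
pgrams = {
    "A": [0.1, 9.0, 0.2, 7.5, 0.3],
    "B": [0.2, 8.5, 0.1, 7.0, 0.4],   # similar to A -> same concurrences
    "C": [6.0, 0.1, 0.2, 0.3, 9.5],
}
THRESHOLD = 1.0
active = {r: [1 if p > THRESHOLD else 0 for p in pg] for r, pg in pgrams.items()}

# One concurrence per frequency: the set of regions "active" there.
n_freqs = len(next(iter(pgrams.values())))
concurrences = [
    frozenset(r for r in pgrams if active[r][k]) for k in range(n_freqs)
]
print(concurrences)
```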
AVERAGE PERSISTENCE PLOTS
Whole Brain, Time Domain
[Average persistence plots (birth vs. death), whole brain, time domain: ADHD and control groups in dimensions 0, 1, and 2.]
Whole Brain, Fourier Domain
[Average persistence plots, whole brain, Fourier domain: ADHD and control groups in dimensions 0, 1, and 2.]
Default Mode Network, Time Domain
[Average persistence plots, default mode network, time domain: ADHD and control groups in dimensions 0 through 5.]
Default Mode Network, Fourier Domain
[Average persistence plots, default mode network, Fourier domain: ADHD and control groups in dimensions 0 through 5.]
SOME FINDINGS BASED ON PERSISTENCE PLOTS
◮ Differences between groups in whole brain, Fourier domain,
dimensions 1 and 2 (especially dimension 1).
◮ Differences between groups in DMN, time domain, dimensions
4 and 5 (especially dimension 4).
◮ 64.0% of ADHD subjects showed any homology in the time domain in the DMN in dimension 4, compared to 92.6% of controls.
“LOCALIZATION”
◮ Homology classes (“holes”) involve all the variables.
◮ Holes can often be “localized” by identifying groups of variables (“short cycles”) most closely associated with them.
◮ Short cycles can be examined to see if they’re interesting, i.e.,
if they’re “needles” in a combinatorial “haystack”.
[Figure: the test filtration ’yh52’ again, frequency levels 1–8.]
CAVEAT
◮ I make no claim that with CT one can find all the “needles”, or even the most important ones.
◮ I only suggest that CT might find some of them, in itself an
important contribution.
◮ CT apparently provides a view of high order dependence
unavailable using any other method.
◮ And vice versa: other methods provide views unavailable from CT.
LOCALIZATION: EXAMPLE
◮ In fMRI data we found interesting short cycles in dimension 1,
time domain, DMN.
◮ 3rd-order.
◮ A “short cycle” consists of a triple of regions.
◮ Each subject has a few hundred short cycles.
◮ Combinatorial explosion: 9,880 different possible triplets of
regions.
PERSISTENCE PLOT FOR ONE SUBJECT IN DIMENSION 1, TIME DOMAIN, DMN
[Persistence plot (birth vs. death) for one subject in dimension 1, time domain, DMN; one class is marked with an asterisk.]
SPECIAL SHORT CYCLES
◮ One short cycle in dimension 1 is found in 13 subjects.
ctx-lh-parsorbitalis + ctx-lh-rostralanteriorcingulate + ctx-rh-medialorbitofrontal
◮ This is statistically significant.
◮ Null hypothesis: all short cycles are equally likely to appear in a subject’s data.
◮ 16 short cycles associated with the same class distinguish
ADHD from controls.
PRODUCTS
◮ A persistence plot treats every persistent class (hole) in
isolation, whether between or within dimensions.
◮ In topology there are “products” that provide possible ways
that holes can be combined to produce other holes.
◮ If α and β are holes of dimensions p and q, resp., then what I
call the “join” of the two holes, if it exists, is a hole of dimension p + q + 1.
◮ Joining gives a relationship among up to three dimensions.
◮ The join of two holes may or may not be present in a shape.
◮ Presence or absence of joins might be another part of the homological signature of a particular phenomenon, like disease group.
INDEPENDENCE
◮ Surprisingly, joining is connected with statistical
independence, a fundamental notion in data analysis.
◮ Suppose one has nonoverlapping groups of variables in the left
hand and right hand.
◮ Suppose there’s negative association among the left hand variables that shows up as a hole α.
◮ Similarly, the right hand variables produce a hole β.
◮ So there’s dependence within each hand, but suppose the two groups of variables are independent of each other.
◮ Then over many observations of these variables, the join of α
and β will emerge!
◮ This fact generalizes to any number of groups of variables.
SIMULATION RESULTS
◮ Two assumptions:
- 1. Each group produces its own homology.
- 2. The two groups are independent of each other.