Karcher means of positive definite matrices Yongdo Lim Sungkyunkwan - - PowerPoint PPT Presentation

SLIDE 1

Karcher means of positive definite matrices

Yongdo Lim Sungkyunkwan University January 14, 2014

SLIDE 2

Overview

There has recently been considerable interest in defining a “mean” (averaging, barycenter, centroid) on manifolds and metric spaces. A natural and attractive candidate among averaging procedures is the “least squares mean”. This mean has appeared under a variety of other designations: center of gravity, Fréchet mean, Cartan mean, Riemannian center of mass, Riemannian geometric mean, or, frequently, Karcher mean, the terminology we adopt. It plays a central role in image processing (subdivision schemes), medical imaging (MRI), radar systems, and statistical biology (DNA/genome), to cite a few. Our purposes in this talk include the following: (1) the monotonicity conjecture; (2) deterministic approximations to the Karcher mean; (3) the strong law of large numbers and the no dice theorem.

SLIDE 3

The Geometric Mean

$\sqrt{ab}$

The subject of (binary) means for positive numbers or line segments has a rich mathematical lineage dating back to antiquity. The Greeks, motivated by their interest in proportions and musical ratios, defined at least eleven different means (depending on how one counts), the arithmetic, geometric, harmonic, and golden being the best known. A geometric construction for the geometric mean $\sqrt{ab}$ of $a, b > 0$ is given by Euclid in Book II in the form of “squaring the rectangle,” i.e., constructing a square of the same area as a given rectangle of sides $a$ and $b$. The study of various means and their properties on the positive reals has remained an active area of investigation up to the present day (e.g., SIAM Review, 1970: Gauss and Carlson’s iterative means and elliptic integrals).

SLIDE 4

Positive Definite Matrices

Positive definite matrices have become fundamental computational objects in many areas of engineering, computer science, physics, statistics, and applied mathematics. They appear in a diverse variety of settings: covariance matrices in statistics, elements of the search space in convex and semidefinite programming, kernels in machine learning, density matrices in quantum information, data points in radar imaging, and diffusion tensors in medical imaging, to cite only a few. A variety of computational algorithms have arisen for approximation, interpolation, filtering, estimation, and averaging.

  • A Hermitian matrix $A$ is positive definite (semidefinite) if all eigenvalues of $A$ are positive (nonnegative). The set of all $k \times k$ positive definite matrices is an open convex cone in the Euclidean space of Hermitian matrices equipped with the inner product $\langle X \mid Y \rangle = \operatorname{Tr}(XY)$.
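A quick numerical check of the definition, as a minimal sketch assuming NumPy (the helper names `is_positive_definite` and `trace_inner` are ours, for illustration): `numpy.linalg.eigvalsh` returns the real eigenvalues of a Hermitian matrix in ascending order, so positivity of the smallest one decides membership in the cone.

```python
import numpy as np

def is_positive_definite(A, tol=0.0):
    """True if A is square, Hermitian, and all its eigenvalues exceed tol."""
    A = np.asarray(A)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    if not np.allclose(A, A.conj().T):
        return False
    # eigvalsh treats A as Hermitian and returns real eigenvalues in ascending order
    return bool(np.linalg.eigvalsh(A)[0] > tol)

def trace_inner(X, Y):
    """<X|Y> = Tr(XY); real-valued for Hermitian X and Y."""
    return float(np.real(np.trace(X @ Y)))

A = np.array([[2.0, 1.0], [1.0, 2.0]])    # eigenvalues 1 and 3
B = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1 and 3
print(is_positive_definite(A), is_positive_definite(B))   # True False
print(is_positive_definite(0.3 * A + 0.7 * np.eye(2)))    # True: positive combinations stay in the cone
print(trace_inner(A, A))                                  # 10.0 (= 1**2 + 3**2)
```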

SLIDE 5

The Riemannian Trace Metric

In recent years, it has been increasingly recognized that the Euclidean distance is often not the most suitable one for the space $\mathbb{P} = \mathbb{P}_k$ of positive definite matrices, and that working with the appropriate geometry does matter in computational problems. It is thus not surprising that there has been increasing interest in the trace metric $\delta$, the distance arising from the natural Riemannian structure on $\mathbb{P}$ that makes it a Riemannian manifold, indeed, a symmetric space of nonpositive curvature:

$$\delta(A, B) = \big\|\log(A^{-1/2}BA^{-1/2})\big\|_2 = \Big(\sum_{i=1}^{k} \log^2 \lambda_i(A^{-1}B)\Big)^{1/2},$$

where $\|\cdot\|_2$ is the Frobenius norm and $\lambda_i(X)$ denotes the $i$-th eigenvalue of $X$ in non-decreasing order. For positive reals ($k = 1$), $\delta(a, b) = |\log a - \log b|$.
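As a minimal numerical sketch (assuming NumPy and SciPy; the function name `trace_metric` is ours), the eigenvalue formula for $\delta$ can be evaluated through the generalized symmetric eigenproblem $Bv = \lambda Av$, whose eigenvalues are exactly those of $A^{-1}B$. The checks exercise the scalar case, symmetry, and invariance under congruence and inversion.

```python
import numpy as np
from scipy.linalg import eigh

def trace_metric(A, B):
    """Trace metric delta(A, B) = (sum_i log^2 lambda_i(A^{-1} B))^{1/2}."""
    # eigh(B, A) solves B v = lam A v; these lam are the eigenvalues of A^{-1} B,
    # all positive when A and B are positive definite.
    lam = eigh(B, A, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

# k = 1 reduces to |log a - log b|
print(trace_metric(np.array([[2.0]]), np.array([[8.0]])), np.log(4.0))

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
C = np.array([[1.0, 2.0], [0.0, 1.0]])   # any invertible matrix
d = trace_metric(A, B)
print(np.isclose(d, trace_metric(B, A)))                                 # symmetry
print(np.isclose(d, trace_metric(C @ A @ C.T, C @ B @ C.T)))             # congruence invariance
print(np.isclose(d, trace_metric(np.linalg.inv(A), np.linalg.inv(B))))   # inversion invariance
```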
SLIDE 6

Basic Geometric Properties

We recall some basic properties of $\mathbb{P}$ endowed with the trace metric [S. Lang, 1999, or Lawson and L., Amer. Math. Monthly, 2000].
(1) The matrix geometric mean $A\#B = A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}$ is the unique metric midpoint between $A$ and $B$.
(2) There is a unique metric geodesic line through any two distinct points $A, B \in \mathbb{P}$, given by the weighted means
$$\gamma(t) = A\#_tB = A^{1/2}(A^{-1/2}BA^{-1/2})^{t}A^{1/2}.$$
(3) (Congruence invariance) Congruence transformations $A \mapsto CAC^*$ for $C$ invertible are isometries of $\mathbb{P}$.
(4) Inversion $A \mapsto A^{-1}$ is an isometry.
(5) (Monotonicity; Löwner-Heinz inequality) For $0 \le t \le 1$,
$$A \le C,\; B \le D \;\Longrightarrow\; A\#_tB \le C\#_tD.$$
Here, $A \le B$ if and only if $B - A$ is positive semidefinite.
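Properties (1), (2), and (5) can be spot-checked numerically. This is a minimal sketch assuming NumPy and SciPy, with arbitrarily chosen $2 \times 2$ test matrices; matrix powers are taken through the eigendecomposition, and the helper names are ours.

```python
import numpy as np
from scipy.linalg import eigh

def spd_power(A, t):
    """A^t for symmetric positive definite A, via A = V diag(lam) V^T."""
    lam, V = np.linalg.eigh(A)
    return (V * lam ** t) @ V.T

def weighted_geomean(A, B, t):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    G = Ah @ spd_power(Aih @ B @ Aih, t) @ Ah
    return (G + G.T) / 2  # remove rounding asymmetry

def trace_metric(A, B):
    lam = eigh(B, A, eigvals_only=True)   # eigenvalues of A^{-1} B
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
d = trace_metric(A, B)

# (1) A # B is the metric midpoint
M = weighted_geomean(A, B, 0.5)
print(np.isclose(trace_metric(A, M), d / 2), np.isclose(trace_metric(M, B), d / 2))

# (2) constant-speed geodesic: delta(gamma(s), gamma(t)) = |t - s| delta(A, B)
g = lambda t: weighted_geomean(A, B, t)
print(np.isclose(trace_metric(g(0.2), g(0.7)), 0.5 * d))

# (5) Loewner-Heinz: A <= C and B <= D imply A #_t B <= C #_t D for 0 <= t <= 1
C, D = A + np.eye(2), B + np.diag([0.5, 0.0])
gap = weighted_geomean(C, D, 0.3) - weighted_geomean(A, B, 0.3)
print(np.linalg.eigvalsh(gap).min() >= -1e-12)
```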

SLIDE 7

A Big Question

The big question on averaging positive definite matrices is the following: given $n$ positive definite matrices, what is the best way to average them (i.e., find their mean) in such a way that the answer is again positive definite? Once one realizes that the matrix geometric mean

$$\Lambda_2(A, B) = A\#B := A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}$$

is the metric midpoint of $A$ and $B$ for the trace metric $\delta$, it is natural to use an averaging technique over this metric to extend this mean to $n$ variables. First M. Moakher (2005) and then Bhatia and Holbrook (2006) suggested the least squares mean, taking the mean to be the unique minimizer of the sum of the squares of the distances:

$$\Lambda_n(A_1, \ldots, A_n) = \operatorname*{arg\,min}_{X \in \mathbb{P}} \sum_{i=1}^{n} \delta^2(X, A_i).$$
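For $n = 2$ the least squares mean should reduce to $A\#B$, where the objective equals $\delta^2(A, B)/2$. The sketch below (assuming NumPy and SciPy; helper names are illustrative) evaluates the objective at $A\#B$ and at 20 random nearby points, which is a spot check of the minimizing property, not a proof.

```python
import numpy as np
from scipy.linalg import eigh

def spd_power(A, t):
    lam, V = np.linalg.eigh(A)
    return (V * lam ** t) @ V.T

def geomean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    G = Ah @ spd_power(Aih @ B @ Aih, 0.5) @ Ah
    return (G + G.T) / 2

def trace_metric(A, B):
    lam = eigh(B, A, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def lsq_objective(X, mats):
    """Sum of squared trace-metric distances from X to each A_i."""
    return sum(trace_metric(X, Ai) ** 2 for Ai in mats)

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
M = geomean(A, B)
f_M = lsq_objective(M, [A, B])
print(np.isclose(f_M, trace_metric(A, B) ** 2 / 2))

rng = np.random.default_rng(0)
worse = []
for _ in range(20):
    S = rng.standard_normal((2, 2))
    S = (S + S.T) / 2                       # random symmetric direction
    X = M + 0.05 * S / np.linalg.norm(S)    # nearby point, still positive definite
    worse.append(lsq_objective(X, [A, B]) > f_M)
print(all(worse))
```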

SLIDE 8

Some Background

This idea had been anticipated by Élie Cartan, who showed, among other things, that such a unique minimizer exists if the points all lie in a convex ball in a Riemannian manifold, which is enough to deduce the existence of the least squares mean globally for $\mathbb{P}$. The mean is frequently called the Karcher mean in light of its appearance in Karcher's work on Riemannian manifolds (1977). Indeed, Karcher considered general probabilistic means that included the weighted least squares mean:

$$\Lambda_n(w; A_1, \ldots, A_n) = \operatorname*{arg\,min}_{X \in \mathbb{P}} \sum_{i=1}^{n} w_i\,\delta^2(X, A_i),$$

where $w = (w_1, \ldots, w_n)$ is a probability vector.
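For $k = 1$ the weighted least squares mean can be computed by hand, which gives a useful sanity check for any numerical method: with $\delta(x, a) = |\log x - \log a|$ and the substitution $u = \log x$, the objective becomes a quadratic in $u$.

```latex
\begin{aligned}
\frac{d}{du}\sum_{i=1}^{n} w_i\,(u - \log a_i)^2
  &= 2\sum_{i=1}^{n} w_i\,(u - \log a_i)
   = 2\Big(u - \sum_{i=1}^{n} w_i \log a_i\Big) \qquad \Big(\textstyle\sum_i w_i = 1\Big),\\
\Lambda_n(w; a_1, \ldots, a_n)
  &= \exp\Big(\sum_{i=1}^{n} w_i \log a_i\Big) = a_1^{w_1} \cdots a_n^{w_n}.
\end{aligned}
```

So for positive reals the weighted Karcher mean is the weighted geometric mean; since the quadratic is strictly convex, the critical point is the unique minimizer.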

SLIDE 9

Monotonicity Conjecture

In a 2004 LAA article called “Geometric Means”, T. Ando, C.-K. Li and R. Mathias gave a construction that extended the two-variable matrix geometric mean to $n$ variables for each $n \ge 3$ and identified a list of ten properties (the ALM axioms) that this extended mean satisfied. Both contributions, the construction and the axiomatic properties, were important and have been influential in subsequent developments.
Question: Do the Ando-Li-Mathias properties extend to the least squares mean? In particular, Bhatia and Holbrook (2006) asked whether the least squares mean was monotonic in each of its arguments (a multivariable Löwner-Heinz inequality). Computer calculations indicated “Yes.”

(Monotonicity) $\Lambda_n(A_1, \ldots, A_n) \le \Lambda_n(B_1, \ldots, B_n)$ if $A_i \le B_i$ for all $i$.

SLIDE 10

ALM Axioms

A geometric mean of $n$ positive definite matrices is a function $G : \mathbb{P}^n \to \mathbb{P}$ satisfying

(P1) $G(A_1, \ldots, A_n) = (A_1 \cdots A_n)^{1/n}$ for commuting $A_i$'s.
(P2) $G(a_1A_1, \ldots, a_nA_n) = (a_1 \cdots a_n)^{1/n}\, G(A_1, \ldots, A_n)$ for $a_i > 0$.
(P3) $G(A_{\sigma(1)}, \ldots, A_{\sigma(n)}) = G(A_1, \ldots, A_n)$ for every permutation $\sigma$.
(P4) $G(A_1, \ldots, A_n) \le G(B_1, \ldots, B_n)$ for $A_i \le B_i$, $\forall i$.
(P5) $G$ is continuous.
(P6) $G(M^*A_1M, \ldots, M^*A_nM) = M^*G(A_1, \ldots, A_n)M$ for $M$ invertible.
(P7) $G$ is jointly concave.
(P8) $G(A_1^{-1}, \ldots, A_n^{-1})^{-1} = G(A_1, \ldots, A_n)$.
(P9) $\operatorname{Det} G(A_1, \ldots, A_n) = \Big(\prod_{i=1}^{n} \operatorname{Det} A_i\Big)^{1/n}$.
(P10) $\Big(\frac{1}{n}\sum_{i=1}^{n} A_i^{-1}\Big)^{-1} \le G(A_1, \ldots, A_n) \le \frac{1}{n}\sum_{i=1}^{n} A_i$.

  • The ten properties are known as the Ando-Li-Mathias axioms for multivariable geometric means of positive definite matrices.
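Several of the axioms can be spot-checked numerically for $n = 2$, where $G(A, B) = A\#B$. This is a minimal sketch assuming NumPy, with arbitrarily chosen test matrices; it checks (P1), (P2), (P3), (P6), (P8), (P9), and (P10) at one sample point each, which illustrates but of course does not prove them.

```python
import numpy as np

def spd_power(A, t):
    lam, V = np.linalg.eigh(A)
    return (V * lam ** t) @ V.T

def geomean(A, B):
    """Two-variable geometric mean A # B."""
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    G = Ah @ spd_power(Aih @ B @ Aih, 0.5) @ Ah
    return (G + G.T) / 2

def loewner_le(X, Y, tol=1e-10):
    """X <= Y in the Loewner order, i.e. Y - X is positive semidefinite."""
    return bool(np.linalg.eigvalsh(Y - X).min() >= -tol)

inv, det = np.linalg.inv, np.linalg.det
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
T = np.array([[1.0, 2.0], [0.0, 1.0]])   # an invertible (non-symmetric) matrix
G = geomean(A, B)

checks = {
    "P1": np.allclose(geomean(np.diag([1.0, 4.0]), np.diag([9.0, 1.0])), np.diag([3.0, 2.0])),
    "P2": np.allclose(geomean(2 * A, 8 * B), 4 * G),
    "P3": np.allclose(geomean(B, A), G),
    "P6": np.allclose(geomean(T.T @ A @ T, T.T @ B @ T), T.T @ G @ T),
    "P8": np.allclose(inv(geomean(inv(A), inv(B))), G),
    "P9": np.isclose(det(G), np.sqrt(det(A) * det(B))),
    "P10": loewner_le(inv((inv(A) + inv(B)) / 2), G) and loewner_le(G, (A + B) / 2),
}
print(checks)
```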

SLIDE 11

NPC Spaces

The answer to the monotonicity conjecture is indeed “yes,” but showing it required new tools: the theory of nonpositively curved metric spaces, techniques from probability and random variable theory, and the fairly recent combination of the two, particularly by K.-T. Sturm (2003). The setting appropriate for our considerations is that of globally nonpositively curved metric spaces, or NPC spaces for short: these are complete metric spaces $M$ such that for each $x, y \in M$ there exists $m \in M$ such that for all $z \in M$

$$d^2(m, z) \le \tfrac{1}{2}d^2(x, z) + \tfrac{1}{2}d^2(y, z) - \tfrac{1}{4}d^2(x, y). \qquad \text{(NPC)}$$

Such spaces are also called (global) CAT(0)-spaces, Hadamard spaces, or Bruhat-Tits spaces (e.g., Hilbert spaces, symmetric cones of finite rank, phylogenetic trees, booklets, products, Gromov-Hausdorff limits).
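The (NPC) inequality for $\mathbb{P}$ can be probed numerically with $m = x\#y$. The sketch below assumes NumPy and SciPy, draws random $3 \times 3$ positive definite matrices (the sampling scheme is an arbitrary choice), and counts violations up to a small rounding tolerance.

```python
import numpy as np
from scipy.linalg import eigh

def spd_power(A, t):
    lam, V = np.linalg.eigh(A)
    return (V * lam ** t) @ V.T

def geomean(A, B):
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    G = Ah @ spd_power(Aih @ B @ Aih, 0.5) @ Ah
    return (G + G.T) / 2

def trace_metric(A, B):
    lam = eigh(B, A, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def random_spd(rng, k=3):
    X = rng.standard_normal((k, k))
    return X @ X.T + 0.5 * np.eye(k)   # positive definite by construction

rng = np.random.default_rng(1)
violations = 0
for _ in range(200):
    x, y, z = (random_spd(rng) for _ in range(3))
    m = geomean(x, y)                  # metric midpoint of x and y
    lhs = trace_metric(m, z) ** 2
    rhs = (0.5 * trace_metric(x, z) ** 2 + 0.5 * trace_metric(y, z) ** 2
           - 0.25 * trace_metric(x, y) ** 2)
    if lhs > rhs + 1e-9 * (1.0 + abs(rhs)):
        violations += 1
print(violations)
```

Zero violations are expected, since $\mathbb{P}$ with the trace metric is an NPC space; the tolerance only absorbs floating-point rounding.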

SLIDE 12

Metric Geodesics

The theory of such NPC spaces is quite extensive. In particular, the $m$ appearing in

$$d^2(m, z) \le \tfrac{1}{2}d^2(x, z) + \tfrac{1}{2}d^2(y, z) - \tfrac{1}{4}d^2(x, y) \qquad \text{(NPC)}$$

is the unique metric midpoint between $x$ and $y$. By inductively choosing midpoints for dyadic rationals and extending by continuity, one obtains for each $x \ne y$ a unique metric minimal geodesic $\gamma : [0, 1] \to M$ satisfying

$$d(\gamma(t), \gamma(s)) = |t - s|\,d(x, y), \qquad \gamma(0) = x, \quad \gamma(1) = y.$$

  • Several classical problems based on Hilbert spaces carry over to NPC spaces: convex and stochastic analysis, probabilistic measure theory, optimal transport, optimization, metric geometry, and averaging (e.g., the Fermat-Weber problem).
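The dyadic construction can be sketched with nothing but a midpoint map: bisect $[0, 1]$, keeping the images of the interval endpoints, until the parameter hits $t$ (exact for dyadic $t$) or the interval is tiny. The helper `geodesic_point_by_midpoints` is hypothetical; on $\mathbb{P}$ we pass $X\#Y$ as the midpoint map and compare with the closed form $A\#_tB$, assuming NumPy.

```python
import numpy as np

def spd_power(A, t):
    lam, V = np.linalg.eigh(A)
    return (V * lam ** t) @ V.T

def weighted_geomean(A, B, t):
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    G = Ah @ spd_power(Aih @ B @ Aih, t) @ Ah
    return (G + G.T) / 2

def geodesic_point_by_midpoints(x, y, t, midpoint, depth=40):
    """Approximate gamma(t), 0 < t < 1, using only the midpoint map (bisection on [0, 1])."""
    lo, hi, a, b = 0.0, 1.0, x, y          # invariant: a = gamma(lo), b = gamma(hi)
    for _ in range(depth):
        mid_t = (lo + hi) / 2
        m = midpoint(a, b)                  # = gamma(mid_t), by uniqueness of midpoints
        if mid_t == t:
            return m                        # exact hit for dyadic t = p / 2^q
        if t < mid_t:
            hi, b = mid_t, m
        else:
            lo, a = mid_t, m
    return midpoint(a, b)                   # non-dyadic t: parameter within 2^-depth of t

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
mid = lambda X, Y: weighted_geomean(X, Y, 0.5)
print(np.allclose(geodesic_point_by_midpoints(A, B, 0.375, mid), weighted_geomean(A, B, 0.375)))
print(np.allclose(geodesic_point_by_midpoints(A, B, 0.3, mid), weighted_geomean(A, B, 0.3)))
# Euclidean midpoints recover (1 - t) x + t y
print(np.isclose(geodesic_point_by_midpoints(0.0, 10.0, 0.3, lambda x, y: (x + y) / 2), 3.0))
```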

SLIDE 13

Weighted Means in NPC-Spaces

For the minimal (metric) geodesic $\gamma : [0, 1] \to M$ with $\gamma(0) = x$ and $\gamma(1) = y$, we denote $\gamma(t)$ by $x\#_ty$ and call it the $t$-weighted mean of $x$ and $y$. The midpoint $x\#_{1/2}y$ we denote simply as $x\#y$. We remark that by uniqueness $x\#_ty = y\#_{1-t}x$; in particular, $x\#y = y\#x$.

We note that $x\#_ty = (1 - t)x + ty$ for $x, y \in \mathbb{R}^n$, and thus $x\#_ty$ can be thought of as a generalization of the latter. In $\mathbb{P}$ the minimal geodesic from $A$ to $B$ for the trace metric extends to a geodesic line $\gamma : \mathbb{R} \to \mathbb{P}$, and for each $t$, $\gamma(t) = A\#_tB$, the $t$-weighted geometric mean.
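A short NumPy sketch, assuming the closed form $A\#_tB = A^{1/2}(A^{-1/2}BA^{-1/2})^{t}A^{1/2}$ for the trace metric, checks the symmetry $x\#_ty = y\#_{1-t}x$, the scalar case $a\#_tb = a^{1-t}b^t$ (the multiplicative analogue of $(1 - t)a + tb$), and that $\gamma(t) = A\#_tB$ stays in $\mathbb{P}$ for $t$ outside $[0, 1]$.

```python
import numpy as np

def spd_power(A, t):
    lam, V = np.linalg.eigh(A)
    return (V * lam ** t) @ V.T

def weighted_geomean(A, B, t):
    """A #_t B; the formula makes sense for every real t."""
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    G = Ah @ spd_power(Aih @ B @ Aih, t) @ Ah
    return (G + G.T) / 2

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
t = 0.3

sym_ok = np.allclose(weighted_geomean(A, B, t), weighted_geomean(B, A, 1 - t))
scalar_ok = np.isclose(weighted_geomean(np.array([[2.0]]), np.array([[8.0]]), t)[0, 0],
                       2.0 ** 0.7 * 8.0 ** 0.3)
# the geodesic line: A^{1/2} Y^s A^{1/2} is positive definite for every real s
line_ok = [np.linalg.eigvalsh(weighted_geomean(A, B, s)).min() > 0 for s in (-1.5, 2.0)]
print(sym_ok, scalar_ok, line_ok)
```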

SLIDE 14

The Semiparallelogram Law

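Weakening the parallelogram law in Hilbert space to an inequality yields (NPC), or the semiparallelogram law: the sum of the squares of the 2 diagonals is at most the sum of the squares of the 4 sides,

$$d^2(x_1, x_2) + 4d^2(x, m) \le 2d^2(x, x_1) + 2d^2(x, x_2),$$

where $m$ is the midpoint of $x_1$ and $x_2$, and $4d^2(x, m) = (2d(x, m))^2$ is the square of the second diagonal.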

[Figure: a triangle with vertices $x$, $x_1$, $x_2$; $m$ is the midpoint of the side from $x_1$ to $x_2$.]

SLIDE 15

Metrics and Curvature

Equation (NPC), the semiparallelogram law, holds in both Euclidean and hyperbolic geometry. More generally, it is satisfied by the length metric in any simply connected nonpositively curved Riemannian manifold. Hence the metric definition represents a metric generalization of nonpositive curvature. Fact: the trace metric on the nonpositively curved Riemannian symmetric space $\mathbb{P}$ of positive definite matrices is a particular and important example of an NPC space. Equation (NPC) admits a more general formulation in terms of the weighted mean: for all $0 \le t \le 1$ we have

$$d^2(x\#_ty, z) \le (1 - t)\,d^2(x, z) + t\,d^2(y, z) - t(1 - t)\,d^2(x, y).$$
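In a Hilbert space the weighted inequality holds with equality: it is the weighted parallelogram identity. A quick NumPy check in $\mathbb{R}^5$, with randomly chosen points, illustrates this; it is a sanity check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
x, y, z = rng.standard_normal((3, 5))      # three points in R^5
results = []
for t in (0.0, 0.25, 0.5, 0.9, 1.0):
    p = (1 - t) * x + t * y                # x #_t y in Euclidean space
    lhs = np.sum((p - z) ** 2)
    rhs = ((1 - t) * np.sum((x - z) ** 2) + t * np.sum((y - z) ** 2)
           - t * (1 - t) * np.sum((x - y) ** 2))
    results.append(np.isclose(lhs, rhs))   # equality, not just <=
print(all(results))
```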

SLIDE 16

Metric Least Squares Means

The weighted least squares mean can be easily formulated in any metric space $(M, d)$. Given $(a_1, \ldots, a_n) \in M^n$ and positive real numbers $w_1, \ldots, w_n$ summing to 1, we define

$$\Lambda_n(w_1, \ldots, w_n; a_1, \ldots, a_n) := \operatorname*{arg\,min}_{z \in M} \sum_{i=1}^{n} w_i\, d^2(z, a_i), \qquad (1)$$

provided the minimizer exists and is unique. In general the minimizer may fail to exist or fail to be unique, but existence and uniqueness always hold for NPC spaces, as can be readily deduced from the uniform convexity of the metric.
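In a Hilbert space the minimizer in (1) is explicit. Writing $\bar a = \sum_i w_i a_i$ and expanding each square around $\bar a$ (the cross terms cancel because $\sum_i w_i(a_i - \bar a) = 0$):

```latex
\sum_{i=1}^{n} w_i \|z - a_i\|^2 = \|z - \bar a\|^2 + \sum_{i=1}^{n} w_i \|a_i - \bar a\|^2,
\qquad \bar a = \sum_{i=1}^{n} w_i a_i .
```

Hence $\Lambda_n(w_1, \ldots, w_n; a_1, \ldots, a_n) = \bar a$, the weighted arithmetic mean, and it is the unique minimizer.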

SLIDE 17

The Inductive (Sagae-Tanabe) Mean

One other mean will play an important role in what follows, one that we shall call the inductive mean, following the terminology of K.-T. Sturm (2003). It is defined inductively for NPC spaces (or, more generally, for metric spaces with weighted binary means $x\#_ty$) for each $k \ge 2$ by $S_2(x, y) = x\#y$ and, for $k \ge 3$,

$$S_k(x_1, \ldots, x_k) = S_{k-1}(x_1, \ldots, x_{k-1})\,\#_{1/k}\,x_k.$$

In Euclidean space it collapses to $S_n(x_1, \ldots, x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Note that at each stage this mean is defined from the previous stage by taking the appropriate two-variable weighted mean, which is monotone in each variable for $\mathbb{P}$. Thus the inductive mean is monotone.
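The inductive mean only needs a weighted binary mean, so it can be written once and instantiated in different spaces. This is a minimal sketch assuming NumPy; `inductive_mean` and its argument `wmean(x, y, t)` (standing for $x\#_ty$) are illustrative names. It checks the Euclidean collapse and, on $\mathbb{P}$ with $A\#_tB$, the monotonicity remark for one sample.

```python
import numpy as np

def spd_power(A, t):
    lam, V = np.linalg.eigh(A)
    return (V * lam ** t) @ V.T

def weighted_geomean(A, B, t):
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    G = Ah @ spd_power(Aih @ B @ Aih, t) @ Ah
    return (G + G.T) / 2

def inductive_mean(points, wmean):
    """S_1 = x_1 and S_k = S_{k-1} #_{1/k} x_k (so S_2 = x_1 # x_2)."""
    S = points[0]
    for k, x in enumerate(points[1:], start=2):
        S = wmean(S, x, 1.0 / k)
    return S

# Euclidean case: x #_t y = (1 - t) x + t y, so S_n is the arithmetic mean
euclid = lambda x, y, t: (1 - t) * x + t * y
xs = [np.array([1.0, 0.0]), np.array([3.0, 2.0]), np.array([2.0, 7.0])]
print(inductive_mean(xs, euclid))          # approximately [2. 3.]

# Monotonicity on P: A_i <= B_i for all i gives S_n(A) <= S_n(B)
As = [np.array([[2.0, 0.5], [0.5, 1.0]]),
      np.array([[1.0, -0.3], [-0.3, 3.0]]),
      np.array([[4.0, 1.0], [1.0, 2.0]])]
Bs = [X + np.diag([0.2, 0.7]) for X in As]
gap = inductive_mean(Bs, weighted_geomean) - inductive_mean(As, weighted_geomean)
print(np.linalg.eigvalsh(gap).min() >= -1e-12)
```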

SLIDE 18

Random Walks and Sturm’s Theorem

Let $(M, d)$ be an NPC metric space, $\{x_1, \ldots, x_m\} \subseteq M$, and $w = (w_1, \ldots, w_m)$ a probability vector. Set $N_m = \{1, 2, \ldots, m\}$ and assign to $k \in N_m$ the probability $w_k$. For each $\omega \in \prod_{n=1}^{\infty} N_m$, define inductively a sequence $\sigma = \sigma_\omega$ in $M$ by (rolling dice)

$$\sigma(1) = x_{\omega(1)}, \qquad \sigma(k) = S_k(x_{\omega(1)}, \ldots, x_{\omega(k)}),$$

where $S_k$ is the inductive mean. (The sequence $\sigma_\omega$ may be viewed as a “walk” starting at $\sigma(1) = x_{\omega(1)}$ and obtaining $\sigma(k)$ by moving from $\sigma(k-1)$ toward $x_{\omega(k)}$ a distance of $(1/k)\,d(\sigma(k-1), x_{\omega(k)})$.)

Sturm’s Theorem. Giving $\prod_{n=1}^{\infty} N_m$ the product probability, the set

$$\Big\{\omega \in \prod_{n=1}^{\infty} N_m : \lim_{n\to\infty} \sigma_\omega(n) = \Lambda(w; x_1, \ldots, x_m)\Big\}$$

has measure 1, i.e., $\sigma_\omega(n) \to \Lambda(w; x_1, \ldots, x_m)$ for almost all $\omega$.
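A simulation of the walk, as a minimal sketch assuming NumPy: the i.i.d. draws $\omega(k)$ use `Generator.choice` with probabilities `w`. To have a closed-form target we use the commutative case $k = 1$ (positive reals), where $x\#_ty = x^{1-t}y^t$ and $\Lambda(w; x_1, \ldots, x_m) = \prod_i x_i^{w_i}$; there the walk is just the exponential of a running sample average of logarithms, so this illustrates the mechanics rather than the noncommutative content of Sturm's theorem. `random_walk` is an illustrative name.

```python
import numpy as np

def random_walk(points, w, wmean, steps, rng):
    """sigma(1) = x_{omega(1)}, sigma(k) = sigma(k-1) #_{1/k} x_{omega(k)},
    with omega(1), omega(2), ... drawn i.i.d. from the probability vector w."""
    omega = rng.choice(len(points), size=steps, p=w)   # 0-based indices
    sigma = points[omega[0]]
    for k in range(2, steps + 1):
        sigma = wmean(sigma, points[omega[k - 1]], 1.0 / k)
    return sigma

geo = lambda x, y, t: x ** (1 - t) * y ** t    # x #_t y on the positive reals
pts = np.array([1.0, 4.0, 9.0])
w = np.array([0.5, 0.3, 0.2])
target = float(np.prod(pts ** w))              # Lambda(w; x_1, x_2, x_3) for k = 1
est = float(random_walk(pts, w, geo, 20_000, np.random.default_rng(42)))
print(target, est)                             # close, but not equal: the walk is random
```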
slide-19
SLIDE 19

Random Walks and Sturm’s Theorem

More generally, Sturm establishes a version of the strong law of large numbers for random variables with values in an NPC metric space, with limit the least squares mean. Using Sturm’s Theorem, Lawson and L. (Math. Ann., 2011) were able to show:
(1) The least squares mean $\Lambda$ on $\mathbb{P}$, the almost-everywhere limit of the inductive means, is monotone: $A_i \le B_i$ for $1 \le i \le n$ implies

$$\Lambda(A_1, \ldots, A_n) \le \Lambda(B_1, \ldots, B_n).$$

(2) All ten of the Ando-Li-Mathias (ALM) axioms hold for $\Lambda$.
(3) In a natural way $\Lambda$ can be extended to a weighted mean, and appropriate weighted versions of the ten properties hold.
Note: The ALM mean is typically distinct from the least squares mean for $n \ge 3$. Thus the ALM axioms do not uniquely characterize a mean. The latter fact had already been noted by Bini, Meini and Poloni (2010), who introduced a much more computationally efficient variant of the ALM mean.

slide-20
SLIDE 20

No Dice Conjecture: unweighted case

Sturm’s Theorem,

$$\lim_{n\to\infty} \sigma_\omega(n) = \Lambda(x_1, \ldots, x_m) \quad \text{for a.e. } \omega \in \prod_{n=1}^{\infty} N_m,$$

implies that there are infinitely many random walks converging to the Karcher mean, but unfortunately it does not provide an “explicit deterministic sequence”. Computer calculations indicated that the “natural and periodic” random walk converges to the Karcher mean:

$$\omega(n) = \bar{n},$$

where $\bar{n}$ denotes the residue of $n$ modulo $m$, taken in $N_m = \{1, \ldots, m\}$.
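The periodic walk differs from the random one only in how $\omega(n)$ is chosen. A minimal sketch in plain Python, again in the commutative case $k = 1$ where the limit is the ordinary geometric mean, so it shows the indexing rather than the difficulty of the conjecture; `periodic_walk` is an illustrative name.

```python
import math

def periodic_walk(points, wmean, steps):
    """Deterministic walk with omega(n) = n mod m (residues taken in {1, ..., m}).

    Returns [S_1, ..., S_steps], where S_1 = x_1 and S_n = S_{n-1} #_{1/n} x_{omega(n)}.
    """
    m = len(points)
    S = points[0]
    history = [S]
    for n in range(2, steps + 1):
        S = wmean(S, points[(n - 1) % m], 1.0 / n)   # (n - 1) % m is omega(n) - 1
        history.append(S)
    return history

geo = lambda x, y, t: x ** (1 - t) * y ** t          # x #_t y on the positive reals
hist = periodic_walk([1.0, 4.0, 16.0], geo, 3000)
print(math.isclose(hist[3], 64.0 ** 0.25))          # S_4 = (1 * 4 * 16 * 1)^(1/4)
print(hist[-1])                                     # S_3000, approximately (1 * 4 * 16)^(1/3) = 4
```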

slide-21
SLIDE 21

Example: n = 3

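$$S_n(x_1, \ldots, x_n) = S_{n-1}(x_1, \ldots, x_{n-1})\,\#_{1/n}\,x_n;$$

  • The periodic random walk for three points $x_1, x_2, x_3$:

$$S_1 = x_1, \quad S_2 = x_1\#x_2, \quad S_3 = (x_1\#x_2)\#_{1/3}x_3,$$
$$S_4 = \big((x_1\#x_2)\#_{1/3}x_3\big)\#_{1/4}x_1,$$
$$S_5 = \Big[\big((x_1\#x_2)\#_{1/3}x_3\big)\#_{1/4}x_1\Big]\#_{1/5}x_2,$$
$$S_6 = \Big\{\Big[\big((x_1\#x_2)\#_{1/3}x_3\big)\#_{1/4}x_1\Big]\#_{1/5}x_2\Big\}\#_{1/6}x_3, \ \ldots$$

In Euclidean space it collapses to

$$x_1,\ \frac{x_1 + x_2}{2},\ \frac{x_1 + x_2 + x_3}{3},\ \frac{2x_1 + x_2 + x_3}{4},\ \frac{2x_1 + 2x_2 + x_3}{5},\ \frac{x_1 + x_2 + x_3}{3},\ \ldots$$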
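The Euclidean collapse can be verified exactly with rational arithmetic by taking $x_i$ to be the standard basis vectors $e_i$, so that each $S_n$ is represented by its coefficient vector. A sketch in Python using `fractions.Fraction`; the function name is illustrative.

```python
from fractions import Fraction

def periodic_coefficients(m, steps):
    """Coefficient vectors of S_1, ..., S_steps in the Euclidean case, with x_i = e_i."""
    S = [Fraction(1)] + [Fraction(0)] * (m - 1)      # S_1 = x_1
    out = [S]
    for n in range(2, steps + 1):
        t = Fraction(1, n)
        j = (n - 1) % m                              # omega(n) - 1 for the periodic walk
        # S_n = (1 - 1/n) S_{n-1} + (1/n) x_{omega(n)}
        S = [(1 - t) * c + (t if i == j else 0) for i, c in enumerate(S)]
        out.append(S)
    return out

coeffs = periodic_coefficients(3, 6)
for k, S in enumerate(coeffs, start=1):
    print(k, [str(c) for c in S])
```

The six printed coefficient vectors are (1, 0, 0), (1/2, 1/2, 0), (1/3, 1/3, 1/3), (1/2, 1/4, 1/4), (2/5, 2/5, 1/5), (1/3, 1/3, 1/3), matching the list above.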

slide-22
SLIDE 22

Follow-ups

Bhatia and Karandikar (2012) were able to give a simplified probabilistic proof of a weaker law of large numbers for the case $\mathbb{P}$ of positive definite matrices that sufficed to derive monotonicity and other properties of the multivariable geometric mean.

Holbrook (2012) solved the no dice conjecture, but his proof is quite involved, relies heavily on matrix analysis and the differentiable structure of $\mathbb{P}$, and cannot be extended to NPC spaces in a straightforward way. However, he and other researchers conjectured that his “no dice” result should extend to arbitrary NPC spaces. In very recent work, M. Bačák has given a proof of the no dice theorem for locally compact NPC spaces, and M. Palfia and L. settled the no dice theorem for more general NPC spaces and for the weighted case by introducing weighted random walks.

slide-23
SLIDE 23

Weighted Random Walks

Theorem (Palfia and L.)

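Let $w = (w_1, \ldots, w_m)$ be a probability vector and $a = (a_1, \ldots, a_m) \in M^m$. Set

$$S_1 = a_1, \quad S_2 = a_1\#_{\frac{w_2}{w_1 + w_2}}a_2, \quad \ldots, \quad S_{k+1} = S_k\#_{s_{k+1}}a_{k+1},$$

where $s_k = \frac{w_k}{l(k)}$ with $l(k) = \sum_{i=1}^{k} w_i$. Then $\lim_{k\to\infty} S_k = \Lambda(w; a)$.

  • [Variance inequality or multivariable semiparallelogram law]

$$d^2(x, \Lambda(w; a)) \le \sum_{i=1}^{m} w_i\left[d^2(x, a_i) - d^2(\Lambda(w; a), a_i)\right].$$

  • General weighted random walks $S^{\omega}_k$, with $\omega$ varying over $\prod_{n=1}^{\infty} N_m$, exist. Does $S^{\omega}_k \to \Lambda(w; a)$ almost surely? That is, is there a weighted version of the S.L.L.N.?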
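A sketch of the weighted deterministic walk in plain Python. The slide indexes $a_k$ and $w_k$ only up to $m$; for $k > m$ this sketch assumes that indices are read cyclically modulo $m$, as in the periodic walk (an explicit assumption, also noted in the code). In the scalar (commutative) case, $\log S_k$ is the $w$-weighted running average of the logarithms visited, so after every full cycle $S_k$ equals $\prod_i a_i^{w_i}$ up to rounding. `weighted_walk` is an illustrative name.

```python
import math

def weighted_walk(points, w, wmean, steps):
    """S_1 = a_1, S_k = S_{k-1} #_{s_k} a_k with s_k = w_k / l(k), l(k) = w_1 + ... + w_k.

    Assumption of this sketch: for k > m, a_k and w_k are read cyclically (index mod m).
    """
    m = len(points)
    S, l = points[0], w[0]                       # S_1 = a_1, l(1) = w_1
    for k in range(2, steps + 1):
        wk, ak = w[(k - 1) % m], points[(k - 1) % m]
        l += wk                                  # l(k)
        S = wmean(S, ak, wk / l)                 # s_k = w_k / l(k)
    return S

geo = lambda x, y, t: x ** (1 - t) * y ** t      # x #_t y on the positive reals
a, w = [1.0, 4.0, 9.0], [0.5, 0.3, 0.2]
target = math.prod(x ** wi for x, wi in zip(a, w))   # Lambda(w; a) = prod a_i^{w_i} for k = 1
S = weighted_walk(a, w, geo, 3000)               # 3000 = 1000 full cycles
print(math.isclose(S, target, rel_tol=1e-9))
```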

slide-24
SLIDE 24

The Karcher Equation

The Karcher mean

$$\Lambda_n(w; A_1, \ldots, A_n) = \operatorname*{arg\,min}_{X \in \mathbb{P}} \sum_{i=1}^{n} w_i\,\delta^2(X, A_i) \qquad \text{(KM)}$$

coincides with the unique positive definite solution of the Karcher equation

$$\sum_{i=1}^{n} w_i \log\big(X^{-1/2}A_iX^{-1/2}\big) = 0. \qquad \text{(KE)}$$

(The Karcher mean is thus characterized by the vanishing of the gradient of the objective in (KM), which is equivalent to its being a solution of the Karcher equation.) Various numerical methods for the Karcher mean or the Karcher equation of positive definite matrices have been introduced in the literature: optimization algorithms such as Newton’s method or gradient descent, and iterative methods, which depend heavily on the smooth structure of the manifold.
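The slide mentions gradient-type methods without fixing one. As a minimal sketch, assuming NumPy, here is a fixed-step (step size 1) Riemannian gradient iteration for (KE), a common choice in the literature but not necessarily the method the talk has in mind; with unit step, convergence is not guaranteed for widely spread data, and the names `karcher_mean` and `sym_fun` are hypothetical. The checks use (P9), the case $n = 2$ (where the Karcher mean is $A\#B$), and commuting inputs.

```python
import numpy as np

def sym_fun(S, f):
    """f(S) for symmetric S, applying the scalar function f to the eigenvalues."""
    S = (S + S.T) / 2
    lam, V = np.linalg.eigh(S)
    return (V * f(lam)) @ V.T

def karcher_mean(mats, w=None, tol=1e-10, max_iter=1000):
    """Iterate X <- X^{1/2} exp( sum_i w_i log(X^{-1/2} A_i X^{-1/2}) ) X^{1/2}.
    Returns (X, r), r = Frobenius norm of the left side of (KE) at the returned X
    (or at the previous iterate if max_iter is exhausted)."""
    n = len(mats)
    w = np.full(n, 1.0 / n) if w is None else np.asarray(w, dtype=float)
    X = sum(wi * Ai for wi, Ai in zip(w, mats))       # start at the weighted arithmetic mean
    r = np.inf
    for _ in range(max_iter):
        Xh = sym_fun(X, np.sqrt)
        Xih = sym_fun(X, lambda lam: 1.0 / np.sqrt(lam))
        G = sum(wi * sym_fun(Xih @ Ai @ Xih, np.log) for wi, Ai in zip(w, mats))
        r = np.linalg.norm(G)
        if r < tol:
            return X, r
        X = Xh @ sym_fun(G, np.exp) @ Xh
    return X, r

def geomean(A, B):
    Ah, Aih = sym_fun(A, np.sqrt), sym_fun(A, lambda lam: 1.0 / np.sqrt(lam))
    return Ah @ sym_fun(Aih @ B @ Aih, np.sqrt) @ Ah

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
C = np.array([[4.0, 1.0], [1.0, 2.0]])
X, r = karcher_mean([A, B, C])
det = np.linalg.det
print(r < 1e-10)
print(np.isclose(det(X), np.cbrt(det(A) * det(B) * det(C))))       # (P9)
print(np.allclose(karcher_mean([A, B])[0], geomean(A, B)))          # n = 2 gives A # B
D = [np.diag([1.0, 4.0]), np.diag([4.0, 9.0]), np.diag([16.0, 1.0])]
print(np.allclose(karcher_mean(D)[0], np.diag([4.0, np.cbrt(36.0)])))  # commuting inputs
```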

slide-25
SLIDE 25

Infinite Dimensional Case

Since the work of Pusz and Woronowicz (1975) in mathematical physics and operator algebras and the pioneering paper of Kubo and Ando (1980) on operator means, an extensive theory of two-variable means has sprung up for positive matrices and operators on Hilbert spaces, but the $n$-variable case for $n > 2$ has remained problematic. It has been a long-standing open problem to extend the two-variable geometric mean $A\#B$ to $n$ variables. Since in statistical, quantum, and other settings one may be interested in the more general case of positive bounded linear operators on an infinite-dimensional Hilbert space, one would like to have a suitable and effective averaging procedure for this context also. However, the significant theory that has developed for the multivariable least squares mean does not readily carry over to the setting of positive operators on a Hilbert space, since no such Riemannian structure or NPC metric is available there.

slide-26
SLIDE 26

Infinite Dimensional Case

Fortunately, there is an alternative path through which one may approach this mean besides the least squares path: the Karcher equation for positive definite operators

$$\sum_{i=1}^{n} w_i \log\big(X^{-1/2}A_iX^{-1/2}\big) = 0. \qquad \text{(KE)}$$

In 2013, Lawson and L. proved that the Karcher equation has a unique positive definite solution, called the Karcher mean, and that the Karcher mean retains most of the attractive properties of the finite-dimensional Karcher mean or of the geometric mean of a finite number of positive real numbers. In doing so, they settled the 40-year-old open problem of extending the two-variable geometric mean of positive definite operators to higher orders [see TAMS and PNAS].

slide-27
SLIDE 27

No Dice Conjecture Again

There is no known successful approximation method for the infinite-dimensional Karcher mean, the unique solution of the Karcher equation

$$\sum_{i=1}^{n} w_i \log\big(X^{-1/2}A_iX^{-1/2}\big) = 0. \qquad \text{(KE)}$$

No dice conjecture: the inductive mean sequence via the periodic random walk converges to the Karcher mean.
“God does not play dice with the universe....” Einstein.
“No dice are needed to find an approximation to the Karcher mean of positive definite matrices...” Holbrook.
slide-28
SLIDE 28

Further Works

  • Smoothness of the Karcher mean
  • Numerical methods: global, effective, fast convergence
  • Multivariable analysis of matrix (operator) means
  • Characteristic properties of the Karcher mean. There are infinitely many geometric means, e.g., ALM, BMP, and the Karcher mean; computations indicate that they are all distinct, but this has not been established theoretically.
  • Algebraic and categorical systems on geometric means
  • Geometry on geometric means
  • Computing means on phylogenetic trees: Billera-Holmes-Vogtmann tree space in statistical biology
  • Why the inductive mean in the S.L.L.N.? (Es-Sahib and Heinich)
  • No dice conjecture for the infinite-dimensional Karcher mean.
slide-29
SLIDE 29

References

  • 1. T. Ando, C.-K. Li, and R. Mathias, Geometric means, Linear Algebra Appl. (2004).
  • 2. D. Bini, B. Meini, and F. Poloni, An effective matrix geometric mean satisfying the Ando-Li-Mathias properties, Math. Comp. (2010).
  • 3. R. Bhatia and R. Karandikar, Monotonicity of the matrix geometric mean, Math. Ann. (2012).
  • 4. J. Holbrook, No dice: a deterministic approach to the Cartan centroid, J. Ramanujan Math. Soc. (2013).
  • 5. J. Lawson and Y. Lim, Monotonic properties of the least squares mean, Math. Ann. (2011).
  • 6. J. Lawson and Y. Lim, Weighted means and Karcher equations of positive operators, Proceedings of the National Academy of Sciences, USA (2013).
  • 7. J. Lawson and Y. Lim, Karcher means and Karcher equations of positive definite operators, to appear in Trans. Amer. Math. Soc.