A Review of Regularized Optimal Transport Marco Cuturi Joint work - - PowerPoint PPT Presentation



SLIDE 1

A Review of Regularized Optimal Transport

Marco Cuturi

Joint work with many people, including: G. Peyré, A. Genevay (ENS), A. Doucet (Oxford), J. Solomon (MIT), J.D. Benamou, N. Bonneel, F. Bach, L. Nenna (INRIA), G. Carlier (Dauphine).
SLIDE 2

What is Optimal Transport?

A geometric toolbox to compare probability measures supported on a metric space.

Monge, Kantorovich, Dantzig, Wasserstein, Brenier, McCann, Villani, Otto.

SLIDE 3

What is Optimal Transport?

A geometric toolbox to compare probability measures supported on a metric space.

[Figure: examples of probability measures — empirical measures µ, ν; color histograms h1, h2; bags of features; statistical models pθ, pθ′; brain activation maps.]

SLIDE 4

(Repeat of Slide 3.)

SLIDE 5

What is Optimal Transport?

A geometric toolbox to compare probability measures supported on a metric space.

[Figure: two statistical models pθ, pθ′ seen as points of P(Ω).]

SLIDE 6

What is Optimal Transport?

Wasserstein Distance: W(pθ, pθ′), a distance between pθ and pθ′ in P(Ω).

SLIDE 7

What is Optimal Transport?

[McCann’95] Interpolant between pθ and pθ′ in P(Ω).

SLIDE 8

What is Optimal Transport?

[Figure: three measures pθ, pθ′, pθ′′ in P(Ω).]

SLIDE 9

What is Optimal Transport?

Wasserstein Barycenter [Agueh’11] of pθ, pθ′, pθ′′ in P(Ω).

SLIDE 10

OT and data-analysis

  • Key developments in (applied) maths since the ’90s: [McCann’95], [JKO’98], [Benamou’98], [Gangbo’98], [Ambrosio’06], [Villani’03/’09].
  • Key developments in TCS / graphics since the ’00s: [Rubner’98], [Indyk’03], [Naor’07], [Andoni’15].
  • Small to no impact in large-scale data analysis so far: computationally heavy; the Wasserstein distance is not differentiable.

SLIDE 11

Today’s talk: Entropy Regularized OT

  • Very fast compared to usual approaches, GPGPU parallel.
  • Differentiable, important if we want to use OT distances as loss functions.
  • Can be automatically differentiated: a simple iterative process, compatible with DL toolboxes.
  • OT can become a building block in ML.
SLIDE 12

Background: OT Geometry

Consider (Ω, D), a metric probability space. Let µ, ν be probability measures in P(Ω).

  • [Monge’81] problem: find a map T : Ω → Ω attaining

    inf_{T#µ=ν} ∫ D(x, T(x)) µ(dx)

SLIDE 13

(Same slide, illustrated on a discrete measure: each Dirac δx is mapped to T(x).)

SLIDE 14

[Kantorovich’42] Relaxation

  • Instead of maps T : Ω → Ω, consider probabilistic maps, i.e. couplings P ∈ P(Ω × Ω):

    Π(µ, ν) := {P ∈ P(Ω × Ω) | ∀A, B ⊂ Ω, P(A × Ω) = µ(A), P(Ω × B) = ν(B)}
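In the discrete case, a coupling is simply a nonnegative matrix whose row sums give µ and whose column sums give ν. A quick NumPy check of the two marginal constraints; the weights below are made up for the example:

```python
import numpy as np

# A coupling between two discrete measures is a nonnegative matrix whose
# marginals are the two measures. The weights below are made up.
mu = np.array([0.2, 0.4, 0.4])         # weights of mu on 3 points
nu = np.array([0.5, 0.5])              # weights of nu on 2 points

# The independent (product) coupling mu x nu always belongs to Pi(mu, nu).
P = np.outer(mu, nu)

assert np.allclose(P.sum(axis=1), mu)  # P(A x Omega) = mu(A)
assert np.allclose(P.sum(axis=0), nu)  # P(Omega x B) = nu(B)
```

The product coupling is the "no-information" element of Π(µ, ν); optimal transport searches this polytope for the cheapest coupling instead.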

SLIDE 15

[Kantorovich’42] Relaxation

Π(µ, ν) := {P ∈ P(Ω × Ω) | ∀A, B ⊂ Ω, P(A × Ω) = µ(A), P(Ω × B) = ν(B)}

[Figure: a coupling P(x, y) between two discrete measures µ(x), ν(y) supported on {−1, 1, 2, 3, 4}.]

SLIDE 16

(Same slide, with a second, different coupling between the same marginals.)

SLIDE 17

Couplings

[Figure: the first coupling P(x, y), with marginals µ(x), ν(y).]

SLIDE 18

Couplings

[Figure: a second coupling between the same marginals.]

SLIDE 19

Wasserstein Distance

  • Def. For p ≥ 1, the p-Wasserstein distance between µ, ν in P(Ω) is

    Wp(µ, ν) := ( inf_{P∈Π(µ,ν)} E_P[D(X, Y)^p] )^{1/p}.

SLIDE 20

Wasserstein between 2 Diracs

On (Ω, D):

    W_p^p(δx, δy) = D(x, y)^p

SLIDE 21

Wasserstein on Uniform Measures

µ = (1/n) Σ_{i=1}^n δ_{x_i},  ν = (1/n) Σ_{j=1}^n δ_{y_j} on (Ω, D).

SLIDE 22

Wasserstein on Uniform Measures

For a permutation σ, the cost of the induced assignment is

    C(σ) = (1/n) Σ_{i=1}^n D(x_i, y_{σ_i})^p

SLIDE 23

Optimal Assignment ⊂ Wasserstein

For µ = (1/n) Σ_{i=1}^n δ_{x_i} and ν = (1/n) Σ_{j=1}^n δ_{y_j} on (Ω, D):

    W_p^p(µ, ν) = min_{σ∈S_n} C(σ)
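This reduction to an assignment can be checked on a tiny example. The points below and the choice p = 2 are illustrative, and brute force over S_n is only for the demo (real solvers use the Hungarian algorithm or network flow):

```python
import itertools

import numpy as np

# For uniform measures on n points each, W_p^p is the value of an optimal
# assignment: min over permutations sigma of (1/n) sum_i D(x_i, y_sigma(i))^p.
rng = np.random.default_rng(0)
n, p = 4, 2
x = rng.normal(size=(n, 2))
y = rng.normal(size=(n, 2))

# Cost matrix M_ij = D(x_i, y_j)^p, with D the Euclidean distance.
M = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2) ** p

# Brute force over S_n (fine only for tiny n).
W_pp = min(M[range(n), sigma].mean()
           for sigma in itertools.permutations(range(n)))
print(W_pp)
```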

SLIDE 24

Wasserstein on Empirical Measures

µ = Σ_{i=1}^n a_i δ_{x_i},  ν = Σ_{j=1}^m b_j δ_{y_j} on (Ω, D).

SLIDE 25

Wasserstein on Empirical Measures

Consider µ = Σ_{i=1}^n a_i δ_{x_i} and ν = Σ_{j=1}^m b_j δ_{y_j}. Define

    U(a, b) := {P ∈ R_+^{n×m} | P 1_m = a, P^T 1_n = b}
    M_XY := [D(x_i, y_j)^p]_{ij}

[Figure: the n×m matrices P and M_XY, highlighting the row-sum constraint P 1_m = a.]

SLIDE 26

(Same slide, highlighting the column-sum constraint P^T 1_n = b.)

SLIDE 27

Wasserstein on Empirical Measures

Consider µ = Σ_{i=1}^n a_i δ_{x_i} and ν = Σ_{j=1}^m b_j δ_{y_j}.

  • Def. Optimal Transport Problem

    W_p^p(µ, ν) = min_{P∈U(a,b)} ⟨P, M_XY⟩

SLIDE 28

Discrete OT Problem

[Figure: the polytope U(a, b) and the cost matrix M_XY.]

SLIDE 29

Discrete OT Problem

[Figure: an optimal vertex P⋆ of U(a, b).]

SLIDE 30

Discrete OT Problem

  • Def. Dual OT problem

    W_p^p(µ, ν) = max_{α∈R^n, β∈R^m : α_i+β_j ≤ D(x_i,y_j)^p} α^T a + β^T b
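One consequence worth checking numerically is weak duality: any dual-feasible (α, β) yields a lower bound α^T a + β^T b ≤ ⟨P, M_XY⟩ for every feasible coupling P. A sketch with made-up data; the crude dual pair below is just one feasible choice, not the optimum:

```python
import numpy as np

# Weak duality: if alpha_i + beta_j <= M_ij for all i, j, then
# alpha^T a + beta^T b <= <P, M> for every coupling P in U(a, b).
rng = np.random.default_rng(1)
n, m = 3, 4
a = np.full(n, 1 / n)
b = np.full(m, 1 / m)
M = rng.uniform(1.0, 2.0, size=(n, m))       # illustrative cost matrix

P = np.outer(a, b)                           # independence coupling: feasible
alpha = M.min(axis=1) / 2                    # one crude dual-feasible choice:
beta = (M - alpha[:, None]).min(axis=0)      # beta_j = min_i (M_ij - alpha_i)

assert np.all(alpha[:, None] + beta[None, :] <= M + 1e-12)
print(alpha @ a + beta @ b, (P * M).sum())   # lower bound, primal value
```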

SLIDE 31

Discrete OT Problem

An O(n³ log n) network flow solver is used in practice.

Note: flow/PDE formulations [Beckman’61]/[Benamou’98] can be used for p = 1 / p = 2 with a sparse-graph metric / the Euclidean metric.

SLIDES 32–42

Discrete OT Problem (illustrated step by step)

  • An O(n³ log n) network flow solver is used in practice.
  • The optimal solution P⋆ is unstable and not always unique (the optimal set {P⋆} can be a whole face of the polytope U(a, b)).
  • W_p^p(µ, ν) is not differentiable.

SLIDE 43

Entropic Regularization [Wilson’62]

  • Def. Regularized Wasserstein, γ ≥ 0:

    Wγ(µ, ν) := min_{P∈U(a,b)} ⟨P, M_XY⟩ − γE(P),  where E(P) := − Σ_{i,j=1}^{n,m} P_ij log P_ij.

Note: the optimal solution is unique because of the strong concavity of the entropy.

SLIDE 44

[Figure: the regularized optimum Pγ between µ and ν, moving toward the interior of the polytope as γ grows.]

SLIDE 45

Fast & Scalable Algorithm

  • Prop. If Pγ := argmin_{P∈U(a,b)} ⟨P, M_XY⟩ − γE(P), then ∃! u ∈ R_+^n, v ∈ R_+^m (up to scaling) such that

    Pγ = diag(u) K diag(v),  K := e^{−M_XY/γ}.

SLIDE 46

Proof sketch, via the Lagrangian:

    L(P, α, β) = Σ_ij [P_ij M_ij + γ P_ij log P_ij] + α^T(P 1 − a) + β^T(P^T 1 − b)
    ∂L/∂P_ij = M_ij + γ(log P_ij + 1) + α_i + β_j
    (∂L/∂P_ij = 0)  ⇒  P_ij = e^{−(α_i/γ + 1/2)} e^{−M_ij/γ} e^{−(β_j/γ + 1/2)} = u_i K_ij v_j.

SLIDE 47

Fast & Scalable Algorithm

  • [Sinkhorn’64] fixed-point iterations for (u, v):

    u ← a / (K v),  v ← b / (K^T u)

  • O(nm) complexity per iteration, GPGPU parallel [C’13].
  • O(n^{d+1}) if Ω = {1, …, n}^d and D^p is separable [S..C..’15].
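The two updates above fit in a few lines of NumPy; problem sizes, γ and the random cost matrix below are illustrative:

```python
import numpy as np

# Sinkhorn fixed-point iterations u <- a/(Kv), v <- b/(K^T u) for
# entropy-regularized OT, with K = exp(-M/gamma).
rng = np.random.default_rng(0)
n, m, gamma = 5, 6, 0.1
a = np.full(n, 1 / n)
b = np.full(m, 1 / m)
M = rng.uniform(size=(n, m))

K = np.exp(-M / gamma)
v = np.ones(m)
for _ in range(1000):
    u = a / (K @ v)
    v = b / (K.T @ u)

P = np.diag(u) @ K @ np.diag(v)                   # P_gamma = diag(u) K diag(v)
assert np.allclose(P.sum(axis=1), a, atol=1e-6)   # row marginals match a
assert np.allclose(P.sum(axis=0), b, atol=1e-6)   # column marginals match b
print((P * M).sum())                              # regularized transport cost
```

Each iteration only needs two matrix-vector products, which is what makes the scheme GPGPU-friendly.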

SLIDE 48

Very Fast EMD Approx. Solver

[Figure: average execution time per distance (in s., log scale) vs. histogram dimension (64 to 4096), comparing FastEMD, Rubner’s emd, and CPU/GPU Sinkhorn at γ = 0.02 and γ = 0.1.]

  • Note. (Ω, D) is a random graph with the shortest-path metric; histograms are sampled uniformly on the simplex; Sinkhorn tolerance 10⁻².

SLIDE 49

Regularization ⤑ Differentiability

For µ = Σ_{i=1}^n a_i δ_{x_i} and ν = Σ_{j=1}^m b_j δ_{y_j} on (Ω, D):

    Wγ((a, X), (b, Y)) = min_{P∈U(a,b)} ⟨P, M_XY⟩ − γE(P)

SLIDES 50–51

What happens when the weights move, a ← a + ∆a?

    Wγ((a + ∆a, X), (b, Y)) = Wγ((a, X), (b, Y)) + ??

SLIDES 52–53

What happens when the support moves, X ← X + ∆X?

    Wγ((a, X + ∆X), (b, Y)) = Wγ((a, X), (b, Y)) + ??

SLIDE 54

Crucial for “min data + W” problems:

  • Quantization, k-means problem [Lloyd’82]:

    min_{µ∈P(R^d), |supp µ|=k} W_2^2(µ, ν_data)

  • [McCann’95] Interpolant:

    min_{µ∈P(Ω)} (1 − t) W_2^2(µ, ν1) + t W_2^2(µ, ν2)

  • [JKO’98] PDEs as gradient flows in (P(Ω), W):

    µ_{t+1} = argmin_{µ∈P(Ω)} J(µ) + λ_t W_p^p(µ, µ_t)

SLIDE 55

Any (ML) problem involving a KL or L2 loss between (parameterized) histograms or probability measures can easily be Wasserstein-ized if we can differentiate W efficiently.

SLIDE 56

  • 1. Differentiability of Regularized OT

  • Def. Dual regularized OT Problem

    Wγ(µ, ν) = max_{α,β} α^T a + β^T b − γ (e^{α/γ})^T K e^{β/γ}

  • Prop. [CD’14] Wγ(µ, ν) is
    1. convex w.r.t. a (Danskin), with ∇_a Wγ = α⋆ = γ log(u);
    2. decreased, when p = 2 and Ω = R^d, by the update X ← Y P^T diag(a^{−1}).
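The gradient formula ∇_a Wγ = γ log(u) (up to an additive constant, which is irrelevant for mass-preserving perturbations of a) can be sanity-checked against a finite difference. A sketch; sizes, γ, iteration count and ε below are illustrative:

```python
import numpy as np

# Finite-difference check of grad_a W_gamma = gamma*log(u), up to an additive
# constant that cancels along mass-preserving directions (sum of d is 0).
rng = np.random.default_rng(0)
n, m, gamma = 4, 5, 0.2
b = np.full(m, 1 / m)
M = rng.uniform(size=(n, m))
K = np.exp(-M / gamma)

def reg_ot(a, iters=5000):
    """Solve entropy-regularized OT with Sinkhorn; return (value, u)."""
    v = np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return (P * M).sum() + gamma * (P * np.log(P)).sum(), u

a = np.full(n, 1 / n)
W0, u = reg_ot(a)
grad = gamma * np.log(u)

eps = 1e-5
d = np.zeros(n); d[0], d[1] = 1.0, -1.0      # mass-preserving direction
W1, _ = reg_ot(a + eps * d)
print((W1 - W0) / eps, grad @ d)             # the two numbers nearly agree
```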

SLIDE 57

  • 2. Duality for Regularized OT

  • Prop. [CP’16] Writing Hν : a ↦ Wγ(µ, ν):
    1. Hν has a simple Legendre transform:

        H∗ν : g ∈ R^n ↦ γ ( E(b) + b^T log(K e^{g/γ}) )

    2. If A ∈ R^{n×d} and f is convex on R^d,

        min_{a∈Σn} Hν(a) + f(Aa) = max_{g∈R^d} −H∗ν(−A^T g) − f∗(g)
SLIDE 58

  • 3. Stochastic Formulation

    Wγ(µ, ν) = max_{α,β} α^T a + β^T b − γ (e^{α/γ})^T K e^{β/γ}
             = max_α α^T a − γ (log(K^T e^{α/γ}))^T b
             = max_α Σ_{j=1}^m b_j ( α^T a − γ log(K_{·j}^T e^{α/γ}) )
             = max_α Σ_{j=1}^m f_j(α)

  • [GCPB’16] shows how incremental gradient methods can be used to scale this further.
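A minimal sketch of this stochastic view, using plain stochastic gradient ascent (one sampled index j per step) rather than the incremental methods of [GCPB’16]; step sizes, iteration counts and problem data below are illustrative:

```python
import numpy as np

# Semi-dual objective: sum_j b_j f_j(alpha), with
# f_j(alpha) = alpha^T a - gamma * log(K_{.j}^T e^{alpha/gamma}).
rng = np.random.default_rng(0)
n, m, gamma = 4, 5, 0.2
a = np.full(n, 1 / n)
b = np.full(m, 1 / m)
M = rng.uniform(size=(n, m))
K = np.exp(-M / gamma)

def semidual(alpha):
    return alpha @ a - gamma * b @ np.log(K.T @ np.exp(alpha / gamma))

alpha = np.zeros(n)
step = 0.5
for t in range(20000):
    j = rng.choice(m, p=b)              # sample one index j ~ b per step
    w = K[:, j] * np.exp(alpha / gamma)
    grad_j = a - w / w.sum()            # gradient of f_j at alpha
    alpha += step / np.sqrt(t + 1) * grad_j

print(semidual(alpha))                  # should exceed the value at alpha = 0
```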

SLIDE 59

  • 4. Algorithmic Formulation

  • Def. For L ≥ 1, define

    W_L(µ, ν) := ⟨P_L, M_XY⟩, where P_L := diag(u_L) K diag(v_L),
    v_0 = 1_m;  for l ≥ 0,  u_l := a / (K v_l),  v_{l+1} := b / (K^T u_l).

  • Prop. ∂W_L/∂X and ∂W_L/∂a can be computed recursively, in O(L) kernel K × vector products.

SLIDE 60

Algorithmic Formulation of Reg. OT

Example: differentiability w.r.t. a. The iterates’ Jacobians satisfy the recursion

    (∂v_0/∂a)^T = 0_{m×n},
    (∂u_l/∂a)^T x = x / (K v_l) − (∂v_l/∂a)^T K^T ( x ⊙ a / (K v_l)² ),
    (∂v_{l+1}/∂a)^T y = −(∂u_l/∂a)^T K ( y ⊙ b / (K^T u_l)² ).

SLIDE 61

Algorithmic Formulation of Reg. OT

Example: differentiability w.r.t. a. With N := K ⊙ M_XY,

    ∇_a W_L(µ, ν) = (∂u_L/∂a)^T (N v_L) + (∂v_L/∂a)^T (N^T u_L)

SLIDE 62

Algorithmic Formulation of Wasserstein

[Figure]

SLIDES 63–65

[Figure-only slides.]

SLIDE 66

Thanks to these tricks…

  • [Agueh’11] Barycenters [CD’14][BCCNP’15][GCP’15][S..C..’15]
  • [Burger’12] TV gradient flow using duality [CP’16]
  • Dictionary Learning / Latent Factors [RCP’16]
  • [Bigot’15] W-PCA [SC’15]
  • Density fitting / parameter estimation [MMC’16]
  • Inverse problems / Wasserstein regression [BPC’16]

SLIDE 67

Wasserstein Barycenters

Wasserstein Barycenter [Agueh’11]:

    min_{µ∈P(Ω)} Σ_{i=1}^N λ_i W_p^p(µ, ν_i)

[Figure: ν1, ν2, ν3 and their barycenter in P(Ω).]

SLIDE 68

Multimarginal Formulation

  • Exact solution (W2) using MM-OT [Agueh’11].

[Figure: point clouds and their exact W2 barycenter.]

SLIDE 69

Multimarginal Formulation

  • Exact solution (W2) using MM-OT [Agueh’11].
  • If |supp ν_i| = n_i, this is an LP of size (Π_i n_i, Σ_i n_i).

SLIDE 70

Finite Case, LP Formulation

  • When Ω is a finite set with metric M, another LP:

    min_µ Σ_i λ_i W_p^p(µ, ν_i)

SLIDE 71

Finite Case, LP Formulation

  • When Ω is a finite set with metric M, another LP:

    min_{P_1,…,P_N, a} Σ_{i=1}^N λ_i ⟨P_i, M⟩  s.t.  P_i^T 1_n = b_i  ∀i ≤ N,
    P_1 1_n = … = P_N 1_n = a.

  • If |Ω| = n, an LP of size (Nn², (2N − 1)n); unstable.

SLIDE 72

Primal Descent on Regularized W

[CD’14]

    min_{µ∈Q⊂P(Ω)} Σ_{i=1}^N λ_i Wγ(µ, ν_i)

Fast Computation of Wasserstein Barycenters, International Conference on Machine Learning 2014.

SLIDES 73–74

(Repeats of Slide 72.)

SLIDE 75

Wasserstein Barycenter = KL Projections

[BCCNP’15] With K = e^{−M_XY/γ}:

    ⟨P, M_XY⟩ − γE(P) = γ KL(P | K)

    C1 = {P | ∃a, ∀i, P_i 1_m = a}
    C2 = {P | ∀i, P_i^T 1_n = b_i}

    min_a Σ_{i=1}^N λ_i Wγ(a, b_i) = min_{P=[P_1,…,P_N] ∈ C1∩C2} Σ_{i=1}^N λ_i KL(P_i | K)

SLIDE 76

[Figure: alternate KL projections of [K ⋯ K] onto C1 and C2 converge to Pγ.]

SLIDE 77

Wasserstein Barycenter = KL Projections

In MATLAB, the resulting iterations are:

    u=ones(size(B)); % d x N matrix
    while not converged
      v=u.*(K'*(B./(K*u))); % 2(Nd^2) cost
      u=bsxfun(@times,u,exp(log(v)*weights))./v;
    end
    a=mean(v,2);

Iterative Bregman Projections for Regularized Transportation Problems, SIAM J. on Sci. Comp. 2015.
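A NumPy transcription of the same iterations, on a made-up 1-D example (grid, γ, weights and the two input histograms are illustrative):

```python
import numpy as np

# Bregman-projection iterations for the Wasserstein barycenter of N
# histograms B (columns), on a 1-D grid with squared Euclidean cost.
n, N, gamma = 41, 2, 0.1
grid = np.linspace(0, 1, n)
M = (grid[:, None] - grid[None, :]) ** 2      # squared Euclidean cost
K = np.exp(-M / gamma)                        # symmetric here, so K = K'

# Two input histograms: bumps centered at 0.25 and 0.75.
B = np.stack([np.exp(-((grid - 0.25) / 0.1) ** 2),
              np.exp(-((grid - 0.75) / 0.1) ** 2)], axis=1)
B /= B.sum(axis=0)
weights = np.array([0.5, 0.5])

U = np.ones((n, N))
for _ in range(2000):
    V = U * (K.T @ (B / (K @ U)))     # current left marginals of each plan
    a = np.exp(np.log(V) @ weights)   # weighted geometric mean of marginals
    U = U * a[:, None] / V            # rescale so all marginals agree
print(grid[np.argmax(a)])             # the barycenter bump sits between the inputs
```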

SLIDE 78

Application: Graphics

Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains, SIGGRAPH’15 [S..C..’15]

SLIDES 79–82

(Repeats of Slide 78.)

SLIDE 83

Inverse Wasserstein Problems

  • Consider the barycenter operator:

    b(λ) := argmin_a Σ_{i=1}^N λ_i Wγ(a, b_i)

  • Address now Wasserstein inverse problems: given a, find

    argmin_{λ∈Σ_N} E(λ) := Loss(a, b(λ))

SLIDE 84

The Wasserstein Simplex

SLIDE 85

Barycenters = Fixed Points

  • Prop. [BCCNP’15] Consider B ∈ Σ_d^N, let U_0 = 1_{d×N}, and for l ≥ 0:

    b_l := exp( log(K^T U_l) λ );
    V_{l+1} := (b_l 1_N^T) / (K^T U_l),  U_{l+1} := B / (K V_{l+1}).

SLIDE 86

Using Truncated Barycenters

  • Instead of using the exact barycenter:  argmin_{λ∈Σ_N} E(λ) := Loss(a, b(λ)),
  • use the L-iterate barycenter:  argmin_{λ∈Σ_N} E^(L)(λ) := Loss(a, b^(L)(λ)).
  • Differentiate using the chain rule:

    ∇E^(L)(λ) = [∂b^(L)]^T (g),  g := ∇Loss(a, ·)|_{b^(L)(λ)}.

SLIDE 87

Gradient / Barycenter Computation

SLIDE 88

Application: Volume Reconstruction

Wasserstein Barycentric Coordinates: Histogram Regression using Optimal Transport, SIGGRAPH’16 [BPC’16]

SLIDES 89–92

Application: Color Grading

Wasserstein Barycentric Coordinates: Histogram Regression using Optimal Transport, SIGGRAPH’16 [BPC’16]

SLIDE 93

Application: Brain Mapping

SLIDE 94

To conclude

  • Entropy regularization is a very effective way to get OT to work as a generic loss.
  • Many recent extensions:
    • [Schmitzer’16]: fast multiscale approaches
    • [ZFMAP’15], [CSPV’16]: unbalanced transport
    • [SPKS’16], [PCS’16]: extensions to Gromov-Wasserstein
    • [FCTR’15]: domain adaptation in ML