A Review of Regularized Optimal Transport
Marco Cuturi
Joint work with many people, including:
- G. Peyré, A. Genevay (ENS), A. Doucet (Oxford), J. Solomon (MIT),
J.D. Benamou, N. Bonneel, F. Bach, L. Nenna (INRIA),
- G. Carlier (Dauphine).
Monge Kantorovich Dantzig Wasserstein Brenier McCann Villani Otto
[Figure: comparing two probability measures µ and ν, two histograms h1 and h2 on a space with distance d, and two parametric densities pθ and pθ′.]
[McCann’95], [JKO’98], [Benamou’98], [Gangbo’98], [Ambrosio’06], [Villani’03/’09].
[Rubner’98], [Indyk’03], [Naor’07], [Andoni’15].
✦ computationally heavy; ✦ Wasserstein distance is not differentiable
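Monge problem: find a map $T : \Omega \to \Omega$ that pushes $\mu$ onto $\nu$ at minimal cost,
$$\inf_{T \,:\, T_{\#}\mu = \nu} \int_{\Omega} D(x, T(x))^p \, \mathrm{d}\mu(x)$$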
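The push-forward constraint $T_{\#}\mu = \nu$ can be illustrated on a discrete measure. The sketch below is only an illustration under assumptions: the helper `push_forward`, the map `T`, the sample points, and the choice $D(x, y) = |x - y|$ with $p = 2$ are all hypothetical and do not come from the slides.

```python
from collections import defaultdict


def push_forward(points, weights, T):
    """Push-forward T#mu of the discrete measure mu = sum_i weights[i] * delta_{points[i]}."""
    image = defaultdict(float)
    for x, w in zip(points, weights):
        # The mass sitting at x is moved to T(x); masses landing on the same point add up.
        image[T(x)] += w
    return dict(image)


def T(x):
    # Hypothetical transport map, chosen only for the example.
    return 2 * x + 1


mu_points = [0.0, 1.0, 2.0]
mu_weights = [0.5, 0.25, 0.25]

nu = push_forward(mu_points, mu_weights, T)

# Monge cost of this particular map, assuming D(x, y) = |x - y| and p = 2.
monge_cost = sum(w * abs(x - T(x)) ** 2 for x, w in zip(mu_points, mu_weights))
```

Here `nu` is the measure that `T` produces from `mu`; the Monge problem searches over all maps whose push-forward equals a prescribed target.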
[Figure: two example couplings P(x, y), each with marginals µ(x) and ν(y).]
$$W_p^p(\mu,\nu) \stackrel{\text{def}}{=} \inf_{P \in \Pi(\mu,\nu)} \mathbb{E}_P\left[D(X,Y)^p\right]$$
$$W_p(\delta_x, \delta_y) = D(x, y)$$
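For two empirical measures with the same number of points,
$$\mu = \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i}, \qquad \nu = \frac{1}{n}\sum_{j=1}^{n} \delta_{y_j},$$
optimal transport reduces to an optimal assignment:
$$C(\sigma) = \frac{1}{n}\sum_{i=1}^{n} D(x_i, y_{\sigma(i)})^p, \qquad W_p^p(\mu,\nu) = \min_{\sigma \in S_n} C(\sigma).$$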
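The assignment formulation $\min_{\sigma \in S_n} C(\sigma)$ can be checked by brute force on a tiny example. This is a minimal sketch, assuming points on the real line, $D(x, y) = |x - y|$, $p = 2$, and uniform weights $1/n$; the function names are hypothetical, and enumerating all $n!$ permutations is only viable for very small $n$.

```python
from itertools import permutations


def assignment_cost(x, y, sigma, p=2):
    """C(sigma) = (1/n) * sum_i |x_i - y_sigma(i)|^p for points on the real line."""
    n = len(x)
    return sum(abs(x[i] - y[sigma[i]]) ** p for i in range(n)) / n


def wasserstein_pp_bruteforce(x, y, p=2):
    """W_p^p between two uniform empirical measures, by enumerating every permutation."""
    n = len(x)
    return min(assignment_cost(x, y, sigma, p) for sigma in permutations(range(n)))


x = [0.0, 1.0, 3.0]
y = [2.0, 0.5, 4.0]
w22 = wasserstein_pp_bruteforce(x, y)
```

On the real line with a convex cost the best permutation matches the sorted points (0 to 0.5, 1 to 2, 3 to 4), which gives $(0.25 + 1 + 1)/3 = 0.75$ here.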
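For weighted discrete measures
$$\mu = \sum_{i=1}^{n} a_i \delta_{x_i}, \qquad \nu = \sum_{j=1}^{m} b_j \delta_{y_j},$$
define
$$U(a,b) \stackrel{\text{def}}{=} \{P \in \mathbb{R}_+^{n \times m} \mid P 1_m = a,\; P^T 1_n = b\}, \qquad M_{XY} \stackrel{\text{def}}{=} [D(x_i, y_j)^p]_{ij}.$$
Rows of $P$ are indexed by $x_1, \dots, x_n$ with masses $a_1, \dots, a_n$; columns are indexed by $y_1, \dots, y_m$ with masses $b_1, \dots, b_m$. Then
$$W_p^p(\mu,\nu) = \min_{P \in U(a,b)} \langle P, M_{XY} \rangle.$$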
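The objects in this linear program can be written out concretely. The sketch below assumes points on the real line with $D(x, y) = |x - y|$ and $p = 2$; the helper names and the two example couplings are hypothetical. It builds $M_{XY}$, checks membership in $U(a, b)$, and evaluates $\langle P, M_{XY} \rangle$ for two feasible couplings.

```python
def cost_matrix(x, y, p=2):
    """M_XY[i][j] = |x_i - y_j|^p."""
    return [[abs(xi - yj) ** p for yj in y] for xi in x]


def in_transport_polytope(P, a, b, tol=1e-12):
    """True if P is nonnegative with row sums a and column sums b."""
    n, m = len(a), len(b)
    rows_ok = all(abs(sum(P[i]) - a[i]) <= tol for i in range(n))
    cols_ok = all(abs(sum(P[i][j] for i in range(n)) - b[j]) <= tol for j in range(m))
    nonneg = all(pij >= 0 for row in P for pij in row)
    return rows_ok and cols_ok and nonneg


def transport_cost(P, M):
    """Frobenius inner product <P, M>."""
    return sum(p * c for prow, crow in zip(P, M) for p, c in zip(prow, crow))


x, a = [0.0, 1.0], [0.5, 0.5]
y, b = [0.0, 1.0, 2.0], [0.25, 0.5, 0.25]
M = cost_matrix(x, y)

# Independent coupling a b^T: always feasible, usually far from optimal.
P_indep = [[ai * bj for bj in b] for ai in a]
# Monotone coupling: mass moves to the nearest available target in order.
P_monotone = [[0.25, 0.25, 0.0], [0.0, 0.25, 0.25]]

cost_indep = transport_cost(P_indep, M)
cost_monotone = transport_cost(P_monotone, M)
```

Both couplings satisfy the marginal constraints, but they have different costs; the linear program picks the cheapest element of $U(a, b)$.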
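Dual formulation:
$$W_p^p(\mu,\nu) = \max_{\substack{\alpha \in \mathbb{R}^n,\ \beta \in \mathbb{R}^m \\ \alpha_i + \beta_j \le D(x_i, y_j)^p}} \alpha^T a + \beta^T b$$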
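A dual-feasible pair $(\alpha, \beta)$, meaning $\alpha_i + \beta_j \le D(x_i, y_j)^p$ for all $i, j$, gives a lower bound $\alpha^T a + \beta^T b$ on the transport cost of any coupling in $U(a, b)$. The sketch below assumes that objective and a small hand-made example on the real line with squared distances; the specific potentials and the coupling were worked out by hand for this example and are not from the slides.

```python
def is_dual_feasible(alpha, beta, M, tol=1e-12):
    """Check alpha_i + beta_j <= M_ij for every pair (i, j)."""
    return all(alpha[i] + beta[j] <= M[i][j] + tol
               for i in range(len(alpha)) for j in range(len(beta)))


def dual_objective(alpha, beta, a, b):
    """alpha^T a + beta^T b."""
    return sum(x * w for x, w in zip(alpha, a)) + sum(y * w for y, w in zip(beta, b))


a = [0.5, 0.5]
b = [0.25, 0.5, 0.25]
M = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0]]  # squared distances between {0, 1} and {0, 1, 2}

# Hand-picked potentials: tight exactly on the pairs used by the coupling below.
alpha = [0.0, -1.0]
beta = [0.0, 1.0, 2.0]

P = [[0.25, 0.25, 0.0], [0.0, 0.25, 0.25]]  # a coupling in U(a, b)
primal = sum(p * c for prow, crow in zip(P, M) for p, c in zip(prow, crow))
dual = dual_objective(alpha, beta, a, b)
```

Because the dual value equals the primal value of `P`, the pair certifies that `P` is optimal for this example.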
Note: flow/PDE formulations [Beckman’61]/[Benamou’98] can be used for p=1/p=2 for a sparse-graph metric/Euclidean metric.
$W_p^p(\mu, \nu)$ is not differentiable.
Entropic regularization:
$$E(P) \stackrel{\text{def}}{=} -\sum_{i,j=1}^{n,m} P_{ij} \log P_{ij}, \qquad W_\gamma(\mu,\nu) \stackrel{\text{def}}{=} \min_{P \in U(a,b)} \langle P, M_{XY} \rangle - \gamma E(P)$$
Note: the optimal solution is unique because the entropy is strongly concave.
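The effect of the regularization strength can be seen by evaluating $\langle P, M_{XY} \rangle - \gamma E(P)$ on two fixed couplings. This is a sketch under assumptions: $E(P) = -\sum_{ij} P_{ij} \log P_{ij}$ with the convention $0 \log 0 = 0$, and a hypothetical example with two feasible couplings (the independent coupling $a b^T$ and a sparse one).

```python
import math


def entropy(P):
    """E(P) = -sum_ij P_ij log P_ij, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for row in P for p in row if p > 0)


def regularized_objective(P, M, gamma):
    """<P, M> - gamma * E(P)."""
    cost = sum(p * c for prow, crow in zip(P, M) for p, c in zip(prow, crow))
    return cost - gamma * entropy(P)


a = [0.5, 0.5]
b = [0.25, 0.5, 0.25]
M = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0]]

P_indep = [[ai * bj for bj in b] for ai in a]          # maximal entropy in U(a, b)
P_sparse = [[0.25, 0.25, 0.0], [0.0, 0.25, 0.25]]      # lower cost, lower entropy

small = (regularized_objective(P_sparse, M, 0.1), regularized_objective(P_indep, M, 0.1))
large = (regularized_objective(P_sparse, M, 10.0), regularized_objective(P_indep, M, 10.0))
```

For a small $\gamma$ the cheaper sparse coupling has the lower objective; for a large $\gamma$ the entropy term dominates and the independent coupling wins.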
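If $P_\gamma \stackrel{\text{def}}{=} \operatorname{argmin}_{P \in U(a,b)} \langle P, M_{XY} \rangle - \gamma E(P)$, then there exist $u \in \mathbb{R}^n_+$, $v \in \mathbb{R}^m_+$ such that
$$P_\gamma = \operatorname{diag}(u)\, K \operatorname{diag}(v), \qquad K \stackrel{\text{def}}{=} e^{-M_{XY}/\gamma}.$$
This follows from the Lagrangian:
$$L(P, \alpha, \beta) = \sum_{ij} \left( P_{ij} M_{ij} + \gamma P_{ij} \log P_{ij} \right) + \alpha^T (P 1 - a) + \beta^T (P^T 1 - b)$$
$$\partial L / \partial P_{ij} = M_{ij} + \gamma (\log P_{ij} + 1) + \alpha_i + \beta_j$$
$$(\partial L / \partial P_{ij} = 0) \;\Rightarrow\; P_{ij} = e^{-\frac{\alpha_i}{\gamma} - \frac{1}{2}} \, e^{-\frac{M_{ij}}{\gamma}} \, e^{-\frac{\beta_j}{\gamma} - \frac{1}{2}} = u_i K_{ij} v_j$$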
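The factorization $P_\gamma = \operatorname{diag}(u) K \operatorname{diag}(v)$ suggests alternately rescaling $u$ and $v$ until both marginals match. The sketch below uses the standard Sinkhorn fixed-point updates $v \leftarrow b / (K^T u)$ and $u \leftarrow a / (K v)$; the surviving slide text does not spell these updates out, so treat them, the function name, the fixed iteration count, and the example data as assumptions.

```python
import math


def sinkhorn(a, b, M, gamma, n_iter=1000):
    """Return (P, u, v) with P = diag(u) K diag(v) and K = exp(-M / gamma)."""
    n, m = len(a), len(b)
    K = [[math.exp(-M[i][j] / gamma) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale columns so that P^T 1 = b, then rows so that P 1 = a.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return P, u, v


a = [0.5, 0.5]
b = [0.25, 0.5, 0.25]
M = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0]]

P, u, v = sinkhorn(a, b, M, gamma=0.5)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(len(a))) for j in range(len(b))]
sinkhorn_cost = sum(p * c for prow, crow in zip(P, M) for p, c in zip(prow, crow))
```

Each iteration costs two matrix-vector products with $K$. A fixed iteration count keeps the sketch short; a practical version would stop once the marginal error falls below a tolerance.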
[Figure: log-scale plot (vertical axis from 10^-6 to 10^4) against histogram dimension (64 to 4096), comparing FastEMD, Rubner's emd, and Sinkhorn on CPU and GPU with γ = 0.02 and γ = 0.1. Histograms are sampled uniformly on the simplex over (Ω, D); Sinkhorn tolerance 10^-2.]
$$\mu = \sum_{i=1}^{n} a_i \delta_{x_i}, \qquad \nu = \sum_{j=1}^{m} b_j \delta_{y_j}, \qquad \min_{P \in U(a,b)} \langle P, M_{XY} \rangle - \gamma E(P)$$
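Variational problems involving Wasserstein distances:
$$\min_{\mu \in \mathcal{P}(\Omega)} (1 - t)\, W_2^2(\mu, \nu_1) + t\, W_2^2(\mu, \nu_2)$$
$$\min_{\mu \in \mathcal{P}(\mathbb{R}^d),\ |\operatorname{supp} \mu| = k} W_2^2(\mu, \nu_{\text{data}})$$
$$\min_{\mu \in \mathcal{P}(\Omega)} \;\dots\; W_p^p(\mu, \mu_t)$$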
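$$W_\gamma(\mu,\nu) = \max_{\alpha,\beta}\ \alpha^T a + \beta^T b - \gamma \sum_{i,j} e^{(\alpha_i + \beta_j - M_{ij})/\gamma - 1}$$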
D(a−1).
$$H^*_\nu : g \in \mathbb{R}^n \mapsto \gamma\, \dots$$
$$\min_{a \in \Sigma_n} H_\nu(a) + f(Aa) = \max_{g \in \mathbb{R}^d} H^*_\nu(\,\dots$$
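$$W_\gamma(\mu,\nu) = \max_{\alpha,\beta}\ \alpha^T a + \beta^T b - \gamma \sum_{i,j} e^{(\alpha_i + \beta_j - M_{ij})/\gamma - 1}$$
$$= \max_{\alpha}\ \alpha^T a - \gamma \left(\log K^T e^{\alpha/\gamma}\right)^T b + \gamma \sum_{j=1}^{m} b_j \log b_j$$
$$= \max_{\alpha}\ \alpha^T a - \gamma \sum_{j=1}^{m} b_j \log\left(K_{\cdot j}^T e^{\alpha/\gamma}\right) + \gamma \sum_{j=1}^{m} b_j \log b_j$$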
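The dual and semi-dual values can be compared numerically with the primal value at the Sinkhorn solution. This is a sketch under stated assumptions: the primal is $\langle P, M \rangle + \gamma \sum_{ij} P_{ij} \log P_{ij}$, the dual is written as $\alpha^T a + \beta^T b - \gamma \sum_{ij} e^{(\alpha_i + \beta_j - M_{ij})/\gamma - 1}$, the semi-dual keeps the constant $\gamma \sum_j b_j \log b_j$, and potentials are recovered from the scalings as $\alpha_i = \gamma(\log u_i + 1/2)$, $\beta_j = \gamma(\log v_j + 1/2)$. The function names and data are hypothetical.

```python
import math


def sinkhorn_scalings(a, b, M, gamma, n_iter=2000):
    """Kernel K = exp(-M / gamma) and scalings u, v from the standard Sinkhorn updates."""
    n, m = len(a), len(b)
    K = [[math.exp(-M[i][j] / gamma) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
    return K, u, v


def primal_value(P, M, gamma):
    """<P, M> + gamma * sum_ij P_ij log P_ij, for strictly positive P."""
    return sum(p * (c + gamma * math.log(p))
               for prow, crow in zip(P, M) for p, c in zip(prow, crow))


def dual_value(alpha, beta, a, b, M, gamma):
    linear = sum(x * w for x, w in zip(alpha, a)) + sum(y * w for y, w in zip(beta, b))
    penalty = sum(math.exp((alpha[i] + beta[j] - M[i][j]) / gamma - 1.0)
                  for i in range(len(a)) for j in range(len(b)))
    return linear - gamma * penalty


def semi_dual_value(alpha, a, b, K, gamma):
    """Dual with beta maximized out in closed form (requires b_j > 0)."""
    value = sum(x * w for x, w in zip(alpha, a))
    for j in range(len(b)):
        s = sum(K[i][j] * math.exp(alpha[i] / gamma) for i in range(len(a)))
        value += gamma * b[j] * (math.log(b[j]) - math.log(s))
    return value


a = [0.5, 0.5]
b = [0.25, 0.5, 0.25]
M = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0]]
gamma = 0.5

K, u, v = sinkhorn_scalings(a, b, M, gamma)
P = [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]
alpha = [gamma * (math.log(ui) + 0.5) for ui in u]
beta = [gamma * (math.log(vj) + 0.5) for vj in v]

primal = primal_value(P, M, gamma)
dual = dual_value(alpha, beta, a, b, M, gamma)
semi_dual = semi_dual_value(alpha, a, b, K, gamma)
```

Once the iterations have converged, the three values agree up to numerical error, which is a quick sanity check on both the formulas and an implementation.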
$$\frac{\partial W_L}{\partial X}, \qquad \frac{\partial W_L}{\partial a}$$
Wasserstein barycenter problem:
$$\min_{\mu \in \mathcal{P}(\Omega)} \sum_{i=1}^{N} W_p^p(\mu, \nu_i)$$
[Figure: 2-D plots, horizontal axis from −1 to 3, vertical axis from −1.5 to 1.]
$$\left(\sum_i n_i,\ \prod_i n_i\right)$$
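$$\min_{\mu} \sum_{i} W_p^p(\mu, \nu_i)$$
$$\min_{P_1, \cdots, P_N,\, a}\ \sum_{i=1}^{N} \dots \quad \text{s.t.} \quad P_i^T 1_n = b_i,\ \forall i \le N, \ \dots$$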
$$\min_{\mu \in Q \subset \mathcal{P}(\Omega)} \sum_{i=1}^{N} \dots$$
Fast Computation of Wasserstein Barycenters, International Conference on Machine Learning, 2014.
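$$P_i^T 1_n = b_i$$
$$\min_{a} \sum_{i=1}^{N} \dots \qquad \min_{P = [P_1, \dots, P_N],\ P \in C_1 \cap C_2}\ \sum_{i=1}^{N} \dots$$

    % Assumed inputs: K is the d x d kernel, B is d x N (one histogram per
    % column), weights is N x 1 and sums to 1, max_iter caps the iterations.
    u = ones(size(B));                                      % d x N matrix
    for iter = 1:max_iter                                   % stands in for "while not converged"
        v = u .* (K' * (B ./ (K * u)));                     % 2(N d^2) cost
        u = bsxfun(@times, u, exp(log(v) * weights)) ./ v;  % weighted geometric mean of columns of v
    end
    a = mean(v, 2);                                         % barycenter estimate

Iterative Bregman Projections for Regularized Transportation Problems, SIAM J. on Sci. Comp., 2015.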
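The MATLAB loop above can be mirrored in plain Python to make each step explicit. This is a sketch under assumptions: the function name, the fixed iteration count, the grid, the value of γ, and the two point-mass histograms are hypothetical choices for the example, and the kernel is taken to be $K = e^{-M/\gamma}$ with squared distances on a 1-D grid.

```python
import math


def barycenter_ibp(B, K, weights, n_iter=50):
    """Barycenter of the histograms in B (list of N lists of length d).

    Mirrors the loop above: v = u .* (K' * (b ./ (K * u))) for each histogram,
    then u is multiplied by the weighted geometric mean of the v's and divided by v.
    """
    d, N = len(K), len(B)
    U = [[1.0] * d for _ in range(N)]
    V = []
    for _ in range(n_iter):
        V = []
        for b, u in zip(B, U):
            Ku = [sum(K[r][c] * u[c] for c in range(d)) for r in range(d)]
            w = [b[r] / Ku[r] for r in range(d)]
            KTw = [sum(K[r][c] * w[r] for r in range(d)) for c in range(d)]
            V.append([u[c] * KTw[c] for c in range(d)])
        geo = [math.exp(sum(lam * math.log(v[c]) for lam, v in zip(weights, V)))
               for c in range(d)]
        U = [[u[c] * geo[c] / v[c] for c in range(d)] for u, v in zip(U, V)]
    # At convergence all columns of V coincide; average them as in a = mean(v, 2).
    return [sum(v[c] for v in V) / N for c in range(d)]


d = 11
grid = [c / 10 for c in range(d)]
gamma = 0.01
K = [[math.exp(-((xr - xc) ** 2) / gamma) for xc in grid] for xr in grid]

b1 = [1.0 if c == 2 else 0.0 for c in range(d)]  # point mass at x = 0.2
b2 = [1.0 if c == 8 else 0.0 for c in range(d)]  # point mass at x = 0.8
bary = barycenter_ibp([b1, b2], K, weights=[0.5, 0.5])
peak = max(range(d), key=lambda c: bary[c])
```

With equal weights the result is a bump centred midway between the two point masses, blurred by the entropic term; a smaller γ sharpens it, at the price of a kernel whose entries get closer to underflow.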
Convolutional Wasserstein Distances: Efficient Optimal Transportation on Geometric Domains, SIGGRAPH’15
$$\lambda \in \Sigma_N$$
$$\dots \stackrel{\text{def}}{=} \frac{b_l 1_N^T}{K^T U_l}, \qquad \dots \stackrel{\text{def}}{=} \frac{B}{K V_{l+1}}.$$
Wasserstein Barycentric Coordinates: Histogram Regression using Optimal Transport, SIGGRAPH’16