What can be sampled locally?
Yitong Yin, Nanjing University
Joint work with: Weiming Feng, Yuxin Sun
Local Computation
“What can be computed locally?” [Naor, Stockmeyer, STOC’93, SICOMP’95]
the LOCAL model [Linial ’87]: a network G(V,E) of processors, where
• communications proceed in synchronized rounds;
• in each round, each vertex can send messages of unbounded sizes to all its neighbors;
• algorithms terminate in a bounded number of rounds in the worst case.
Locally Checkable Labeling (LCL) problems [Naor, Stockmeyer ’93]: feasible solutions are defined by local constraints checkable in O(1) rounds, e.g. vertex/edge coloring, Lovász local lemma, maximum matching, minimum vertex cover, minimum dominating set.
Q: “What locally definable problems are locally computable?”
Local Sampling (in the LOCAL model)
Q: “What locally definable joint distributions are locally sample-able?”
network G(V,E):
• each vertex v ∈ V holds a variable X_v with finite domain [q];
• each edge e = (u,v) ∈ E carries a weighted binary constraint A_e : [q]^2 → R≥0;
• each vertex v ∈ V carries a weighted unary constraint b_v : [q] → R≥0.
The Gibbs distribution µ of the Markov random field (MRF): ∀σ ∈ [q]^V,
µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v)
Goal: a random X ∈ [q]^V that follows µ.
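To make the definition concrete, here is a minimal Python sketch (the function name and the dict-based encoding are my own, not from the talk) that computes µ by brute-force enumeration on a tiny instance:

from itertools import product

def gibbs_distribution(V, E, q, A, b):
    """Brute-force Gibbs distribution of an MRF on G(V, E).

    A[(u, v)]: q x q matrix of edge weights A_e;  b[v]: length-q vector b_v.
    Returns {sigma: mu(sigma)} over all configurations sigma in [q]^V.
    Vertices are assumed to be 0, 1, ..., n-1 so sigma can be indexed by vertex.
    """
    weights = {}
    for sigma in product(range(q), repeat=len(V)):
        w = 1.0
        for (u, v) in E:               # binary constraints A_e(sigma_u, sigma_v)
            w *= A[(u, v)][sigma[u]][sigma[v]]
        for v in V:                    # unary constraints b_v(sigma_v)
            w *= b[v][sigma[v]]
        weights[sigma] = w
    Z = sum(weights.values())          # partition function
    return {sigma: w / Z for sigma, w in weights.items()}

# example: proper 3-coloring of a triangle, A_e(i, j) = [i != j], b_v all-ones
V, E, q = [0, 1, 2], [(0, 1), (1, 2), (0, 2)], 3
A = {e: [[int(i != j) for j in range(q)] for i in range(q)] for e in E}
b = {v: [1.0] * q for v in V}
mu = gibbs_distribution(V, E, q, A, b)
print(sum(p > 0 for p in mu.values()))  # 6 proper colorings, each of probability 1/6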
Example (proper q-coloring): A_e(i,j) = 1 if i ≠ j and 0 otherwise; b_v = (1, …, 1).
[Fraigniaud, Heinrich, Kosowski FOCS’16]: local conflict coloring, i.e. the case of hard constraints A_e ∈ {0,1}^{q×q}, b_v ∈ {0,1}^q.
Distributed Sampling: sampling from a probabilistic graphical model (e.g. the Markov random field) by distributed algorithms in a distributed system.
Glauber dynamics on a network G(V,E): starting from an arbitrary X_0 ∈ [q]^V, transition for X_t → X_{t+1}:
• pick a uniform random vertex v;
• resample X(v) according to the marginal distribution induced by µ at vertex v conditioning on X_t(N(v)):
Pr[X_v = x | X_{N(v)}] = b_v(x) ∏_{u∈N(v)} A_{(u,v)}(X_u, x) / ( Σ_{y∈[q]} b_v(y) ∏_{u∈N(v)} A_{(u,v)}(X_u, y) )
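A minimal Python sketch of one Glauber update, directly implementing the marginal above (assumes vertices are keys of a neighbor-list dict G, and A[(u, v)] is defined for both orientations of each edge):

import random

def glauber_step(X, G, q, A, b):
    """One step of Glauber dynamics on the current configuration X (a dict)."""
    v = random.choice(list(G))                 # pick a uniform random vertex v
    # Pr[X_v = x | X_N(v)]  ∝  b_v(x) * prod_{u in N(v)} A_(u,v)(X_u, x)
    w = [b[v][x] for x in range(q)]
    for u in G[v]:
        for x in range(q):
            w[x] *= A[(u, v)][X[u]][x]
    X[v] = random.choices(range(q), weights=w)[0]   # resample X(v)
    return X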
stationary distribution: µ
mixing time: τ_mix = max_{X_0} min { t : d_TV(X_t, µ) ≤ 1/(2e) }
for q-coloring: τ_mix = O(n log n) when q ≥ (2+ε)Δ
Dobrushin’s condition (Δ = max-degree):
influence matrix {ρ_{v,u}}_{v,u∈V}, where ρ_{v,u} is the max discrepancy (in total variation distance) of the marginal distributions at v caused by any pair of boundary configurations σ, τ that differ only at u.
Dobrushin’s condition: ‖ρ‖_∞ = max_{v∈V} Σ_{u∈V} ρ_{v,u} ≤ 1 − ε
(contraction of the one-step optimal coupling in the worst case w.r.t. Hamming distance)
Theorem (Dobrushin ’70; Salas, Sokal ’97): Dobrushin’s condition implies τ_mix = O(n log n) for Glauber dynamics.
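Dobrushin’s condition can be checked by brute force on tiny instances. A Python sketch with my own helper names (it assumes strictly positive constraints, so every conditional marginal is well defined, and uses ρ_{v,u} = 0 unless u ∈ N(v)):

from itertools import product

def influence_norm(G, q, A, b):
    """Brute-force ‖ρ‖_∞ = max_v Σ_u ρ_{v,u} for a small soft-constraint MRF."""
    def marginal(v, boundary):        # Pr[X_v = · | X_N(v) = boundary]
        w = [b[v][x] for x in range(q)]
        for u, xu in boundary.items():
            for x in range(q):
                w[x] *= A[(u, v)][xu][x]
        Z = sum(w)
        return [wi / Z for wi in w]
    norm = 0.0
    for v in G:
        row_sum = 0.0
        for u in G[v]:
            rho_vu = 0.0
            for vals in product(range(q), repeat=len(G[v])):
                sig = dict(zip(G[v], vals))
                for c in range(q):    # tau differs from sig only at u
                    tau = dict(sig)
                    tau[u] = c
                    d = sum(abs(p1 - p2) for p1, p2 in
                            zip(marginal(v, sig), marginal(v, tau))) / 2
                    rho_vu = max(rho_vu, d)   # max TV discrepancy at v
            row_sum += rho_vu
        norm = max(norm, row_sum)
    return norm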
Parallelization of Glauber dynamics on G(V,E):
• vertices in the same color class are updated in parallel (the LubyGlauber chain, below);
• all vertices are updated in parallel, ignoring concurrency issues (the LocalMetropolis chain, later).
The LubyGlauber chain: starting from an arbitrary X_0 ∈ [q]^V, at each step, for each vertex v∈V:
• (Luby step) independently sample a random number β_v ∈ [0,1];
• (Glauber step) if β_v is locally maximum among its neighborhood N(v): resample X(v) according to the marginal distribution induced by µ at vertex v conditioning on X_t(N(v)), i.e. according to the current marginal distributions.
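A Python sketch of one LubyGlauber step (same encoding as the earlier sketches; the “parallel” updates are simulated by having all winners read the old configuration):

import random

def luby_glauber_step(X, G, q, A, b):
    """One step of the LubyGlauber chain on configuration X (a dict)."""
    beta = {v: random.random() for v in G}     # Luby step: sample beta_v in [0,1]
    # v is updated iff beta_v is locally maximum among N(v);
    # no two adjacent vertices can both win, so winners form an independent set
    winners = [v for v in G if all(beta[v] > beta[u] for u in G[v])]
    updates = {}
    for v in winners:                          # Glauber step at each winner,
        w = [b[v][x] for x in range(q)]        # w.r.t. the current marginals
        for u in G[v]:
            for x in range(q):
                w[x] *= A[(u, v)][X[u]][x]
        updates[v] = random.choices(range(q), weights=w)[0]
    X.update(updates)
    return X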
Under the same Dobrushin’s condition ‖ρ‖_∞ = max_{v∈V} Σ_{u∈V} ρ_{v,u} ≤ 1 − ε:
• τ_mix = O(n log n) for Glauber dynamics [Dobrushin ’70; Salas, Sokal ’97];
• τ_mix = O(Δ log n) for the LubyGlauber chain.
Proof (similar to [Hayes ’04], [Dyer, Goldberg, Jerrum ’06]): in the one-step optimal coupling (X_t, Y_t), let p(t)_v = Pr[X_t(v) ≠ Y_t(v)]. Then
p(t+1) ≤ M p(t), where M = (I − D) + Dρ,
and D is diagonal with D_{v,v} = Pr[v is picked in the Luby step] ≥ 1/(deg(v) + 1).
Therefore Pr[X_t ≠ Y_t] ≤ ‖p(t)‖_1 ≤ n‖p(t)‖_∞ ≤ n‖M‖_∞^t ‖p(0)‖_∞ ≤ n (1 − ε/(Δ+1))^t.
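The last inequality is a row-sum bound on M, using Σ_{u∈V} ρ_{v,u} ≤ 1 − ε and D_{v,v} ≥ 1/(Δ+1):
‖M‖_∞ = max_{v∈V} ( (1 − D_{v,v}) + D_{v,v} Σ_{u∈V} ρ_{v,u} ) ≤ max_{v∈V} (1 − ε·D_{v,v}) ≤ 1 − ε/(Δ+1).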
Glauber: τ_mix = O(n log n); LubyGlauber: τ_mix = O(Δ log n) (Δ = max-degree); parallel speedup = Θ(n/Δ).
The LubyGlauber chain never updates adjacent vertices simultaneously, so with χ = chromatic number it takes ≥ χ steps to update all vertices at least once.
Q: “How to update all variables simultaneously and still converge to the correct distribution?”
The LocalMetropolis chain: starting from an arbitrary X ∈ [q]^V, at each step:
• each vertex v∈V independently proposes a random σ_v ∈ [q] with probability b_v(σ_v) / Σ_{i∈[q]} b_v(i);
• each edge e=(u,v) passes its check independently with probability A_e(X_u, σ_v) A_e(σ_u, X_v) A_e(σ_u, σ_v) / max_{i,j∈[q]} (A_e(i,j))^3 (a collective coin flip made between u and v);
• each vertex v∈V accepts its proposal and updates X_v to σ_v if all incident edges pass their checks.
(figure: a path u-v-w with current values X_u, X_v, X_w and proposals σ_u, σ_v, σ_w)
The chain is reversible w.r.t. the MRF Gibbs distribution µ.
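A Python sketch of one LocalMetropolis step (same encoding as the earlier sketches; E lists each edge once):

import random

def local_metropolis_step(X, G, E, q, A, b):
    """One step of the LocalMetropolis chain on configuration X (a dict)."""
    # every vertex proposes sigma_v with probability b_v(sigma_v) / sum_i b_v(i)
    sigma = {v: random.choices(range(q), weights=b[v])[0] for v in G}
    passed = {v: True for v in G}
    for (u, v) in E:
        # edge check: passes with probability
        # A_e(X_u, sigma_v) A_e(sigma_u, X_v) A_e(sigma_u, sigma_v) / (max A_e)^3
        Ae = A[(u, v)]
        m = max(max(row) for row in Ae)
        p = Ae[X[u]][sigma[v]] * Ae[sigma[u]][X[v]] * Ae[sigma[u]][sigma[v]] / m**3
        if random.random() >= p:               # the collective coin flip fails
            passed[u] = passed[v] = False
    for v in G:                                # accept iff all incident checks pass
        if passed[v]:
            X[v] = sigma[v]
    return X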
Detailed Balance Equation: ∀X, Y ∈ [q]^V, µ(X) P(X,Y) = µ(Y) P(Y,X).
Let σ ∈ [q]^V be the proposals of all vertices, and let C ∈ {0,1}^E indicate whether each edge e∈E passes its check. Define
Ω_{X→Y} ≜ { (σ, C) | X → Y when the random choice is (σ, C) }.
Then
P(X,Y) / P(Y,X) = Σ_{(σ,C)∈Ω_{X→Y}} Pr(σ) Pr(C | σ, X) / Σ_{(σ,C)∈Ω_{Y→X}} Pr(σ) Pr(C | σ, Y).
A bijection φ_{X,Y} : Ω_{X→Y} → Ω_{Y→X} with (σ, C) ↦ (σ′, C′) is constructed as: C′ = C, and
σ′_v = X_v if C_e = 1 for all e incident with v; σ′_v = σ_v otherwise;
so that
Pr(σ) Pr(C | σ, X) / ( Pr(σ′) Pr(C′ | σ′, Y) ) = ∏_{v∈V} b_v(Y_v)/b_v(X_v) · ∏_{e=uv∈E} A_e(Y_u, Y_v)/A_e(X_u, X_v) = µ(Y)/µ(X).
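Detailed balance can also be verified numerically on a tiny instance by enumerating all proposals σ and check-vectors C. A sketch reusing gibbs_distribution from the earlier sketch (feasible only for very small V and E):

from itertools import product

def transition_matrix(V, E, q, A, b):
    """Exact LocalMetropolis transition probabilities P(X, Y) by enumeration."""
    P = {}
    for X in product(range(q), repeat=len(V)):
        for sigma in product(range(q), repeat=len(V)):
            pr_sigma = 1.0                     # Pr(sigma): independent proposals
            for v in V:
                pr_sigma *= b[v][sigma[v]] / sum(b[v])
            for C in product((0, 1), repeat=len(E)):
                pr_C = 1.0                     # Pr(C | sigma, X): edge checks
                for i, (u, v) in enumerate(E):
                    m = max(max(row) for row in A[(u, v)])
                    pe = (A[(u, v)][X[u]][sigma[v]] * A[(u, v)][sigma[u]][X[v]]
                          * A[(u, v)][sigma[u]][sigma[v]]) / m**3
                    pr_C *= pe if C[i] else 1.0 - pe
                Y = tuple(sigma[v] if all(C[i] for i, e in enumerate(E) if v in e)
                          else X[v] for v in V)
                P[X, Y] = P.get((X, Y), 0.0) + pr_sigma * pr_C
    return P

V, E, q = [0, 1, 2], [(0, 1), (1, 2)], 2
A = {e: [[1.0, 0.6], [0.6, 0.3]] for e in E}   # arbitrary positive symmetric weights
b = {v: [1.0, 0.8] for v in V}
mu = gibbs_distribution(V, E, q, A, b)
P = transition_matrix(V, E, q, A, b)
print(max(abs(mu[X] * P.get((X, Y), 0.0) - mu[Y] * P.get((Y, X), 0.0))
          for X in mu for Y in mu))            # ≈ 0 up to floating-point error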
LocalMetropolis for the hardcore model on G(V,E) with fugacity λ:
∀ independent set I in G: µ(I) = λ^|I| / Σ_{I′: IS in G} λ^|I′|.
Starting from an arbitrary X ∈ {0,1}^V (with 1 indicating occupied), at each step, each vertex v∈V:
• proposes a random σ_v ∈ {0,1} independently, with σ_v = 1 with probability λ/(1+λ) and σ_v = 0 with probability 1/(1+λ);
• accepts the proposal and updates X_v to σ_v unless for some neighbor u of v: X_u = σ_v = 1, or σ_u = X_v = 1, or σ_u = σ_v = 1.
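A Python sketch of one step for the hardcore case (neighbor-list dict G as before):

import random

def hardcore_step(X, G, lam):
    """One LocalMetropolis step for the hardcore model with fugacity lam."""
    # propose occupied (1) w.p. lam/(1+lam), unoccupied (0) w.p. 1/(1+lam)
    sigma = {v: int(random.random() < lam / (1 + lam)) for v in G}
    new_X = {}
    for v in G:
        # reject iff some neighbor u gives X_u = sigma_v = 1,
        # or sigma_u = X_v = 1, or sigma_u = sigma_v = 1
        clash = any(X[u] == sigma[v] == 1 or sigma[u] == X[v] == 1 or
                    sigma[u] == sigma[v] == 1 for u in G[v])
        new_X[v] = X[v] if clash else sigma[v]
    return new_X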
LocalMetropolis for proper q-coloring: starting from an arbitrary X ∈ [q]^V, at each step, each vertex v∈V:
• proposes a color σ_v ∈ [q] uniformly and independently at random;
• accepts the proposal and updates X_v to σ_v if for all v’s neighbors u: X_u ≠ σ_v ∧ σ_u ≠ X_v ∧ σ_u ≠ σ_v.
Theorem (Feng, Sun, Yin ’17): τ_mix = O(log n) for LocalMetropolis on q-coloring when q ≥ (2 + √2 + ε)Δ.
The O(log n) mixing time bound holds even for unbounded Δ and q.
Coupling analysis on a Δ-regular tree: X_root = red, Y_root = blue; ∀ non-root v, X_v = Y_v ∉ {red, blue}.
Each v: proposes a uniform random color σ_v ∈ [q]; updates X_v to σ_v if for all v’s neighbors u: X_u ≠ σ_v ∧ σ_u ≠ X_v ∧ σ_u ≠ σ_v.
Coupling: couple the proposals (σ^X, σ^Y) so that (X, Y) → (X′, Y′), where a vertex proposes bijectively if some neighbor takes different colors in the two chains, and proposes consistently otherwise:
• vertex v proposes consistently: σ^X_v = σ^Y_v;
• vertex v proposes bijectively: σ^X_v = red if σ^Y_v = blue; σ^X_v = blue if σ^Y_v = red; σ^X_v = σ^Y_v otherwise.
Under this coupling:
root: Pr[X′_root ≠ Y′_root] ≤ 1 − (1 − Δ/q)(1 − 2/q)^Δ;
non-root u at level ℓ: Pr[X′_u ≠ Y′_u] ≤ (1/q)(1 − 2/q)^{Δ−1} (2/q)^{ℓ−1}.
Summing up:
Pr[X′_root ≠ Y′_root] + Σ_{non-root u} Pr[X′_u ≠ Y′_u]
≤ 1 − (1 − Δ/q)(1 − 2/q)^Δ + (Δ/(q − 2Δ)) (1 − 2/q)^{Δ−1}
≤ 1 − e^{−2/α} (1 − 1/α − 1/(α−2))   (assume q ≥ αΔ).
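This is where the constant 2 + √2 comes from: the last bound is < 1 iff 1 − 1/α − 1/(α−2) > 0; multiplying by α(α−2) > 0 gives α^2 − 4α + 2 > 0, i.e. α > 2 + √2, matching the condition q ≥ (2 + √2 + ε)Δ in the theorem.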
For general graphs, the theorem follows from a more involved coupling argument.
Q: “How local can a distributed sampling algorithm be?” Q: “What cannot be sampled locally?”
In the LOCAL model, a t-round algorithm lets each vertex collect information up to distance t, so outputs returned by vertices at distance > 2t from each other are mutually independent.
Theorem (Feng, Sun, Yin ’17): For any non-degenerate MRF, any distributed algorithm that samples from its distribution µ within bounded total variation distance requires Ω(log n) rounds of communications.
Proof idea: on a path, the Gibbs distribution µ exhibits exponential correlation between the X_v’s and the X_u’s: for σ_u ≠ τ_u,
‖µ_v^{σ_u} − µ_v^{τ_u}‖_TV ≥ exp(−O(t)) > n^{−1/4} for a t = O(log n),
where µ_v^{σ_u} is the marginal distribution at v conditioned on X_u = σ_u and t is the distance between u and v. Meanwhile, the output of a t-round algorithm is mutually independent over vertices at pairwise distance > 2t, so d_TV(X, X̃) > 1/(2e) for any such product distribution X̃, where X follows µ.
MRFs with exponential correlation thus give an Ω(log n) lower bound for distributed sampling algorithms.
For comparison: sampling an almost uniform independent set in graphs with max-degree Δ by poly-time Turing machines is impossible unless NP=RP. The Ω(diam) lower bound holds for sampling from the hardcore model with fugacity λ > λ_c(Δ) = (Δ − 1)^{Δ−1} / (Δ − 2)^Δ.
Theorem (Feng, Sun, Yin ’17): For any Δ ≥ 6, any distributed algorithm that samples a uniform independent set within bounded total variation distance in graphs with max-degree Δ requires Ω(diam) rounds of communications.
The construction: G: an even cycle; H: a random Δ-regular bipartite gadget; G_H: the graph combining G with copies of the gadget H, of max-degree Δ.
If Δ ≥ 6: sampling a nearly uniform independent set in G_H ⟹ sampling a nearly uniform max-cut in the even cycle G (long-range correlation!).
The lower bound is not about constructing a feasible solution (∅ is an independent set), and it holds even when every vertex knows the entire graph: the obstacle is not the locality of information, but the locality of randomness.
A strong separation of sampling from other local computation tasks: distributed algorithms can sample from nontrivial joint distributions, but not from those exhibiting the (non-uniqueness) phase transition property.
Open: distributed sampling in graphs of unbounded degree; a distributed version of the sampler of Guo-Jerrum-Liu for the hardcore model.
Weiming Feng, Yuxin Sun, Yitong Yin. What can be sampled locally? In PODC’17. arXiv:1702.00142.
Any questions?