The Power of Local Search for Clustering in Separable Instances - - PowerPoint PPT Presentation

SLIDE 1

The Power of Local Search for Clustering in “Separable Instances”

Vincent Cohen-Addad, Sorbonne Université & CNRS. Joint work with: Philip N. Klein, Brown University; Claire Mathieu, École normale supérieure & CNRS.

Vincent Cohen-Addad 1 / 29

SLIDE 2

What is Clustering?

Partition data points according to distances. Example: group buildings to locate fire stations. Underlying data: road networks.

SLIDE 3

Partition data according to similarity. Underlying data: points in $\mathbb{R}^2$.

SLIDE 4

How to model clustering?

k-Clustering

Input: a set $A$ of data points in a metric space.
Output: a set $C$ of $k$ centers that minimizes
$$\sum_{a \in A} \min_{c \in C} d(a, c)^p.$$

k-median is the case $p = 1$; k-means is the case $p = 2$.
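As a sketch, the objective can be evaluated directly when the points and the candidate centers are given explicitly. The function name `clustering_cost` and the toy data are hypothetical, and Euclidean distance (`math.dist`) is assumed as the metric; any metric `dist` can be passed instead.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def clustering_cost(
    points: Sequence[Point],
    centers: Sequence[Point],
    dist: Callable[[Point, Point], float],
    p: int,
) -> float:
    """Sum over a in points of (min over c in centers of dist(a, c)) ** p.

    p = 1 is the k-median objective, p = 2 the k-means objective.
    """
    if not centers:
        raise ValueError("at least one center is required")
    return sum(min(dist(a, c) for c in centers) ** p for a in points)


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = [(0.5, 0.0), (10.0, 0.0)]
print(clustering_cost(points, centers, math.dist, p=1))  # 2.0
print(clustering_cost(points, centers, math.dist, p=2))  # 1.5
```

Squaring changes which points dominate: the two near points cost 0.5 each under $p = 1$ but only 0.25 each under $p = 2$, while the far point (11, 0) costs 1 in both.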

SLIDE 5

The 1-median problem dates back to Fermat (1636): given three points $a, b, c \in \mathbb{R}^2$, find a point $d$ that minimizes $d(a, d) + d(b, d) + d(c, d)$. With more than three points, it is hard to compute exactly!
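For larger inputs, one standard numerical approach (not covered in the slides) is Weiszfeld's fixed-point iteration, which repeatedly moves the candidate point to the distance-weighted average of the inputs. A minimal sketch in the plane, assuming at least one input point; the function names are hypothetical, and for simplicity the loop stops if an iterate lands exactly on an input point, which a careful implementation would handle with a correction step.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def weiszfeld(points: Sequence[Point], iters: int = 10_000, tol: float = 1e-12) -> Point:
    """Approximate the geometric median (1-median) of planar points."""
    # Start from the centroid (the 1-means solution).
    x = sum(px for px, _ in points) / len(points)
    y = sum(py for _, py in points) / len(points)
    for _ in range(iters):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d == 0.0:
                # Simplification: stop when the iterate coincides with an input point.
                return (x, y)
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        nx, ny = num_x / denom, num_y / denom
        if math.hypot(nx - x, ny - y) < tol:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)


def total_distance(points: Sequence[Point], q: Point) -> float:
    """The 1-median objective: sum of distances from q to all points."""
    return sum(math.dist(p, q) for p in points)


tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
m = weiszfeld(tri)
# For this triangle the Fermat point is ((3 - sqrt(3)) / 6, (3 - sqrt(3)) / 6).
print(m, total_distance(tri, m))
```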

SLIDE 6

Algorithms for Clustering: History

k-median:
- 1964: introduction of the problem [Hakimi]
- 1979: NP-hardness [Kariv and Hakimi]
- 2002: $6\frac{2}{3}$-approximation [Charikar et al.]
- 2004: $(3 + \varepsilon)$-approximation [Arya et al.]
- 2013: $(1 + \sqrt{3} + \varepsilon) \approx (2.732 + \varepsilon)$-approximation [Li and Svensson]
- 2015 (current best): $(2.675 + \varepsilon)$-approximation [Byrka et al.]

k-means:
- 1967: introduction of the problem [MacQueen]
- 2004 (current best): $(16 + \varepsilon)$-approximation [Kanungo et al.]

It is NP-hard to obtain better than a $(1 + 2/e) \approx 1.735$-approximation for k-median in polynomial time.

SLIDE 7

Focus on real-world inputs:
- Road networks: planar graphs.
- Machine learning and image compression: low-dimensional Euclidean space.

SLIDE 8

Previous Work on Restricted Metrics

Planar graphs: nothing better than the general case.

$\mathbb{R}^{O(1)}$:
- k-median: $1 + \varepsilon$ [Arora et al. ’98]
- k-means: $9 + \varepsilon$ [Kanungo et al. ’04]

SLIDE 9

Recent Results for $\mathbb{R}^{O(1)}$

[C.-A. and Mathieu, SoCG ’15]

Local search achieves a (1 + ε)-approximation using (1 + ε)k centers for k-median.

[Bandyapadhyay and Varadarajan, SoCG ’16 ]

Local search achieves a (1 + ε)-approximation using (1 + ε)k centers for k-means.

Main open problems:

- Obtain better than the general case in planar graphs.
- Obtain $1 + \varepsilon$ for k-means in $\mathbb{R}^{O(1)}$ using exactly $k$ centers.
- Design a unified approach for well-clusterable instances.

SLIDE 10

Our Results

- Local search is a PTAS for uniform facility location in edge-weighted planar graphs.
- Local search is a PTAS for k-median in edge-weighted planar graphs.
- Local search is a PTAS for k-means in $\mathbb{R}^d$.

SLIDE 11

Techniques: Separators

- Planar graphs: planar separator [Lipton and Tarjan, SIAM J. App. Math. ’79].
- $\mathbb{R}^{O(1)}$: isoperimetric inequality, through [Bhattiprolu and Har-Peled, SoCG ’16].

SLIDE 12

Local search is a PTAS for uniform facility location in edge-weighted planar graphs.

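[Figure: a client $c$ joined to its nearest open facility by a path with edge weights 6, 2, 2, 4.]

Cost of $c$ = dist($c$, Solution) = 6 + 2 + 2 + 4 = 14.

Cost of the solution: 6 (opening cost) $+ \sum_{c} \text{cost}(c)$. In uniform facility location, every open facility pays the same opening cost $f$.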
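The cost on this slide can be computed with one multi-source Dijkstra run from all open facilities. A sketch under stated assumptions: the graph is a connected, undirected, edge-weighted adjacency dict with string vertex names; the vertex names `c`, `u`, `v`, `w`, `s` and the opening cost $f = 6$ for the single open facility `s` are hypothetical, chosen only to reproduce the path 6, 2, 2, 4 from the figure.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def undirected(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build an adjacency dict, adding each weighted edge in both directions."""
    graph: Graph = {}
    for a, b, w in edges:
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    return graph


def dist_to_solution(graph: Graph, solution: Iterable[str]) -> Dict[str, float]:
    """Multi-source Dijkstra: distance from each vertex to its nearest open facility."""
    heap = [(0.0, f) for f in solution]
    heapq.heapify(heap)
    dist: Dict[str, float] = {}
    while heap:
        d, u = heapq.heappop(heap)
        if u in dist:
            continue  # already settled with a smaller or equal distance
        dist[u] = d
        for v, w in graph.get(u, []):
            if v not in dist:
                heapq.heappush(heap, (d + w, v))
    return dist


def ufl_cost(graph: Graph, solution: List[str], clients: List[str], f: float) -> float:
    """Uniform facility location: f per open facility plus each client's distance."""
    dist = dist_to_solution(graph, solution)
    return f * len(set(solution)) + sum(dist[c] for c in clients)


# Hypothetical instance: client c reaches facility s along edges 6, 2, 2, 4.
g = undirected([("c", "u", 6), ("u", "v", 2), ("v", "w", 2), ("w", "s", 4)])
print(dist_to_solution(g, ["s"])["c"])   # 14.0
print(ufl_cost(g, ["s"], ["c"], f=6.0))  # 20.0
```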

SLIDE 13

Local search:

Start with a solution. Try a local change to obtain a slightly different solution. Better? If yes, repeat, starting from this solution. If no, restart and try another local change.

Repeat:
  Find a better solution $S'$ among sets that differ from $S$ in at most $1/\varepsilon^2$ centers.
  Replace $S$ by $S'$.
Until: local optimum.
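The loop above can be sketched generically. Assumptions, all hypothetical: the neighborhood of $S$ is every nonempty $S'$ whose symmetric difference with $S$ has at most `t` candidate facilities (`t` playing the role of $1/\varepsilon^2$), the first improving neighbor found is taken, and the toy instance is uniform facility location on a line with opening cost 3. Enumerating neighbors this way takes roughly $n^t$ cost evaluations per step, which is polynomial for fixed $\varepsilon$.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence

Sol = FrozenSet[int]


def local_search(candidates: Sequence[int], cost: Callable[[Sol], float],
                 start: Sol, t: int) -> Sol:
    """Repeat: move to a strictly cheaper S' with |S ^ S'| <= t. Until: local optimum."""
    current, current_cost = start, cost(start)
    while True:
        for size in range(1, t + 1):
            for flip in combinations(candidates, size):
                neighbor = current.symmetric_difference(flip)
                if neighbor and cost(neighbor) < current_cost:
                    current, current_cost = neighbor, cost(neighbor)
                    break
            else:
                continue  # no improvement with this many flips; try more
            break  # improvement found: rescan from the new solution
        else:
            return current  # no neighbor improves: local optimum


clients = [0, 1, 2, 10, 11, 12]  # client positions on a line, also the candidate sites


def line_ufl_cost(sol: Sol, f: float = 3.0) -> float:
    """Opening cost f per facility plus each client's distance to its nearest facility."""
    return f * len(sol) + sum(min(abs(c - s) for s in sol) for c in clients)


local_opt = local_search(clients, line_ufl_cost, frozenset({0}), t=2)
print(sorted(local_opt), line_ufl_cost(local_opt))
```

With `t` equal to the number of candidates, every nonempty subset is a neighbor, so the search returns a global optimum; the interesting regime is small `t`, where the analysis on the following slides bounds the gap.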

SLIDE 16

Why does any $1/\varepsilon^2$-locally-optimal solution have value at most $(1 + \varepsilon)\,\mathrm{OPT}$? Proof structure:

1. Define a structured near-optimal solution $\mathrm{OPT}'$.
2. Compare the local solution $L$ to $\mathrm{OPT}'$.

SLIDE 17

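[Figure: a local optimum $L$ and a global optimum $\mathrm{OPT}$.]

Contract the clusters of the clustering induced by $L \cup \mathrm{OPT}$, i.e., the Voronoi cells of the facilities in $L \cup \mathrm{OPT}$.

Contraction: each cell is connected, and contracting connected subgraphs of a planar graph keeps it planar, so we obtain a planar graph $\tilde{G}$ whose vertices are the facilities of $L \cup \mathrm{OPT}$.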
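A sketch of the contraction step, under the assumption that the clusters are graph Voronoi cells of $L \cup \mathrm{OPT}$ computed by multi-source Dijkstra with positive edge weights, ties broken by facility name. The 6-cycle, the facility names, and the function names are hypothetical. Each vertex inherits its owner from the vertex that relaxed it, so every cell is connected, which is what makes the contracted graph a minor of the input.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def voronoi_owner(graph: Graph, facilities: List[str]) -> Dict[str, str]:
    """Map every vertex to its nearest facility (ties broken by facility name)."""
    heap = [(0.0, fac, fac) for fac in facilities]  # (distance, owner, vertex)
    heapq.heapify(heap)
    owner: Dict[str, str] = {}
    while heap:
        d, fac, u = heapq.heappop(heap)
        if u in owner:
            continue
        owner[u] = fac  # u joins the cell of the vertex that relaxed it
        for v, w in graph.get(u, []):
            if v not in owner:
                heapq.heappush(heap, (d + w, fac, v))
    return owner


def contract_cells(graph: Graph, owner: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Edges of the graph obtained by contracting each Voronoi cell to its facility."""
    edges: Set[Tuple[str, str]] = set()
    for u, nbrs in graph.items():
        for v, _ in nbrs:
            a, b = owner[u], owner[v]
            if a != b:
                edges.add((min(a, b), max(a, b)))
    return edges


# Hypothetical 6-cycle with unit weights; L = {v0, v2}, OPT = {v4}.
cycle: Graph = {f"v{i}": [(f"v{(i - 1) % 6}", 1.0), (f"v{(i + 1) % 6}", 1.0)] for i in range(6)}
owner = voronoi_owner(cycle, ["v0", "v2"] + ["v4"])
print(sorted(contract_cells(cycle, owner)))
# [('v0', 'v2'), ('v0', 'v4'), ('v2', 'v4')]
```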

SLIDE 19

What do we know about planar graphs?

Planar separator [Lipton and Tarjan, SIAM J. App. Math. ’79]

For any planar graph with $n$ vertices, there exists a balanced separator with $O(\sqrt{n})$ vertices: removing it leaves connected components of at most $2n/3$ vertices each.
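As a toy illustration (this is not the Lipton and Tarjan algorithm, which works for every planar graph), the $m \times m$ grid is planar with $n = m^2$ vertices, and its middle column is a separator of exactly $\sqrt{n}$ vertices. The helper names are hypothetical; the code removes the column and measures the remaining components by breadth-first search.

```python
from collections import deque
from typing import Iterator, List, Set, Tuple

Cell = Tuple[int, int]


def grid_neighbors(cell: Cell, m: int) -> Iterator[Cell]:
    """Yield the up to four grid neighbors of a cell inside the m x m grid."""
    i, j = cell
    for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
        if 0 <= a < m and 0 <= b < m:
            yield (a, b)


def middle_column_split(m: int) -> Tuple[Set[Cell], List[int]]:
    """Remove column m // 2 of the m x m grid; return it and the component sizes left."""
    separator = {(i, m // 2) for i in range(m)}
    remaining = {(i, j) for i in range(m) for j in range(m)} - separator
    seen: Set[Cell] = set()
    sizes: List[int] = []
    for start in sorted(remaining):
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in grid_neighbors(u, m):
                if v in remaining and v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)
    return separator, sizes


sep, sizes = middle_column_split(9)
print(len(sep), sizes)  # 9 [36, 36]
```

Both halves have $36 \le 2n/3 = 54$ vertices, so this separator of size $\sqrt{n}$ is balanced in the sense of the theorem.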

SLIDE 21

$1/\varepsilon^2$-division: Corollary of Lipton and Tarjan

If $\tilde{G}$ is planar, then there exists a partition of $\tilde{G}$ into regions such that:
- each region has at most $1/\varepsilon^2$ vertices;
- there are at most $\varepsilon\,|V(\tilde{G})|$ boundary vertices.

[Figure: $\tilde{G}$ partitioned into Regions 1 to 6.]
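The two quantities the corollary controls can be measured for any given partition. A sketch, assuming for illustration that a boundary vertex is one with a neighbor in a different region; the 4 × 4 grid instance and the function name `division_stats` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set, Tuple


def division_stats(adj: Dict[Hashable, List[Hashable]],
                   region_of: Dict[Hashable, Hashable]) -> Tuple[int, Set[Hashable]]:
    """Return (size of the largest region, set of boundary vertices)."""
    sizes = Counter(region_of.values())
    boundary = {u for u, nbrs in adj.items()
                if any(region_of[v] != region_of[u] for v in nbrs)}
    return max(sizes.values()), boundary


# Hypothetical example: the 4 x 4 grid cut into four 2 x 2 blocks.
m = 4
adj = {(i, j): [(a, b) for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
                if 0 <= a < m and 0 <= b < m]
       for i in range(m) for j in range(m)}
region_of = {v: (v[0] // 2, v[1] // 2) for v in adj}
largest, boundary = division_stats(adj, region_of)
print(largest, len(boundary))  # 4 12
```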

SLIDE 22

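Consider the boundary vertices of a $1/\varepsilon^2$-division of $\tilde{G}$.

[Figure: Regions 1 to 6 and their boundary vertices.]

New solution: $\mathrm{OPT}' \leftarrow \mathrm{OPT} \cup \{\text{boundary vertices}\}$.
- Facility opening cost is fine: $\tilde{G}$ has $|L \cup \mathrm{OPT}| \le |\mathrm{OPT}| + |L|$ vertices, so the opening cost is at most $f\,(|\mathrm{OPT}| + \varepsilon(|\mathrm{OPT}| + |L|))$.
- Client cost is optimal: $\mathrm{OPT} \subseteq \mathrm{OPT}'$, so $d(c, \text{closest facility})$ can only decrease.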

SLIDE 23

Comparing $L$ to $\mathrm{OPT}'$

For each region, define a mixed solution $M$:
$$M = \{\text{facilities of } \mathrm{OPT}' \text{ in the region}\} \cup \{\text{facilities of } L \text{ not in the region}\}$$

[Figure: two views of Region 1.]

Compare $L$ to $M$.
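In set terms, $M$ keeps $L$ unchanged outside the region and copies $\mathrm{OPT}'$ inside it. A minimal sketch with hypothetical facility labels, treating a region as the set of vertices of $\tilde{G}$ (that is, facilities) it contains; the final check shows that $M$ and $L$ can only differ inside the region.

```python
from typing import FrozenSet

Facilities = FrozenSet[str]


def mixed_solution(local: Facilities, opt_prime: Facilities, region: Facilities) -> Facilities:
    """M = (facilities of OPT' in the region) | (facilities of L outside the region)."""
    return (opt_prime & region) | (local - region)


L = frozenset({"a", "x", "y"})
OPT_prime = frozenset({"b", "c", "y", "z"})
region = frozenset({"a", "b", "c"})
M = mixed_solution(L, OPT_prime, region)
print(sorted(M))      # ['b', 'c', 'x', 'y']
print(sorted(L ^ M))  # ['a', 'b', 'c'], a subset of the region
```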

SLIDE 24


$M$ and $L$ differ in at most $1/\varepsilon^2$ facilities, since they agree outside the region. Local optimality implies that $\mathrm{cost}(M) \ge \mathrm{cost}(L)$. What is the cost of $M$ with respect to $\mathrm{OPT}$ and $L$?

SLIDE 25

Connection cost in $M$:

Claim: for every client $x$ in a cluster of the region, its closest facility in $\mathrm{OPT}'$ is in $M$.

[Figure: the region, its boundary, and the outside.]

Hence, if $x$ is internal, then $d(x, M) \le d(x, \mathrm{OPT}')$.

SLIDE 26

Claim: for every client $y$ not in the region, $d(y, M) \le d(y, L)$, by exactly the same reasoning with respect to $L$.

[Figure: the region, its boundary, and the outside.]

SLIDE 27

Cost of $M$:
- Facility opening cost: $f \cdot \left(|\{\mathrm{OPT}' \in \text{region}\}| + |\{L \notin \text{region}\}|\right)$
- Client service cost: at most $\sum_{x \text{ internal}} d(x, \mathrm{OPT}') + \sum_{y \text{ external}} d(y, L)$

SLIDE 28

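Local optimality: $\mathrm{cost}(M) \ge \mathrm{cost}(L)$, where
$$\mathrm{cost}(M) \le \sum_{x \text{ internal}} d(x, \mathrm{OPT}') + \sum_{y \text{ external}} d(y, L) + f \cdot |\{\mathrm{OPT}' \in \text{Region}\}| + f \cdot |\{L \notin \text{Region}\}|$$
$$\mathrm{cost}(L) = \sum_{x \text{ internal}} d(x, L) + \sum_{y \text{ external}} d(y, L) + f \cdot |\{L \in \text{Region}\}| + f \cdot |\{L \notin \text{Region}\}|$$

The external connection costs and the opening cost of $L$ outside the region appear on both sides and cancel, so
$$\sum_{x \text{ internal}} d(x, L) + f\,|\{L \in \text{Reg.}\}| \le \sum_{x \text{ internal}} d(x, \mathrm{OPT}') + f\,|\{\mathrm{OPT}' \in \text{Reg.}\}|$$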

SLIDE 30
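$$\sum_{x \text{ internal}} d(x, L) + f\,|\{L \in \text{Reg.}\}| \le \sum_{x \text{ internal}} d(x, \mathrm{OPT}') + f\,|\{\mathrm{OPT}' \in \text{Reg.}\}|$$

Sum over all regions:
$$\mathrm{cost}(L) \le \mathrm{cost}(\mathrm{OPT}) + f \cdot |\text{boundary vertices}|$$
$$\mathrm{cost}(L) \le \mathrm{cost}(\mathrm{OPT}) + \varepsilon \cdot f \cdot |L \cup \mathrm{OPT}|$$

Since $f\,|L \cup \mathrm{OPT}| \le f\,|L| + f\,|\mathrm{OPT}| \le \mathrm{cost}(L) + \mathrm{cost}(\mathrm{OPT})$:
$$(1 - \varepsilon)\,\mathrm{cost}(L) \le (1 + \varepsilon)\,\mathrm{cost}(\mathrm{OPT})$$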
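Rearranging the last inequality gives $\mathrm{cost}(L) \le \frac{1+\varepsilon}{1-\varepsilon}\,\mathrm{cost}(\mathrm{OPT})$. For $0 < \varepsilon \le 1/3$ one can check that $\frac{1+\varepsilon}{1-\varepsilon} \le 1 + 3\varepsilon$, so running the same argument with $\varepsilon/3$ yields a $(1 + \varepsilon)$ bound. A small numeric check of that arithmetic (the function name is hypothetical):

```python
def local_search_ratio(eps: float) -> float:
    """Approximation ratio implied by (1 - eps) cost(L) <= (1 + eps) cost(OPT)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return (1.0 + eps) / (1.0 - eps)


for eps in (0.3, 0.1, 0.01):
    # Compare with the simpler bound 1 + 3 * eps, valid for eps <= 1/3.
    print(eps, round(local_search_ratio(eps), 4), 1 + 3 * eps)
```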

SLIDE 31

Polynomial-time:

Ensure that enough progress is made at each step, at the price of losing an additional $\varepsilon\,\mathrm{OPT}$.

Repeat:
  Find a solution $S'$ that improves the cost by a factor $(1 + \varepsilon/k)$ among sets that differ from $S$ in at most $1/\varepsilon^2$ centers.
  Replace $S$ by $S'$.
Until: local optimum.
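Why this is polynomial: each accepted step divides the cost by at least $1 + \varepsilon/k$, and the cost never drops below OPT, so the number of steps is at most $\log(\mathrm{cost}(S_0)/\mathrm{OPT}) / \log(1 + \varepsilon/k) \approx (k/\varepsilon)\ln(\mathrm{cost}(S_0)/\mathrm{OPT})$, which is polynomial whenever the starting solution $S_0$ is within a polynomial factor of OPT. A sketch of that count, where `lower_bound` stands for any positive lower bound on OPT (an assumption; the function name is hypothetical):

```python
import math


def max_improving_steps(initial_cost: float, lower_bound: float, eps: float, k: int) -> int:
    """Upper bound on the number of (1 + eps / k)-factor improvements.

    After T steps the cost is at most initial_cost / (1 + eps / k) ** T, and it
    can never fall below lower_bound > 0, so T <= log(initial / lower) / log(1 + eps / k).
    """
    if not 0 < lower_bound <= initial_cost:
        raise ValueError("need 0 < lower_bound <= initial_cost")
    return math.floor(math.log(initial_cost / lower_bound) / math.log1p(eps / k))


print(max_improving_steps(100.0, 1.0, eps=1.0, k=1))  # 6
print(max_improving_steps(2.0, 1.0, eps=0.5, k=10))   # 14
```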

SLIDE 32

Proof for $\mathbb{R}^{O(1)}$

Building upon [Bhattiprolu and Har-Peled, SoCG ’16]:

There exists a $1/\varepsilon^{O(d)}$-division of the Voronoi partition of a set of points in $\mathbb{R}^d$. The proof then works directly.

SLIDE 34

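Our Results

Best known approximation ratios (previous; new):
- $\mathbb{R}^{O(1)}$, k-median: $1 + \varepsilon$; $1 + \varepsilon$ by local search.
- $\mathbb{R}^{O(1)}$, k-means: $9 + \varepsilon$; $1 + \varepsilon$ by local search.
- H-minor-free graphs, k-median and UFL: $2.675$; $1 + \varepsilon$ by local search.
- H-minor-free graphs, k-means: $25 + \varepsilon$; $1 + \varepsilon$ by local search.

New result: perform “local search” in time $n \cdot k \cdot (\log n)^{O(1/\varepsilon^d)}$ in $d$-dimensional Euclidean spaces.

Open:
- Perform “local search” in time $f(\varepsilon) \cdot \mathrm{poly}(n)$ in H-minor-free graphs?
- PTAS for non-uniform facility location in H-minor-free graphs?

Thanks for your attention!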
