[PPT] - Data Mining Learning from Large Data Sets Lecture 8 PowerPoint Presentation

SLIDE 1

Data Mining

Learning from Large Data Sets

Lecture 8 – Clustering large data sets

263-5200-00L, Andreas Krause

SLIDE 2

Announcements

Homework 4 out tomorrow

SLIDE 3

Course organization

Retrieval
  Given a query, find “most similar” item in a large data set
  Determine relevance of search results
  Applications: Google Goggles, Shazam, …

Supervised learning (classification, regression)
  Learn a concept (function mapping queries to labels)
  Applications: Spam filtering, predicting price changes, …

Unsupervised learning (clustering, dimension reduction)
  Identify clusters, “common patterns”; anomaly detection
  Applications: Recommender systems, fraud detection, …

Learning with limited feedback
  Learn to optimize a function that’s expensive to evaluate
  Applications: Online advertising, opt. UI, learning rankings, …

SLIDE 4

Unsupervised learning

“Learning without labels”
Typically useful for exploratory data analysis (“find patterns”; visualization; …)

Most common methods:
  Clustering (unsupervised classification)
  Dimension reduction (unsupervised regression)

SLIDE 5

What is clustering?

Given data points, group into clusters such that
  Similar points are in the same cluster
  Dissimilar points are in different clusters

Points are typically represented either
  in (high-dimensional) Euclidean space
  in a metric space, given in terms of pairwise distances (Jaccard, cosine, …)

Anomaly / outlier detection: Identification of points that “don’t fit well in any of the clusters”
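As an illustration of the pairwise distances named above, here is a minimal sketch of Jaccard and cosine distance. The function names are hypothetical, and "cosine distance" is taken here to mean one minus cosine similarity, which is a common convention rather than something the slide specifies.

```python
import math


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a ∩ b| / |a ∪ b|; defined as 0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cosine_distance(u: list, v: list) -> float:
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)
```

Jaccard distance suits set-valued data (e.g. the words in a document), while cosine distance suits count or feature vectors whose length should not matter.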

SLIDE 6

Examples of clustering

Cluster
  Documents based on the words they contain
  Images based on image features
  DNA sequences based on edit distance
  Products based on which customers bought them
  Customers based on their purchase history
  Web surfers based on their queries / sites they visit
  …

SLIDE 7

Standard approaches to clustering

Hierarchical clustering
  Build a tree (either bottom-up or top-down), representing the distances among the data points
  Example: single-, average-linkage agglomerative clustering

Partitional approaches
  Define and optimize a notion of “goodness” defined over partitions
  Example: Spectral clustering, graph-cut based approaches

Model-based approaches
  Maintain cluster “models” and infer cluster membership (e.g., assign each point to closest center)
  Example: k-means, Gaussian mixture models, …

SLIDE 8

We will

Review standard clustering algorithms
  K-means
  Probabilistic mixture models

Discuss how to scale them to massive data sets and data streams

SLIDE 9

Clustering example

SLIDE 10

k-means

Assumes points are in Euclidean space: $x_i \in \mathbb{R}^d$
Represent clusters as centers $\mu_j \in \mathbb{R}^d$
Each point is assigned to closest center

Goal: Pick centers to minimize average squared distance

$$\sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$

Non-convex optimization!
NP-hard ⇒ can’t solve optimally in general
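The objective can be evaluated directly for any candidate set of centers. The following is a small sketch; the names `sq_dist` and `kmeans_loss` are illustrative, and points are assumed to be tuples of numbers of equal length.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_loss(points, centers):
    """Sum over all points of the squared distance to the closest center."""
    return sum(min(sq_dist(x, mu) for mu in centers) for x in points)
```

For example, with points (0,0), (1,0), (10,0) and centers (0,0), (10,0), only the middle point contributes, with squared distance 1.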

SLIDE 11

Classical k-means algorithm

Initialize cluster centers
  E.g., pick one point at random, the other ones with maximum distance

While not converged
  Assign each point $x_i$ to closest center
  Update center as mean of assigned data points
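The two alternating steps can be sketched in plain Python as follows. This is a minimal illustration, not a tuned implementation: the farthest-point initialization is one reading of "the other ones with maximum distance", convergence is declared when assignments stop changing, and all function names are assumptions.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def init_centers(points, k, rng):
    """One random point, then repeatedly the point farthest from its closest center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda x: min(sq_dist(x, c) for c in centers)))
    return centers


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = init_centers(points, k, rng)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the closest center for every point.
        new_assignment = [min(range(k), key=lambda j: sq_dist(x, centers[j])) for x in points]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Update step: each center becomes the mean of its assigned points.
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster is empty
                dim = len(members[0])
                centers[j] = tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
    return centers, assignment
```

Each pass over `points` corresponds to one iteration of the while loop on the slide.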

SLIDE 12

K-means

[Figure slides 12–18; images not included]

SLIDE 19

Properties of k-means

Guaranteed to monotonically decrease average squared distance in each iteration

$$L(\mu) = \sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$

Converges to a local optimum
Complexity per iteration: $O(Nkd)$ for $N$ points, $k$ centers and dimension $d$
  Have to process entire data set in each iteration

SLIDE 20

K-means for large data sets / streams

What if data set does not fit in main memory?
  In principle not a problem (why?)
  But each iteration still requires an entire pass through the data set

Recall supervised learning (online SVM, etc.)
  There we were able to process one data point at a time
  Get (provably) good solutions from a single pass through the data
  Could even do it in parallel!

Can we do the same thing for clustering??

SLIDE 21

Streaming clustering

How should we maintain clusters as new data arrives?

SLIDE 22

Recall online SVM

Recall online SVMs (& stochastic gradient descent)
Loss function decomposes additively over data set

$$L(w) = \sum_i \mathrm{hinge}(x_i; y_i, w)$$

Can take a (sub-)gradient step for each data point
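As a reminder of what a per-point step looks like, here is a sketch of one subgradient step on the hinge loss, assuming $\mathrm{hinge}(x; y, w) = \max(0, 1 - y\, w^\top x)$ with labels $y \in \{-1, +1\}$. Regularization and projection steps, which an online SVM would normally include, are left out, and the function name is hypothetical.

```python
def hinge_sgd_step(w, x, y, eta):
    """One subgradient step on max(0, 1 - y * <w, x>) with step size eta."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1:
        # Subgradient is -y * x, so the step adds eta * y * x.
        return [wi + eta * y * xi for wi, xi in zip(w, x)]
    # Loss is zero at this point: the subgradient is 0 and w is unchanged.
    return list(w)
```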

SLIDE 23

Online k-means

For k-means, loss function also decomposes additively over data set

$$L(\mu) = \sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$

Let’s try taking a (sub-)gradient step for each data point

SLIDE 24

Calculating the gradient

$$L(\mu) = \sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$
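One way to carry out the calculation, assuming the closest center is unique: for a single data point $x_t$, let $c = \arg\min_j \|\mu_j - x_t\|_2$. Only $\mu_c$ influences the loss term of $x_t$, so

$$\ell_t(\mu) = \min_j \|\mu_j - x_t\|_2^2, \qquad \frac{\partial \ell_t}{\partial \mu_j} = \begin{cases} 2(\mu_c - x_t) & \text{if } j = c \\ 0 & \text{otherwise} \end{cases}$$

A gradient step with step size $\eta_t / 2$ therefore moves only the closest center:

$$\mu_c \leftarrow \mu_c - \tfrac{\eta_t}{2} \cdot 2(\mu_c - x_t) = \mu_c + \eta_t (x_t - \mu_c)$$

The symbol $\ell_t$ and the convention of absorbing the factor 2 into the step size are notation introduced here for illustration.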

SLIDE 25

Online k-means algorithm

Initialize centers randomly
For t = 1:N
  Find $c = \arg\min_j \|\mu_j - x_t\|_2$
  Set $\mu_c \leftarrow \mu_c + \eta_t (x_t - \mu_c)$

To converge to local optimum, need that

$$\sum_t \eta_t = \infty, \qquad \sum_t \eta_t^2 < \infty$$
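A single-pass sketch of this loop is shown below. Two choices are assumptions rather than part of the slide: centers are initialized to k randomly chosen data points, and the default step size is $\eta_t = 1/t$, which satisfies both summation conditions.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def online_kmeans(points, k, eta=lambda t: 1.0 / t, seed=0):
    """One pass over `points`, moving only the closest center at each step."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for t, x in enumerate(points, start=1):
        # Find the closest center c.
        c = min(range(k), key=lambda j: sq_dist(centers[j], x))
        # Move it towards x_t: mu_c <- mu_c + eta_t * (x_t - mu_c).
        step = eta(t)
        centers[c] = [m + step * (xi - m) for m, xi in zip(centers[c], x)]
    return centers
```

With k = 1 and $\eta_t = 1/t$ the single center is exactly the running mean of the points seen so far, which is a quick sanity check on the update rule.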

SLIDE 26

Practical aspects

Generally works best if data is «randomly» ordered (like stochastic gradient descent)
Typically, want to choose larger value for k
How can one implement multiple random restarts in one pass?

SLIDE 27

Problems with online k-means

Have to commit to “k” in advance
Objective function non-convex (and problem NP-hard)
  → guarantees for online convex programming / SGD do not apply!
Not clear how to parallelize

SLIDE 28

Alternative: Summarizing large data sets

Idea:
  Efficiently construct a compact version C of the data set D such that solving k-means on C gives a good solution for D

Approach:
  First construct C such that it allows us to approximately answer “k-means queries”, i.e., approximately evaluate

  $$L(\mu) = \sum_{i=1}^{N} \min_j \|\mu_j - x_i\|_2^2$$

  Then solve k-means using the approximate loss function

SLIDE 29

k-means queries

[Figure slides 29–31; images not included]

SLIDE 32

Data set summarization for k-means

SLIDE 33

Data set summarization for k-means

Key idea: Replace many points by one weighted representative

$$L_k(\mu; C) = \sum_{(w,x) \in C} w \min_{j \in \{1,\dots,k\}} \|\mu_j - x\|_2^2$$
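The weighted loss is a direct extension of the unweighted one. In this sketch a summary C is assumed to be a list of `(w, x)` pairs; the function names are illustrative.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def weighted_kmeans_loss(coreset, centers):
    """L_k(mu; C): each representative x counts w times."""
    return sum(w * min(sq_dist(x, mu) for mu in centers) for w, x in coreset)
```

If three identical points are replaced by one representative of weight 3, the loss is unchanged for every choice of centers. In general a representative stands in for nearby but not identical points, so the loss is only approximated.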

SLIDE 34

Coresets

C is called a (k,ε)-coreset for data set D, if for every choice of k centers µ

$$(1 - \varepsilon) L_k(\mu; D) \le L_k(\mu; C) \le (1 + \varepsilon) L_k(\mu; D)$$

where

$$L_k(\mu; C) = \sum_{(w,x) \in C} w \min_{j \in \{1,\dots,k\}} \|\mu_j - x\|_2^2$$
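The coreset inequality can be spot-checked numerically for a finite list of candidate center sets. This cannot prove the property, which must hold for all centers, but it can expose a bad summary. The helper name and argument layout below are assumptions.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def satisfies_coreset_bound(data, coreset, center_sets, eps):
    """Check (1-eps)*L(D) <= L(C) <= (1+eps)*L(D) for each candidate center set."""
    for centers in center_sets:
        full = sum(min(sq_dist(x, mu) for mu in centers) for x in data)
        approx = sum(w * min(sq_dist(x, mu) for mu in centers) for w, x in coreset)
        if not ((1 - eps) * full <= approx <= (1 + eps) * full):
            return False
    return True
```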

SLIDE 35

Constructing coresets

Suppose for all pairs of points: $\|x_i - x_j\|_2 \le 1$

First attempt: Random sampling
  Pick n << N points C uniformly at random from D, so that $\mathbb{E}[L_k(\mu; C)] = L_k(\mu; D)$
  Hoeffding’s inequality gives

  $$\Pr\big(|L_k(\mu; C) - L_k(\mu; D)| \ge \varepsilon\big) \le 2\exp(-2\varepsilon^2 n)$$

  Thus need $O\big(\tfrac{1}{\varepsilon^2}\log\tfrac{1}{\delta}\big)$ points to ensure absolute error at most ε with probability at least 1−δ
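The sample size follows by requiring the right-hand side of the bound to be at most δ: $2\exp(-2\varepsilon^2 n) \le \delta$ holds exactly when $n \ge \ln(2/\delta) / (2\varepsilon^2)$. A small helper, with a hypothetical name, makes the numbers concrete. Note that this covers a single fixed µ, not all center sets simultaneously.

```python
import math


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Smallest integer n with 2 * exp(-2 * eps**2 * n) <= delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
```

For ε = 0.1 and δ = 0.05 this gives n = 185, independent of the data set size N.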

SLIDE 36

Constructing coresets

Suppose for all pairs of points: $\|x_i - x_j\|_2 \le 1$

First attempt: Random sampling
  Pick n << N points C uniformly at random from D
  Assign uniform weights N/n
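This first attempt can be sketched in a few lines. Sampling without replacement is an assumption here, since the slide only says "uniformly at random", and the function name is illustrative.

```python
import random


def uniform_sample_summary(data, n, seed=0):
    """Pick n points uniformly at random and give each the weight N / n."""
    rng = random.Random(seed)
    weight = len(data) / n
    return [(weight, x) for x in rng.sample(data, n)]
```

The weights sum to N, so the weighted summary stands in for the same total mass as the full data set.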

SLIDE 37

Can we get small relative error?

To ensure low multiplicative error, need more complex construction
⇒ will use non-uniform sampling!

SLIDE 38

Sampling Distribution

Sampling distribution: bias sampling towards small clusters

SLIDE 39

Importance Weights

Weights
Sampling distribution

SLIDE 40

Creating a Sampling Distribution

Iteratively find representative points
  Sample a small set uniformly at random
  Remove half the blue points nearest the samples
  Small clusters are represented

[Slides 40–48 build up this procedure step by step over a figure; images not included]
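The iterative procedure can be sketched as follows. The batch size, the stopping rule (stop once no more than one batch of points remains) and the function name are assumptions made for illustration; the slides only describe the sample-then-halve loop.

```python
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def find_representatives(points, sample_size, seed=0):
    """Repeat: sample uniformly, then drop the half of the points nearest the samples."""
    rng = random.Random(seed)
    remaining = list(points)
    reps = []
    while remaining:
        if len(remaining) <= sample_size:
            reps.extend(remaining)  # few points left: keep them all
            break
        batch = rng.sample(remaining, sample_size)
        reps.extend(batch)
        # Sort by distance to the nearest new sample (samples themselves sort first).
        remaining.sort(key=lambda x: min(sq_dist(x, s) for s in batch))
        # Remove the nearest half, and at least the sampled points themselves.
        cut = max(sample_size, (len(remaining) + 1) // 2)
        remaining = remaining[cut:]
    return reps
```

Because each round discards points in regions that are already covered, later rounds sample from what is left, which is how small, distant clusters end up represented.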

SLIDE 49

Creating a Sampling Distribution

Partition data via a Voronoi diagram centered at the representative points

SLIDE 50

Creating a Sampling Distribution

Sampling distribution: points in sparse cells, and points far from centers, get more mass

SLIDE 51

Importance Weights

Sampling distribution: points in sparse cells, and points far from centers, get more mass
Weights: [formula not recoverable from the extraction]
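The role of the weights can be illustrated with generic importance sampling. This is not the lecture's specific distribution, whose formula is not available here: the probabilities `probs` are assumed to be given, and the function name is hypothetical. If point i is drawn with probability $q_i$ in each of n independent draws, the weight $1/(n q_i)$ keeps the weighted loss an unbiased estimate of the (unnormalized) loss on D for any fixed centers.

```python
import random


def importance_sample(data, probs, n, seed=0):
    """Draw n points with replacement according to probs; weight each by 1 / (n * q_i)."""
    rng = random.Random(seed)
    indices = rng.choices(range(len(data)), weights=probs, k=n)
    return [(1.0 / (n * probs[i]), data[i]) for i in indices]
```

With uniform probabilities $q_i = 1/N$ the weights reduce to N/n, recovering the uniform-sampling construction.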

SLIDE 52

Non-uniform sample

SLIDE 53

Coresets via Adaptive Sampling

C is a (k,ε)-coreset of size polynomial in d, k, log n, 1/ε, 1/δ

SLIDE 54

Can do better: Coresets for k-means

Theorem [Har-Peled and Kushal, ‘05]
  One can efficiently find a (k,ε)-coreset for k-means of size $O(k^3 / \varepsilon^{d+1})$

Theorem [Feldman et al. ’07]
  One can efficiently find a weak (k,ε)-coreset of size $O(\mathrm{poly}(k, 1/\varepsilon))$

Allows PTAS for k-means!

SLIDE 55

Coresets exist for

K-means, K-median
K-line means / median
PageRank
SVMs
Diameter of a point set
Matrix low-rank approximation
…

SLIDE 56

Composition of Coresets

Merge [cf. Har-Peled, Mazumdar 04]

SLIDE 57

Composition of Coresets

Compress, Merge [Har-Peled, Mazumdar 04]

SLIDE 58

Coresets on Streams

Compress, Merge [Har-Peled, Mazumdar 04]
Error grows linearly with number of compressions

SLIDE 61

Coresets on Streams

Error grows with height of tree
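The merge and compress operations can be combined into a binary tree over the stream, so that each point passes through only logarithmically many compressions. The sketch below shows the bookkeeping only. Its `compress` is a placeholder (a uniform subsample rescaled to preserve total weight), not a real coreset construction, and all names are assumptions.

```python
import random


def merge(a, b):
    """The union of two weighted summaries summarizes the union of their data."""
    return a + b


def compress(weighted, size, rng):
    """Placeholder compression: uniform subsample, rescaled to keep total weight."""
    if len(weighted) <= size:
        return list(weighted)
    total = sum(w for w, _ in weighted)
    sample = rng.sample(weighted, size)
    scale = total / sum(w for w, _ in sample)
    return [(w * scale, x) for w, x in sample]


def stream_summary(chunks, size, seed=0):
    """Merge-and-compress over a stream of chunks, like incrementing a binary counter."""
    rng = random.Random(seed)
    levels = []  # levels[h] summarizes 2**h chunks, or is None
    for chunk in chunks:
        node = compress([(1.0, x) for x in chunk], size, rng)
        h = 0
        while h < len(levels) and levels[h] is not None:
            node = compress(merge(levels[h], node), size, rng)
            levels[h] = None
            h += 1
        if h == len(levels):
            levels.append(None)
        levels[h] = node
    result = []
    for node in levels:
        if node is not None:
            result = merge(result, node)
    return result
```

Only one summary per tree level is kept in memory at any time. The same merge step applies in the parallel setting, where each machine compresses its own partition before the summaries are combined.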

SLIDE 62

Coresets in Parallel

SLIDE 63

k-means clustering with coresets

Given data set D, desired number of clusters k, precision ε
Construct (k,ε)-coreset C
  E.g., in parallel using MapReduce
Solve k-means on coreset
  If coreset small, can even do exhaustive search!
  In practice, run k-means with many restarts
Resulting solution will be (1+ε)-optimal for D
⇒ Provably near-optimal solution!
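Solving k-means on a coreset only requires the weighted variant of the classical algorithm, in which each center becomes the weighted mean of its assigned representatives. The sketch below assumes the coreset is a list of `(w, x)` pairs and that initial centers are supplied by the caller. Restarts are left out, and the names are illustrative.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def weighted_kmeans(coreset, init_centers, max_iter=50):
    """Classical k-means iterations on weighted points (w, x)."""
    centers = [tuple(c) for c in init_centers]
    dim = len(centers[0])
    for _ in range(max_iter):
        sums = [[0.0] * dim for _ in centers]
        mass = [0.0] * len(centers)
        for w, x in coreset:
            # Assign the representative to its closest center, counting it w times.
            j = min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))
            mass[j] += w
            for d, xd in enumerate(x):
                sums[j][d] += w * xd
        new_centers = []
        for j, c in enumerate(centers):
            if mass[j] > 0:
                new_centers.append(tuple(s / mass[j] for s in sums[j]))
            else:
                new_centers.append(c)  # empty cluster: keep the old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

Since the coreset is small, this loop can be rerun from many different initializations at low cost.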

SLIDE 64

Summary so far

Clustering is a central problem in unsupervised learning

Two main classes of approaches
  Hierarchical (difficult to scale)
  Assignment based

Discussed k-means algorithm
  Widely used clustering algorithm
  “Non-linear” versions available
  Can scale to large data sets using online optimization and coreset constructions
