SLIDE 1

Empirical Comparisons of Fast Methods

Dustin Lang and Mike Klaas

{dalang, klaas}@cs.ubc.ca

University of British Columbia December 17, 2004

Fast N-Body Learning - Empirical Comparisons – p. 1

SLIDE 2

A Map of Fast Methods

Sum-Kernel Methods: Dual-Tree (KD-tree, Anchors); Fast Gauss Transform and Improved FGT (Gaussian kernel); Fast Multipole Method; Box Filter (regular grid).

Max-Kernel Methods: Dual-Tree (KD-tree, Anchors); Distance Transform (regular grid).

SLIDE 3

The Role of Fast Methods

We claim that to be useful for other researchers, Fast Methods need:

  • guaranteed, adjustable error bounds: users can set the error bound low during the development stage, then experiment once they know their code works.
  • no parameters that need to be adjusted by users (other than error tolerance).
  • documented error behaviour: we must explain the properties of our approximation errors.

SLIDE 4

Testing Framework

We tested:

  • Sum-Kernel: f_j = Σ_{i=1}^{N} w_i exp( −‖x_i − y_j‖₂² / h² )
  • Max-Kernel: x*_j = argmax_{i=1..N} w_i exp( −‖x_i − y_j‖₂² / h² )

Gaussian kernel, fixed bandwidth h, non-negative weights w_i, j = 1 . . . N.
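As a reference point, the naive O(N²) evaluation of the Sum-Kernel (the baseline every fast method is compared against) can be sketched in a few lines. This is an illustrative sketch, not the authors' C/Matlab test code:

```python
import numpy as np

def naive_sum_kernel(x, y, w, h):
    """Naive O(N^2) Gaussian Sum-Kernel:
    f_j = sum_i w_i * exp(-||x_i - y_j||^2 / h^2)."""
    # Pairwise squared distances between sources x (N x D) and targets y (N x D).
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    return (w[:, None] * np.exp(-d2 / h ** 2)).sum(axis=0)

rng = np.random.default_rng(0)
x = rng.random((500, 3))   # sources, uniform in the unit cube
y = rng.random((500, 3))   # targets
w = rng.random(500)        # non-negative weights
f = naive_sum_kernel(x, y, w, h=0.1)
```

The quadratic cost comes directly from the N × N distance matrix; the fast methods below exist to avoid forming it.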

SLIDE 5

Testing Framework (2)

For the Sum-Kernel problem, we allow a given error tolerance ε: |f_j − f_j^true| ≤ ε for each j. We tested:

  • Fast Gauss Transform (FGT)
  • Improved Fast Gauss Transform (IFGT)
  • Dual-Tree with kd-tree (KDtree)
  • Dual-Tree with ball-tree constructed via the Anchors Hierarchy (Anchors)
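The acceptance criterion applied to every method above can be written down directly; a trivial sketch, with the per-target (not aggregate) comparison being the point:

```python
def within_tolerance(f_approx, f_true, eps):
    """Accept an approximation only if |f_j - f_j^true| <= eps for EVERY target j."""
    return all(abs(a - t) <= eps for a, t in zip(f_approx, f_true))
```

A method that satisfies the bound on average but not per-target would fail this test.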

SLIDE 6

Methods Tested

Fast Gauss Transform (FGT) code by Firas Hamze of UBC. KDtree and Anchors Dual-Tree code by Dustin; the same Dual-Tree code was used for both.

SLIDE 7

Methods Tested (2)

Ramani Duraiswami and Changjiang Yang generously gave their code for the Improved Fast Gauss Transform (IFGT). To make the IFGT fit in our testing framework, we had to devise a method for choosing parameters. Our method seems reasonable but is probably not optimal. All methods: in C with Matlab bindings.

SLIDE 8

Results (1): A Worst-Case Scenario

Uniformly distributed points, uniformly distributed weights, 3 dimensions, large bandwidth h = 0.1, ε = 10⁻⁶: CPU time.

  • Naive is usually fastest.
  • Only FGT is faster, and only by ∼3×.
  • IFGT may become faster after 1.5 hours of compute time.

[Figure: log-log plot of CPU time (s) vs. N (10² to 10⁵) for Naive, FGT, IFGT, Anchors, KDtree.]

SLIDE 9

Results (1): A Worst-Case Scenario

Uniformly distributed points, uniformly distributed weights, 3 dimensions, large bandwidth h = 0.1, ε = 10⁻⁶: memory.

  • Dual-Tree memory requirements are an issue.

[Figure: log-log plot of memory usage (bytes) vs. N for FGT, IFGT, Anchors, KDtree.]

SLIDE 10

Results (2)

Uniformly distributed points, uniformly distributed weights, 3 dimensions, smaller bandwidth h = 0.01, ε = 10⁻⁶.

  • IFGT cannot be run: more than 10¹⁰ expansion terms required for N = 100 points.
  • Dual-Tree and FGT are fast, but not O(N).

[Figure: log-log plot of CPU time (s) vs. N for Naive, FGT, Anchors, KDtree, with O(N√N) and O(N) reference curves.]

SLIDE 11

Results (2)

Uniformly distributed points, uniformly distributed weights, 3 dimensions, smaller bandwidth h = 0.01, ε = 10⁻⁶.

  • Memory requirements are still an issue.

[Figure: log-log plot of memory usage (bytes) vs. N for FGT, Anchors, KDtree.]

SLIDE 12

Results (3)

Uniform data and weights, N = 10,000, ε = 10⁻³, h = 0.01, varying dimension: CPU time.

  • IFGT very fast in 1D, infeasible beyond 2D.
  • KDtree and Anchors show (unexpected?) optimal behaviour around 3 or 4 dimensions.

[Figure: CPU time (s) vs. dimension for Naive, FGT, IFGT, Anchors, KDtree.]

SLIDE 13

Results (3)

Uniform data and weights, N = 10,000, ε = 10⁻³, h = 0.01, varying dimension: memory usage.

[Figure: memory usage (bytes) vs. dimension for Naive, FGT, IFGT, Anchors, KDtree.]

SLIDE 14

Results (4)

Uniform sources, uniform targets, N = 10,000, h = 0.01, D = 3, varying ε: CPU time.

  • Cost of Dual-Tree methods increases slowly with accuracy.
  • FGT cost rises more quickly.

[Figure: CPU time (s) vs. ε (10⁻¹¹ to 10⁻¹) for Naive, FGT, Anchors, KDtree.]

SLIDE 15

Results (4)

Uniform sources, uniform targets, N = 10,000, h = 0.01, D = 3, varying ε: real error.

  • Error of Dual-Tree methods is almost exactly as large as allowed (ε).
  • FGT (and presumably IFGT) overestimate the error, and thus do more work than required.

[Figure: real error vs. ε for FGT, Anchors, KDtree.]

SLIDE 16

Clumpy Data

Uniform data is a worst-case scenario for these methods. Next: clumpy data! [Animation frames: Clumpiness = 1.0, 1.1, 1.2, 1.3, 1.5, 2.0, 3.0.]

SLIDE 23

Results (5): clumpy sources

Clumpy sources, uniform targets, N = 10,000, h = 0.01, D = 3, ε = 10⁻⁶, varying clumpiness: CPU time. As clumpiness increases, Dual-Tree methods get faster.

[Figure: CPU time vs. data clumpiness (1 to 3) for Naive, FGT, Anchors, KDtree.]

SLIDE 24

Results (5): clumpy sources

Clumpy sources, uniform targets, N = 10,000, h = 0.01, D = 3, ε = 10⁻⁶, varying clumpiness: CPU time relative to uniform data. The speedup is especially large for Anchors.

[Figure: CPU time relative to uniform data vs. clumpiness (1 to 3) for Naive, FGT, Anchors, KDtree.]

SLIDE 25

Results (6): clumpy sources and targets

Clumpy sources, clumpy targets, N = 10,000, h = 0.01, D = 3, ε = 10⁻⁶, varying clumpiness: CPU time. Even bigger improvements!

[Figure: CPU time vs. clumpiness for Naive, FGT, Anchors, KDtree.]

SLIDE 26

Results (6): clumpy sources and targets

Clumpy sources, clumpy targets, N = 10,000, h = 0.01, D = 3, ε = 10⁻⁶, varying clumpiness: CPU time relative to uniform data. Large variance: due to details of the particular clumpy data sets?

[Figure: CPU time relative to uniform data vs. clumpiness for Naive, FGT, Anchors, KDtree.]

SLIDE 27

Results (7): clumpy, dimensionality

Clumpy sources and targets (C = 2), N = 10,000, h = 0.01, ε = 10⁻³, varying dimension: CPU time. Not qualitatively different from uniform data!

[Figure: CPU time (s) vs. dimension for Naive, IFGT, Anchors, KDtree.]

SLIDE 28

Results (7): clumpy, dimensionality

Clumpy sources and targets (C = 2), N = 10,000, h = 0.01, ε = 10⁻³, varying dimension: CPU time. For reference: the non-clumpy results.

[Figure: CPU time (s) vs. dimension for Naive, FGT, IFGT, Anchors, KDtree (uniform data).]

SLIDE 29

Summary (1)

  • Synthetic-data tests; each algorithm is required to guarantee results within a given error tolerance.
  • IFGT:
    • We devised a method of choosing parameters; a different method might work better.
    • The error bounds seem to be very loose, so it does much more work than necessary.

SLIDE 30

Summary (2)

Dual-Tree:

  • Works well when either the kernel is highly local (small bandwidth) or the data has strong structure.
  • Works well across a wide range of error tolerances; gives errors that are close to the estimate.
  • Memory requirements are an issue (some heuristics could be used).
  • In these tests, the Anchors Hierarchy doesn't outperform KDtree, though it improves significantly with clumpiness.

SLIDE 31

And Now For Something Slightly Different: Max-Kernel

SLIDE 32

The Problem

  • Given:
    • N target points y_j
    • M source points x_i with weights w_i
  • Compute, for each y_j: f_j^MAX = max_i w_i K(x_i, y_j)
  • Cost: O(MN)

SLIDE 33

The Problem

  • Given:
    • N target points y_j
    • M source points x_i with weights w_i
  • Compute, for each y_j: f_j^MAX = max_i w_i K(x_i, y_j)
  • Cost: O(MN)
  • Applications:
    • maximum a posteriori belief propagation
    • Viterbi algorithm for chains
    • (MAP) particle smoothing
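The brute-force O(MN) computation just described can be sketched directly with a Gaussian kernel K; an illustrative sketch, not the authors' code:

```python
import numpy as np

def naive_max_kernel(x, y, w, h):
    """For each target y_j, return argmax_i and max_i of w_i * K(x_i, y_j),
    with K(x, y) = exp(-||x - y||^2 / h^2). Cost: O(MN)."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)  # M x N squared distances
    vals = w[:, None] * np.exp(-d2 / h ** 2)                 # M x N weighted kernel values
    best = vals.argmax(axis=0)                               # winning source per target
    return best, vals[best, np.arange(y.shape[0])]

rng = np.random.default_rng(1)
x = rng.random((200, 3))   # M source points
y = rng.random((50, 3))    # N target points
w = rng.random(200)
idx, f_max = naive_max_kernel(x, y, w, h=0.1)
```

The fast methods below aim to avoid forming the full M × N table of kernel values.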

SLIDE 34

The Methods

  • 1. Distance Transform

SLIDE 35

The Methods

  • 1. Distance Transform
  • as previously presented
  • can be extended to handle Monte Carlo grids in 1D
  • increases cost to O(M log M + N log N)

SLIDE 36

The Methods

  • 1. Distance Transform
  • as previously presented
  • can be extended to handle Monte Carlo grids in 1D
  • increases cost to O(M log M + N log N)
  • 2. Dual-tree algorithm

SLIDE 37

The Methods

  • 1. Distance Transform
  • as previously presented
  • can be extended to handle Monte Carlo grids in 1D
  • increases cost to O(M log M + N log N)
  • 2. Dual-tree algorithm
    • "bound and prune" recursion
    • details: Klaas, Lang, de Freitas. "Fast maximum a-posteriori inference in Monte Carlo state spaces". AISTATS 2005 (to appear).
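The slides do not reproduce the distance transform itself. For a regular 1D grid, the standard O(N) lower-envelope algorithm (in the style of Felzenszwalb and Huttenlocher) computes D_j = min_i ((j − i)² + f_i); for the Gaussian max-kernel, set f_i = −h² ln w_i, so that max_i w_i e^{−(i−j)²/h²} = e^{−D_j/h²}. The sketch below is illustrative, not the authors' implementation:

```python
import math

def distance_transform_1d(f):
    """Lower-envelope squared-distance transform:
    D[q] = min_i ((q - i)**2 + f[i]) for all q, in O(n)."""
    n = len(f)
    v = [0] * n             # grid index of the k-th parabola in the lower envelope
    z = [0.0] * (n + 1)     # parabola v[k] is lowest on the range [z[k], z[k+1])
    z[0], z[1] = -math.inf, math.inf
    k = 0
    for q in range(1, n):
        # intersection of parabola q with the rightmost envelope parabola
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:    # parabola q hides the rightmost one; pop it
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k], z[k], z[k + 1] = q, s, math.inf
    d, k = [0.0] * n, 0
    for q in range(n):      # read the envelope back out, left to right
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d
```

Two linear passes over the grid replace the quadratic min over all source/target pairs, which is where the speedup over naive comes from.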

SLIDE 38

1D time series

  • MAP particle smoothing
  • Non-linear, multi-modal time series
  • Note log-log scale
  • Both beat naïve by orders of magnitude
  • Dist. trans. 2-3× faster than dual-tree
  • Similar asymptotic growth
  • Clearly, dist. trans. should be used when possible!

[Figure: log-log plot of time (s) vs. number of particles for naive, dual-tree, dist. transform.]

SLIDE 39

Applied example: beat-tracking

  • Particle-filter based beat tracker
  • MAP smoothing on a 3D Monte-Carlo state space
    • distance transform cannot be used
  • Dual-tree is faster after 10 ms of compute time
  • Dual-tree exhibits asymptotic O(N log N) growth
  • Takes seconds rather than days to process a song.

[Figure: log-log plot of time (s) vs. number of particles for naive, dual-tree.]

SLIDE 40

Other factors: dimensionality

  • The behaviour of dual-tree algorithms as N grows is well understood
  • What about other factors?

SLIDE 41

Other factors: dimensionality

  • The behaviour of dual-tree algorithms as N grows is well understood
  • What about other factors?
  • Synthetic test:
    • 20,000 data points (fixed)
    • Gaussian kernel with fixed bandwidth
    • distribution: uniform, clustered
    • clustered data formed by drawing from k Gaussians
    • k = 4 (dash), 20 (dash-dot), 100 (dotted), uniform (solid)
    • kd-trees (red) vs. metric trees (green)

SLIDE 42

Dimensionality (cont.)

  • Two examples: distance computations (left); time (right)
  • Dual-tree methods can be slower than naïve, and this is due to inherent complexity, not just high constants; i.e., they use O(N²) distance computations.

[Figure: two panels vs. dimension (1-40): left, distance computations (k = 20); right, time (s) (k = 100); naive, anchors, kd-tree.]

SLIDE 43

Dimensionality (relative)

[Figure: time relative to kd-tree (= 1) vs. dimension (1-40), four panels (uniform, k = 4, k = 20, k = 100); anchors vs. kd-tree.]

  • Clustering is necessary for metric trees to be effective.

SLIDE 44

Summary

  • Distance transform and dual-tree methods are fast

SLIDE 45

Summary

  • Distance transform and dual-tree methods are fast
  • ...but dual-tree has more overhead.

SLIDE 46

Summary

  • Distance transform and dual-tree methods are fast
  • ...but dual-tree has more overhead.
  • Use the distance transform when:
  • kernel is e^(−‖x−y‖²) or e^(−‖x−y‖) (or others?)
  • data is one dimensional, or lies on a regular grid.

SLIDE 47

Summary

  • Distance transform and dual-tree methods are fast
  • ...but dual-tree has more overhead.
  • Use the distance transform when:
  • kernel is e^(−‖x−y‖²) or e^(−‖x−y‖) (or others?)
  • data is one dimensional, or lies on a regular grid.
  • Although we focus on performance as N grows, it is the "constants" that really matter
    • these are determined by the data distribution, the kernel, and the spatial index.
    • huge potential for future investigation.

SLIDE 48

Thanks! Time for Questions!

SLIDE 49

Q&A

  • Clumpy Data generation
  • Choosing IFGT params

SLIDE 50

Clumpy Data (back)

We generate clumpy data with clumpiness C by recursively distributing points into n sub-boxes such that the occupancies {N_i} satisfy:

Σ_{i=1}^{n} N_i = N

var({N_i}) = (C − 1) · mean({N_i})²

This describes the width of the distribution of 'mass' among boxes. Recurse until N ≤ 10.
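The occupancy rule above can be realised in several ways. The sketch below is one illustrative interpretation (not the authors' generator): it recursively splits a box into 2^D sub-boxes and draws box masses with coefficient of variation √(C − 1), so that var({N_i}) = (C − 1) · mean({N_i})² holds only approximately after clamping and rounding:

```python
import random

def clumpy_points(n_points, C, lo=(0.0, 0.0, 0.0), hi=(1.0, 1.0, 1.0)):
    """Generate n_points in the box [lo, hi] with clumpiness C >= 1.
    C = 1 gives (roughly) uniform data; larger C concentrates mass in
    fewer sub-boxes at every level of the recursion."""
    dim = len(lo)
    if n_points <= 10:   # base case: scatter uniformly inside the box
        return [tuple(random.uniform(lo[d], hi[d]) for d in range(dim))
                for _ in range(n_points)]
    n_boxes = 2 ** dim
    # Box masses with std dev sqrt(C - 1) around mean 1, clamped positive.
    masses = [max(1e-9, random.gauss(1.0, (C - 1.0) ** 0.5)) for _ in range(n_boxes)]
    total = sum(masses)
    raw = [n_points * m / total for m in masses]
    counts = [int(r) for r in raw]                  # floor, then hand out the remainder
    order = sorted(range(n_boxes), key=lambda b: raw[b] - counts[b], reverse=True)
    for i in range(n_points - sum(counts)):
        counts[order[i % n_boxes]] += 1
    mid = [(lo[d] + hi[d]) / 2.0 for d in range(dim)]
    pts = []
    for b in range(n_boxes):                        # bit d of b picks low/high half in dim d
        sub_lo = tuple(lo[d] if not (b >> d) & 1 else mid[d] for d in range(dim))
        sub_hi = tuple(mid[d] if not (b >> d) & 1 else hi[d] for d in range(dim))
        pts += clumpy_points(counts[b], C, sub_lo, sub_hi)
    return pts

random.seed(0)
pts = clumpy_points(1000, C=2.0)
```

By construction the counts at every level sum exactly to N, so the total number of generated points is exact even though the variance target is only approximate.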

SLIDE 51

Choosing IFGT Parameters (back)

K: number of source clusters; r_y: influence radius of clusters; p: number of expansion terms. We choose a maximum number of clusters K*. The complexity is O(NK), so to be O(N), K* must be a constant. In these tests we instead set K* = √N, since we tested across orders of magnitude.

SLIDE 52

Choosing IFGT Parameters (2)

Four constraints:

C1: outside-of-influence-radius error E_C ≤ ε
C2: truncation error E_T ≤ ε
C3: K ≤ K*
C4: r_x r_y / h² ≤ 1

The first three are hard; the fourth is soft (it helps convergence). (Each source point contributes to the error through either E_C or E_T.)

SLIDE 53

Choosing IFGT Parameters (3)

for k = 1 to K*:
    run the k-centers algorithm
    find the largest cluster radius r_x
    using r_y = r_y(ideal), compute C1, C4
    if C1 AND C4 satisfied: break
if k < K*:   // C4 can be satisfied.
    set r_y = min r_y such that C1 AND C4
else:        // C4 cannot be satisfied.
    set r_y = min r_y such that C1
set p = min p such that C2
