Approximate Nearest Neighbors. Sariel Har-Peled: Notes; Arya, Mount, et al. (PowerPoint presentation)

SLIDE 1

Approximate Nearest Neighbors

Sariel Har-Peled: Notes. Arya, Mount, Netanyahu, Silverman, Wu: An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions

SLIDE 2

Approximate Nearest Neighbors

  • What we want

– O(n log n) preprocessing
– O(n) space
– O(log n) query time

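  • Possible for exact nearest neighbors in 1D and 2D (in 2D: Voronoi diagram plus point location)
  • Not really in 3D: the Voronoi diagram can have Θ(n²) complexity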
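In 1D the target bounds are met exactly: sort once, then answer each query by binary search and compare the two neighbors that bracket the query. A minimal Python sketch; the names `build` and `nearest` are illustrative, and the point list is assumed non-empty:

```python
import bisect
from typing import List


def build(points: List[float]) -> List[float]:
    # O(n log n) preprocessing and O(n) space: just sort the points.
    return sorted(points)


def nearest(sorted_pts: List[float], q: float) -> float:
    # O(log n) query: find where q would be inserted...
    i = bisect.bisect_left(sorted_pts, q)
    # ...then the nearest point is one of the (at most two) neighbors of that slot.
    candidates = []
    if i < len(sorted_pts):
        candidates.append(sorted_pts[i])
    if i > 0:
        candidates.append(sorted_pts[i - 1])
    return min(candidates, key=lambda p: abs(p - q))
```

For example, `nearest(build([5.0, 1.0, 9.0, 3.0]), 4.2)` returns `5.0`.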
SLIDE 3

Let's Approximate

  • Return a point within distance (1+ε)r of the query q, where r is the distance from q to its nearest neighbor
  • The target bounds can be achieved in several ways
  • First

– compute a rough approximation
– use it to set the scale for the final solution

  • Second

– build a tree that solves the problem
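To make the (1+ε)r guarantee concrete, here is a brute-force checker. It is only a testing aid (O(n) per call), not one of the fast methods above, and the function name `is_approx_nn` is hypothetical:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def is_approx_nn(points: Sequence[Point], q: Point, p: Point, eps: float) -> bool:
    # True nearest-neighbor distance r, computed by brute force.
    r = min(math.dist(q, x) for x in points)
    # p is an acceptable answer if it lies within (1 + eps) * r of q.
    return math.dist(q, p) <= (1 + eps) * r
```

With points (0, 0) and (10, 0) and query (4, 0), r = 4, so (10, 0) is a valid answer for ε = 0.5 but not for ε = 0.4.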

SLIDE 4

Ring Separator Tree

[Figure: ring separator tree, with each node's children labeled "in" and "out"]

SLIDE 5

Ring Separator Tree

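  • Answers (1+4/t)-ANN queries in O(height) time
  • At each node, check whether the node's representative point is closer than the best found so far; if so, update the best
  • Recurse into the child on the query's side of the halfway ball (radius r(1+t/2), midway across the ring)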
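The query walk above can be sketched as follows, assuming the tree has already been built. The `RingNode` fields are hypothetical: `center` and `r` give the inner ball b(center, r), `t` sets the outer radius r(1+t), and `rep` is a representative point stored at the node; a leaf is modeled as a node with no children.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class RingNode:
    center: Point                          # center of the ring
    r: float                               # inner radius: inner ball is b(center, r)
    t: float                               # outer radius is r * (1 + t)
    rep: Point                             # representative point stored at this node
    inside: Optional["RingNode"] = None    # subtree for points inside b(center, r)
    outside: Optional["RingNode"] = None   # subtree for points outside b(center, r(1+t))


def leaf(p: Point) -> RingNode:
    # A leaf holds a single point and has no children.
    return RingNode(center=p, r=0.0, t=0.0, rep=p)


def ring_query(root: Optional[RingNode], q: Point) -> Optional[Point]:
    best, best_d = None, math.inf
    node = root
    while node is not None:
        # Update the answer if this node's representative is closer.
        d = math.dist(q, node.rep)
        if d < best_d:
            best, best_d = node.rep, d
        # Descend on q's side of the halfway ball b(center, r(1 + t/2)).
        if math.dist(q, node.center) <= node.r * (1 + node.t / 2):
            node = node.inside
        else:
            node = node.outside
    return best
```

Each node does O(1) work, so a query costs O(height).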
SLIDE 6

Error Bounds

  • Nearest neighbor on the far side of the ring: at distance at least rt/2
  • Returned representative: at distance at most 2r + rt/2
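Written out for the case where q lies inside the halfway ball around the ring center c, ‖q − c‖ ≤ r(1 + t/2), while its nearest neighbor p* lies outside the outer ball b(c, r(1+t)), and assuming the representative lies in the inner ball b(c, r):

```latex
\begin{aligned}
\|q - p^*\| &\ge r(1+t) - r\left(1+\tfrac{t}{2}\right) = \tfrac{rt}{2},\\
\|q - \mathrm{rep}\| &\le \|q - c\| + \|c - \mathrm{rep}\| \le r\left(1+\tfrac{t}{2}\right) + r = 2r + \tfrac{rt}{2},\\
\frac{\|q - \mathrm{rep}\|}{\|q - p^*\|} &\le \frac{2r + rt/2}{rt/2} = 1 + \frac{4}{t}.
\end{aligned}
```

The opposite case (q outside the halfway ball, p* inside b(c, r)) gives the same ratio, since then ‖q − p*‖ ≥ rt/2 and ‖q − rep‖ ≤ ‖q − p*‖ + 2r.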
SLIDE 7

Construction

  • Find a circle (ball) containing n/c points
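The slides find this ball with a grid. As a simpler, slower stand-in (not the slides' method), restricting centers to input points yields a ball whose radius is at most twice that of the smallest ball containing k points: if b(c*, r*) is optimal, every input point inside it has all k of those points within distance 2r*. For the construction, k would be ⌈n/c⌉. The function name `small_ball` is hypothetical:

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def small_ball(points: List[Point], k: int) -> Tuple[Point, float]:
    """Ball centered at an input point that contains at least k input points.

    Its radius is at most twice that of the smallest ball containing k points.
    Brute force, O(n^2 log n) time. Requires 1 <= k <= len(points).
    """
    best_center, best_r = points[0], math.inf
    for c in points:
        # Sorted distances from c; index 0 is c itself (distance 0).
        dists = sorted(math.dist(c, x) for x in points)
        r = dists[k - 1]  # radius capturing the k closest points, c included
        if r < best_r:
            best_center, best_r = c, r
    return best_center, best_r
```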

SLIDE 8

Construction

  • Grid of side
  • Number of points
  • Set
  • Ring has n/2 points

L= r 16d

4L

dn

c

c=24L

d

SLIDE 9

Construction

  • Put the empty ring in the largest gap
  • The largest gap has size roughly 2r/n
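A sketch of the gap step, assuming (hypothetically) that the ring is placed in the annulus between radii r and 2r around the ball's center: sort the distances that fall in the annulus and take the widest empty interval. With m points in the annulus, the m + 1 gaps sum to r, so the widest is at least r/(m+1); with m ≤ n/2 this is roughly 2r/n. The function name `place_ring` is hypothetical:

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def place_ring(points: Sequence[Point], center: Point, r: float) -> Tuple[float, float]:
    """Return (a, b) with r <= a < b <= 2r such that no point lies at distance
    strictly between a and b from center, choosing the widest such interval."""
    # Distances of the points that fall in the annulus [r, 2r].
    radii = sorted(d for d in (math.dist(center, x) for x in points) if r <= d <= 2 * r)
    # Consecutive boundaries delimit the empty gaps, including both ends.
    bounds = [r] + radii + [2 * r]
    return max(zip(bounds, bounds[1:]), key=lambda ab: ab[1] - ab[0])
```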
SLIDE 10

The Upshot

  • Can preprocess in O(n log n) time
  • Query time is O(log n)
  • (4n+1) approximation!
  • Surprisingly, this crude estimate is good enough to set the scale for the final search
SLIDE 11

Bounded Distance

  • Normal quadtree gives
  • Why?

– The approximation factor and the lower bound r eliminate cells smaller than (ε/4)r
– Bound the number of cells visited at the last level
– Do some algebra to get the bound

O 1 

dlog

SLIDE 12

A Complete Algorithm

  • Build

– a compressed quadtree/finger tree
– a ring separator tree

  • Compute an approximate nearest-neighbor distance R (using the ring separator tree)
  • Start the quadtree search from

– nodes of size approximately R
– that are closer than R to the query point

SLIDE 13

Arya and Mount

  • O(dn log n) preprocessing time
  • O(dn) space
  • O(c_{d,ε} log n) query time for (1+ε)-ANN

– where c_{d,ε} ≤ d(1+6d/ε)^d

  • Can also find the k approximate nearest neighbors
  • Works for any Minkowski metric
  • Preprocessing does not depend on ε or the metric
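The bound on c_{d,ε} grows exponentially with d. A one-line helper (hypothetical name) evaluates it:

```python
def ann_constant_bound(d: int, eps: float) -> float:
    """Evaluate the upper bound d * (1 + 6d/eps)^d on c_{d,eps}."""
    return d * (1 + 6 * d / eps) ** d
```

For ε = 1 this gives 338 in 2D, 20,577 in 3D, and 1,562,500 in 4D.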
SLIDE 14

Overview

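  • Build a BBD tree
  • Locate the leaf containing q
  • Visit nearby leaves in increasing order of distance from q
  • Stop when the next leaf is at distance at least dist(q,p)/(1+ε), where p is the closest point found so far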
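The search loop can be sketched generically. This is a simplification, not the BBD tree itself: cells are a flat list of axis-aligned boxes with their points (a hypothetical `Cell` layout), all cells go into one heap up front instead of being enumerated incrementally from the tree, and distances are Euclidean. The stopping rule is the one above: once the nearest remaining cell is at distance at least dist(q,p)/(1+ε), no remaining point can beat p by more than a factor of 1+ε.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical cell layout: (lower corner, upper corner, points stored in the cell).
Cell = Tuple[Point, Point, List[Point]]


def box_dist(q: Point, lo: Point, hi: Point) -> float:
    # Euclidean distance from q to the box [lo, hi]; 0 if q is inside.
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))


def priority_ann(cells: Sequence[Cell], q: Point, eps: float) -> Optional[Point]:
    # Min-heap of (distance from q, cell index); the cell containing q pops first.
    heap = [(box_dist(q, lo, hi), i) for i, (lo, hi, _) in enumerate(cells)]
    heapq.heapify(heap)
    best: Optional[Point] = None
    best_d = math.inf
    while heap:
        d_cell, i = heapq.heappop(heap)
        # Every remaining point is at distance >= d_cell. If d_cell >= best_d / (1 + eps),
        # best is already a (1 + eps)-approximate nearest neighbor, so stop.
        if d_cell * (1 + eps) >= best_d:
            break
        for p in cells[i][2]:
            d = math.dist(q, p)
            if d < best_d:
                best, best_d = p, d
    return best
```

A larger ε lets the loop stop earlier, trading accuracy for fewer cells visited.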
SLIDE 15

Tree types

  • k-d trees reduce the number of points at each level
  • Quadtrees reduce the cell size at each level
  • BBD trees do both

– using either a k-d-like split
– or a shrink

SLIDE 16

Properties

  • Bounded aspect ratio

– bound number of cells intersecting a volume

  • Stickiness

– control number of nearby cells

  • Inner boxes not cut by children

– so everything packs
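Stickiness can be tested coordinate by coordinate. Assuming the definition from the Arya et al. paper (each translate of the inner box in the 3^d grid of inner-box-sized boxes around it either lies inside the outer box or is disjoint from its interior), in every dimension each side of the inner box must touch the outer boundary or be separated from it by at least the inner box's width. A sketch with a hypothetical function name, assuming the inner box lies inside the outer box:

```python
from typing import Sequence


def is_sticky(inner_lo: Sequence[float], inner_hi: Sequence[float],
              outer_lo: Sequence[float], outer_hi: Sequence[float]) -> bool:
    """Per-dimension stickiness test for an inner box inside an outer box."""
    for a, b, lo, hi in zip(inner_lo, inner_hi, outer_lo, outer_hi):
        w = b - a  # width of the inner box in this dimension
        # Each side must touch the outer boundary or leave a gap of at least w.
        if not (a == lo or a - lo >= w):
            return False
        if not (b == hi or hi - b >= w):
            return False
    return True
```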

SLIDE 17

An Important Trick

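  • Maintain 3 sorted lists of the points, one per coordinate (x, y, z); in d dimensions, d lists
  • Cross-link the lists so each point's entries in all lists can be reached from one another
  • This allows

– removal of the first k points of a list in O(k) time (for fixed d)
– O(d)-time computation of the minimum bounding box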
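A minimal Python sketch of this trick for general d, using index arrays as doubly linked lists (class and method names are hypothetical). Removing a point unlinks it from all d lists in O(d), and the bounding box is read off the list ends:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class LinkedSortedLists:
    """One doubly linked list per coordinate, each sorted by that coordinate."""

    def __init__(self, points: Sequence[Point]):
        self.points = list(points)
        n = len(self.points)
        self.d = len(self.points[0])
        # nxt[j][i] / prv[j][i]: neighbors of point i in list j (-1 means none).
        self.nxt = [[-1] * n for _ in range(self.d)]
        self.prv = [[-1] * n for _ in range(self.d)]
        self.head = [-1] * self.d
        self.tail = [-1] * self.d
        for j in range(self.d):
            order = sorted(range(n), key=lambda i, j=j: self.points[i][j])
            for a, b in zip(order, order[1:]):
                self.nxt[j][a] = b
                self.prv[j][b] = a
            self.head[j], self.tail[j] = order[0], order[-1]

    def remove(self, i: int) -> None:
        # Unlink point i from every coordinate list: O(d). Remove each point at most once.
        for j in range(self.d):
            p, n = self.prv[j][i], self.nxt[j][i]
            if p == -1:
                self.head[j] = n
            else:
                self.nxt[j][p] = n
            if n == -1:
                self.tail[j] = p
            else:
                self.prv[j][n] = p

    def remove_first_k(self, j: int, k: int) -> List[int]:
        # Remove the k points with the smallest coordinate j: O(dk) in total.
        removed = []
        for _ in range(k):
            i = self.head[j]
            if i == -1:
                break
            self.remove(i)
            removed.append(i)
        return removed

    def bounding_box(self) -> Tuple[Point, Point]:
        # Min and max of each coordinate sit at the list ends: O(d).
        if self.head[0] == -1:
            raise ValueError("no points left")
        lo = tuple(self.points[self.head[j]][j] for j in range(self.d))
        hi = tuple(self.points[self.tail[j]][j] for j in range(self.d))
        return lo, hi
```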

SLIDE 18

Computing Shrinks

  • Compute a sequence of splits

– until a rectangle holds n/c points
– trivially sticky

  • Problems

– does not respect nesting
– may need to split many times

SLIDE 19

Computing Shrinks II

  • Always cut the minimum enclosing box

– constant time
– always removes points
– make sure the cut respects stickiness

  • Include the parent's inner rectangle

– continue until it is cut out

SLIDE 20

Computing Shrinks 2

  • More flexible
  • Shrink roughly as before
SLIDE 21

Tweaks

  • Collapse trivial splits/shrinks

– afterward, no sequence of trivial splits remains

  • Assign one point to each leaf

– even to empty shrink cells

SLIDE 22

Properties

  • Bounded occupancy
  • A data point near each leaf
  • Point location in O(d log n) time
  • Packing constraint
  • Leaves can be enumerated in increasing order of distance from q
SLIDE 23

Proof of Packing

  • Ball of radius r

– intersects at most (1+6r/s)^d leaves of size at least s

  • The standard packing argument works except for shrink cells

– use stickiness to replace outer boxes

SLIDE 24

ANN using BBD

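  • Number of leaves visited is O((1+6d/ε)^d)
  • Let r be the distance from q to the last leaf visited before the search terminates
  • The search did not stop there, so r(1+ε) ≤ dist(q,p), where p is the closest point found
  • No visited leaf can have size smaller than rε/d

– otherwise that leaf would contain a point closer than r(1+ε)

  • Apply the packing argument from before with s = rε/d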
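Written out, assuming "size" is the longest side of a leaf (so any two points of a leaf ℓ are within distance d·size(ℓ) in any Minkowski metric) and that a visited leaf ℓ contributes a point x ∈ ℓ. Since ℓ is within distance r of q, a leaf smaller than rε/d would have produced a point closer than dist(q,p), contradicting the choice of p; plugging s = rε/d into the packing bound gives the count:

```latex
\begin{aligned}
\operatorname{size}(\ell) < \frac{r\varepsilon}{d}
  &\;\Longrightarrow\; \operatorname{dist}(q,x) \le r + d\cdot\operatorname{size}(\ell)
  < r(1+\varepsilon) \le \operatorname{dist}(q,p),\\[4pt]
\#\{\text{leaves visited}\}
  &\le \left(1 + \frac{6r}{s}\right)^{d}\bigg|_{s = r\varepsilon/d}
  = \left(1 + \frac{6d}{\varepsilon}\right)^{d}.
\end{aligned}
```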
SLIDE 25

Experimental Results

  • Choices

– shrink only when necessary
– leaves held 5 to 8 points

  • Results

– slightly slower than k-d trees for uniformly distributed data
– much faster for clustered data (roughly 10x)
– slightly slower than k-d trees for surface data (about 20%)

[Plot: Surface Data, comparing BBD and Kd trees]