Multi-dimensional Data and Spatial Range Query in Sensor Networks



SLIDE 1

Multi-dimensional Data and Spatial Range Query in Sensor Networks

Jie Gao

Computer Science Department Stony Brook University

SLIDE 2

Papers

  • [Li03a] X. Li, Y. J. Kim, R. Govindan, W. Hong, Multi-dimensional Range Queries in Sensor Networks, Proc. ACM SenSys 2003.
  • [Gao04] J. Gao, L. Guibas, J. Hershberger, L. Zhang, Fractionally Cascaded Information in a Sensor Network, IPSN’04.

SLIDE 3

Orthogonal range search

  • Find all the sensors inside a rectangular box.
  • Find all the sensors with temperature readings above 70F.

SLIDE 4

Multi-dimensional data

  • Monitor environments.
  • Multiple sensors, multiple attributes.
  • Queries may be multi-dimensional as well.

Example: list all sensors with a temperature value of 70-80 and a light level of 10-20.

SLIDE 5

Sensor network as a database

  • Need an indexing scheme.
  • In addition, need a storage scheme.
  • First we look at range queries in a centralized setting.

SLIDE 6

1D range search

  • Find the data inside a query interval [x, x’].
  • 1D range tree: a balanced partitioning tree on a sorted list.
    – Each leaf stores an input value.
    – Each internal node stores the splitting value.

[Figure: 1D range tree with leaves 3, 10, 19, 23, 30, 37, 49, 59; internal splitting values 3, 19, 30, 49 on the lowest level, 10 and 37 above them, and 23 at the root.]

SLIDE 7

1D range search

  • Find the data inside a query interval [x, x’].
    – Start from the root and descend the tree to find the leaves where x and x’ fall.
    – Report all the leaves in the subtrees between the two search paths from the root.
  • Example: [9, 33].

[Figure: 1D range tree over the leaves 3, 10, 19, 23, 30, 37, 49, 59.]
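
The example query [9, 33] can be checked with a short sketch. This is not the balanced tree from the slide: it assumes the leaf values are the eight numbers read off the figure, keeps them in a sorted Python list, and uses binary search (`bisect`) in place of the two root-to-leaf descents, which gives the same O(log n + k) bound. The name `range_query_1d` is made up for illustration.

```python
from bisect import bisect_left, bisect_right

def range_query_1d(sorted_values, lo, hi):
    """Report every value v with lo <= v <= hi from a sorted list."""
    start = bisect_left(sorted_values, lo)   # first index whose value is >= lo
    end = bisect_right(sorted_values, hi)    # first index whose value is > hi
    return sorted_values[start:end]          # the k reported values

# Leaf values assumed from the slide's figure.
leaves = [3, 10, 19, 23, 30, 37, 49, 59]
print(range_query_1d(leaves, 9, 33))
```

The two binary searches play the role of the two search paths; everything between the two indices corresponds to the leaves in the subtrees between those paths.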

SLIDE 8

1D range search

  • Storage: n + n/2 + n/4 + … + 1 < 2n = O(n).
  • Height of the tree: O(log n).
  • Query time: O(log n + k), where k is the output size.

SLIDE 9

Kd-tree

  • A recursive space partitioning tree.
    – Partition along the x and y axes in an alternating fashion.
    – Each internal node stores the splitting line along x (or y).

[Figure: kd-tree partition with splitting lines alternating between x and y.]

SLIDE 10

Kd-tree

  • 2D query R = [x, x’] × [y, y’].
    – Check at each internal node whether the cutting line intersects R.
      • If yes, recurse on both children.
      • If no, recurse only on the half-plane that intersects R.
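
The recursion rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the lecture's code: points are stored only at leaves, each internal node splits at the median coordinate, and the names `KdNode`, `build_kd`, and `query_kd` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class KdNode:
    point: Optional[Point] = None       # set only at leaves
    axis: int = 0                       # 0 = split on x, 1 = split on y
    split: float = 0.0                  # coordinate of the cutting line
    left: Optional["KdNode"] = None     # points with coordinate <= split
    right: Optional["KdNode"] = None    # points with coordinate >= split

def build_kd(points: List[Point], depth: int = 0) -> Optional[KdNode]:
    if not points:
        return None
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2                    # alternate between x and y
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2           # median index; both halves are non-empty
    return KdNode(axis=axis, split=pts[mid][axis],
                  left=build_kd(pts[:mid + 1], depth + 1),
                  right=build_kd(pts[mid + 1:], depth + 1))

def query_kd(node: Optional[KdNode], lo: Point, hi: Point, out: List[Point]) -> None:
    """Append to out every point p with lo[i] <= p[i] <= hi[i] for i = 0, 1."""
    if node is None:
        return
    if node.point is not None:          # leaf: test the stored point
        if all(lo[i] <= node.point[i] <= hi[i] for i in range(2)):
            out.append(node.point)
        return
    # If the cutting line crosses R, both tests pass and we recurse on both sides;
    # otherwise only the side that can intersect R is visited.
    if lo[node.axis] <= node.split:
        query_kd(node.left, lo, hi, out)
    if hi[node.axis] >= node.split:
        query_kd(node.right, lo, hi, out)

sensors = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
found: List[Point] = []
query_kd(build_kd(sensors), (3, 1), (8, 5), found)
print(sorted(found))
```

The sketch omits the usual shortcut of reporting a whole subtree once its region lies inside R; that shortcut changes constants but not the rule for which children are visited.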

SLIDE 11

Kd-tree

  • Storage: O(n).
  • Height of the tree: O(log n).
  • Query cost? O(n^{1/2} + k), where k is the output size.

SLIDE 12

Kd-tree

  • Query cost? O(n^{1/2} + k), where k is the output size.
  • Intuition: let r(v) be the region of node v. We visit 2 types of nodes:
    – r(v) is fully contained in R (this is counted in k).
    – r(v) is not fully contained in R, so it is intersected by the boundary of R.
  • Thus we bound the number of nodes whose regions are intersected by a vertical line, denoted by Q(n).

SLIDE 13

Kd-tree

  • Thus we bound the number of nodes whose regions are intersected by a vertical line, denoted by Q(n).
  • Look at the 4 grandchildren of a node: the line intersects at most 2 of them.
  • Thus Q(n) = 2Q(n/4) + O(1) = O(n^{1/2}).
  • The query cost is O(k) + 4Q(n) = O(n^{1/2} + k), with one Q(n) term for each of the 4 sides of R.
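
The recurrence can be unrolled to see where the square root comes from. The derivation below is a sketch that assumes n is a power of 4 and writes the O(1) term as a constant c; both are simplifications introduced here.

```latex
\begin{align*}
Q(n) &= 2\,Q(n/4) + c \\
     &= 4\,Q(n/16) + 2c + c \\
     &= 2^{i}\,Q\!\left(n/4^{i}\right) + c\,(2^{i} - 1) \\
     &= 2^{\log_4 n}\,Q(1) + c\,(2^{\log_4 n} - 1) && (i = \log_4 n) \\
     &= O\!\left(n^{1/2}\right), \quad \text{since } 2^{\log_4 n} = n^{\log_4 2} = n^{1/2}.
\end{align*}
```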

SLIDE 14

Kd-tree in R^d

  • High-dimensional kd-tree.
  • If the dimension is d, we can build a kd-tree with O(n) size and query cost O(n^{1-1/d} + k), where k is the output size.
  • This query cost is too high.
  • We can bring it down if we sacrifice space.
  • Range tree: O(n log^{d-1} n) space and O(log^d n + k) query cost.

SLIDE 15

Range tree

  • Recall the 1D range tree.
  • 2D range tree:
    – First build a 1D range tree on the x-coordinates.
    – For each internal node, take all the points in its subtree and build a 1D range tree on their y-coordinates.
  • Total space: O(n log n).

[Figure: a range tree on x-coordinates, with an associated range tree on y-coordinates.]

SLIDE 16

Range tree

  • Query:
    – First search the 1D range tree on the x-coordinates.
    – For each subtree hanging between the two search paths, search its associated tree on the y-coordinates.
  • Query cost: O(log^2 n + k).
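
A compact sketch of the 2D range tree may help here. It makes one simplification that should be read as an assumption: the associated structure at each node is a list sorted by y and searched with `bisect`, rather than a second tree, which keeps the same O(n log n) space and O(log^2 n + k) query bounds. The names `build_range_tree` and `range_query` are illustrative.

```python
from bisect import bisect_left, bisect_right

def build_range_tree(points):
    """Build a 2D range tree over (x, y) tuples; returns None for no points."""
    pts = sorted(points)                     # primary order: x-coordinate
    return _build(pts) if pts else None

def _build(pts):
    by_y = sorted(pts, key=lambda p: p[1])   # associated structure on y
    node = {
        "xmin": pts[0][0], "xmax": pts[-1][0],
        "by_y": by_y,
        "ykeys": [p[1] for p in by_y],
        "left": None, "right": None,
    }
    if len(pts) > 1:                         # split the x-sorted list in half
        mid = len(pts) // 2
        node["left"] = _build(pts[:mid])
        node["right"] = _build(pts[mid:])
    return node

def range_query(node, x1, x2, y1, y2):
    """Return all points with x1 <= x <= x2 and y1 <= y <= y2."""
    if node is None or node["xmax"] < x1 or node["xmin"] > x2:
        return []                            # x-range disjoint from the query
    if x1 <= node["xmin"] and node["xmax"] <= x2:
        # Canonical node: every point matches on x, so search on y only.
        i = bisect_left(node["ykeys"], y1)
        j = bisect_right(node["ykeys"], y2)
        return node["by_y"][i:j]
    return (range_query(node["left"], x1, x2, y1, y2)
            + range_query(node["right"], x1, x2, y1, y2))

sensors = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_range_tree(sensors)
print(sorted(range_query(tree, 3, 8, 1, 5)))
```

Each point appears in the y-list of every ancestor, which is where the extra log n factor in space comes from.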

SLIDE 17

Quad-tree

  • A recursive space partitioning tree.
  • The depth might be as high as Ω(n).
  • Worst-case query cost is not bounded. For a uniform sensor distribution the depth is O(log n).

SLIDE 18

Indexing in a sensor network?

  • Where is the index stored?
  • How to traverse the tree?
  • 1st approach: map a quad-tree to the sensor field.
  • 2nd approach: distributed storage and indexing.

SLIDE 19

DIMENSIONS: summaries

  • Use a quad-tree partitioning.

SLIDE 20

DIMENSIONS: query

  • Top-down query processing

SLIDE 21

Issues with DIMENSIONS

  • Uneven load: nodes holding coarse data are visited more often.
  • The root becomes a traffic bottleneck.

SLIDE 22

Distributed index for multi-dimensional data

  • Construct the distributed indices.
  • Locality-preserving geographic hash: events with close attribute values are likely to be stored at nearby nodes.
  • Kd-tree partitioning.

SLIDE 23

Zones

  • The sensor network is partitioned into regions of equal (geographical) size, splitting along the x and y directions alternately.
  • Each cell is given a zone code: at each split, the left (bottom) half gets bit 0 and the right (top) half gets bit 1.

SLIDE 24

Zone-tree

  • Each node x owns a zone: the largest one that contains only x.
  • If a zone is empty, it is owned by a backup node: the owner of the rightmost zone in the left sibling subtree, or of the leftmost zone in the right sibling subtree.

SLIDE 25

Data-centric hashing

  • Hash a multi-dimensional event to a zone.
  • A multi-dimensional event {A_i}, i = 1, …, m, A_i ∈ [0, 1].
  • Suppose the zone code has k bits, k is a multiple of m.
  • For i = 1 to m, if A_i < 0.5, the i-th bit is assigned 0, otherwise 1.
  • For i = m+1 to 2m, if A_{i-m} < 0.25 or 0.5 ≤ A_{i-m} < 0.75, the i-th bit is assigned 0, otherwise 1.
  • For example, [0.3, 0.8] is stored at the 5-bit zone code 01110.
  • The event is hashed to the node that owns the zone.

[Figure: zone splits labeled A_1 < 0.5; A_1 < 0.5, A_2 < 0.5; and A_1 < 0.25 or 0.5 ≤ A_1 < 0.75, A_2 < 0.5.]
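
The bit-assignment rule can be sketched in a few lines. The sketch assumes the pattern on the slide continues to deeper levels in the natural way: bit i looks at attribute ((i - 1) mod m) + 1 and records whether the value falls in an even or odd cell when [0, 1] is cut into 2, 4, 8, … equal cells. The function name `zone_code` is hypothetical.

```python
def zone_code(attrs, num_bits):
    """Hash an event with attribute values in [0, 1] to a zone code string."""
    m = len(attrs)
    bits = []
    for i in range(num_bits):
        a = attrs[i % m]                 # attributes are used round-robin
        cells = 2 ** (i // m + 1)        # 2 cells at level 1, 4 at level 2, ...
        # Index of the cell containing a; clamp so a == 1.0 lands in the last cell.
        cell = min(int(a * cells), cells - 1)
        bits.append(str(cell % 2))       # even cell -> 0, odd cell -> 1
    return "".join(bits)

# The slide's example: the event [0.3, 0.8] with a 5-bit zone code.
print(zone_code([0.3, 0.8], 5))
```

For the second level this reproduces the slide's rule: a value below 0.25 or in [0.5, 0.75) sits in an even quarter and gets bit 0.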

SLIDE 26

Data-centric routing

  • The encoding node (where the event E is generated) may not know the number of bits of the hashed zone.
  • Node A encodes the event using the length of its own code and generates the zone code c(E).
  • Node A routes by GPSR to the centroid of the zone c(E).
  • Intermediate nodes may refine the code c(E).
  • If the current node B finds that its own code matches the event code c(E), then B stores the event.

SLIDE 27

Routing queries

  • Looking for a point event is the same as routing an event.
  • A range query is routed to a zone corresponding to the entire range, and then progressively split into smaller sub-queries.

SLIDE 28

Event routing helps resolve undecided zones

  • How does each node know its own zone code?
  • Assume that every node knows the outer boundary.
  • A node checks its 1-hop neighbors and decides on the largest zone that contains only itself.
  • This may not fully resolve all the boundaries.

SLIDE 29

Event routing helps resolve undecided zones

  • A claims the ownership of event E.
  • But A is not sure of its upper boundary, so A sends out the event E by GPSR (face routing) with a destination near A.
  • A node B that receives this message shrinks its zone.

SLIDE 30

DIM summary

  • Data storage exploits query locality, so range queries can be supported.
  • Events are not necessarily stored close to where they are generated.
  • Each event incurs about O(n^{1/2}) communication cost.
  • When data is highly skewed, most of it is handled by a small number of sensors, which become bottlenecks.

SLIDE 31

Major problem: data storage

  • Similar data (in attribute space) should be stored close together.
  • Data should be stored close to where it was generated: location is an important attribute of the data.

  • The two considerations may be in conflict.

SLIDE 32

Fractional cascading in sensor network

  • Geographical range query (q, R, T): q is where the query is generated, R is the rectangular range, T is a temperature range or other aggregate.
  • Aggregates about region R should be returned to the query node q.

SLIDE 33

Storage scheme

  • The aggregated value of a quad node is stored in all the sensors in its parent's subtree.
  • Each node stores O(log n) data.
  • Construction: bottom up. Cost O(n log n).

SLIDE 34

Query scheme

  • The query region R is partitioned into canonical regions: the maximal quads completely inside R.
  • Use spiral routing to visit a sensor in each canonical region.
  • Recurse on each canonical piece.
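
The partition into canonical regions can be sketched on a grid. The sketch assumes a square field whose side is a power of 2, integer cell coordinates, and a query rectangle given as half-open cell ranges; the function name `canonical_quads` is made up here, and the spiral routing step is not modeled.

```python
def canonical_quads(x0, y0, size, rect):
    """Return the maximal quad-tree cells that lie completely inside rect.

    The quad covers cells [x0, x0 + size) x [y0, y0 + size).
    rect = (rx0, ry0, rx1, ry1) covers cells [rx0, rx1) x [ry0, ry1).
    Each result is a tuple (x, y, size).
    """
    rx0, ry0, rx1, ry1 = rect
    if x0 >= rx1 or x0 + size <= rx0 or y0 >= ry1 or y0 + size <= ry0:
        return []                        # quad is disjoint from R
    if rx0 <= x0 and x0 + size <= rx1 and ry0 <= y0 and y0 + size <= ry1:
        return [(x0, y0, size)]          # fully inside R; its parent was not
    half = size // 2                     # partially covered: try the 4 children
    out = []
    for dx in (0, half):
        for dy in (0, half):
            out += canonical_quads(x0 + dx, y0 + dy, half, rect)
    return out

# An 8 x 8 field and a query rectangle covering cells x in 1..4, y in 1..3.
for quad in canonical_quads(0, 0, 8, (1, 1, 5, 4)):
    print(quad)
```

A quad is returned only when its parent was not fully inside R, which is what makes each returned quad maximal.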

SLIDE 35

Query cost

  • The query cost for (q, R, [T, ∞)) is
  • A is the area, P is the perimeter, k is the output size.
  • Cost 1: spiral visit: O(P log P).

SLIDE 36

Query cost

  • Cost 2: the communication cost of recursion in each canonical piece with side length L(u) and output k(u) is
  • The total recursion cost is

SLIDE 37

Summary

  • Store similar data close
    – Work in the space of the data field.
    – Bring all similar data together.
    – May need to travel far.
  • Store data nearby
    – Respect spatial locality for geographical range queries.
    – Communication cost is low.
    – Range search in data space is challenging.
  • Can you get the best of both worlds?

SLIDE 38

The remaining classes

  • Coding theory with applications in routing (network coding) and storage.
  • Percolation theory and connectivity.
  • Synchronization.
  • Gossip algorithms.
  • Anything you want to suggest?
  • Reminder: work on your project! Come discuss it with me if you want guidance and suggestions.

SLIDE 39

Lower bound on query cost

  • Assume sensors are on a regular grid with n sensors. Each sensor has a value 0 or 1. Now we want to report “hot” sensors in a range R.
  • Assume each sensor stores m = polylog n data.
  • Type I query: the range is a single sensor r, (q, r).
  • # sensors in Q1: D^2
  • # storage in Q2: at most D^2
  • Thus no matter how we store data in the network, a type I query has to go outside Q2 to look for the data.
  • The query cost is

SLIDE 40

Lower bound on query cost

  • Type II query (q, R(q, r)).
  • Suppose t1 and t2 are two different assignments of values in the region R(q, r), i.e., at least one sensor has a different value.
  • Suppose R(q, r) has area A = # sensors inside R. There are 2^A different assignments in total, so we need at least A bits of storage to distinguish any two different assignments.
  • # sensors in Q3: A
  • Thus a type II query has to go outside Q3 to look for the data.
  • The query cost is