Data Structures for Geometric Intersection Query Problems



slide-1
SLIDE 1

Data Structures for Geometric Intersection Query Problems

Saladi Rahul Advisor: Prof. Ravi Janardan

Doctoral Candidate, Dept. of Computer Science & Engg., University of Minnesota Twin-Cities

July 13, 2017

slide-2
SLIDE 2

Range Searching

Performance Measures

  • 1. Size of the data structure
  • 2. Query time
  • 3. Update time
  • 4. Preprocessing time

[Figure: points plotted by Age (ticks at 30 and 40) and Salary (ticks at 30,000 and 50,000), with a query rectangle q.]
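To make the query concrete, here is a minimal brute-force sketch of orthogonal range reporting and counting. It is only a baseline that defines what the query returns, not one of the data structures discussed in the talk. The `(age, salary)` record layout, the function names, and the sample data are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (age, salary): hypothetical record layout


def range_report(points: List[Point], age_lo: float, age_hi: float,
                 sal_lo: float, sal_hi: float) -> List[Point]:
    """Return all points inside the closed query rectangle (brute force, O(n))."""
    return [(a, s) for (a, s) in points
            if age_lo <= a <= age_hi and sal_lo <= s <= sal_hi]


def range_count(points: List[Point], age_lo: float, age_hi: float,
                sal_lo: float, sal_hi: float) -> int:
    """Counting version of the same query."""
    return len(range_report(points, age_lo, age_hi, sal_lo, sal_hi))


# Made-up sample data; the rectangle uses the tick values shown on the slide.
employees = [(25, 28000), (32, 41000), (38, 52000), (35, 30000), (45, 47000)]
print(range_report(employees, 30, 40, 30000, 50000))
```

The performance measures listed on this slide are what separate a real structure from this scan: the scan uses O(n) space but also O(n) query time.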

slide-3
SLIDE 3

Landscape of Geometric Intersection Queries (GIQ)

slide-4
SLIDE 4

(1) Geometric Settings

  • (a) orthogonal range search
  • (b) circular range search
  • (c) halfspace range search
  • (d) dominance range search
  • (e) rectangle stabbing
  • (f) segment intersection

slide-5
SLIDE 5

(2) Aggregation Function

  • reporting, counting.
  • max, top-k, sum.
  • convex hull, skyline.
  • minimum spanning tree.
  • closest pair.
  • color (or group-by).

slide-6
SLIDE 6

(3) Fundamental Structures and Techniques

  • Balanced partition of objects: priority search tree, range trees, interval tree, segment tree, B-tree, R-tree, Kd-tree.
  • More Sophisticated Tools: persistence, filtering search, fractional cascading.
  • Randomization and Approximation Tools: ε-sample, ε-nets, moments technique.
  • Integer Data: Van Emde Boas tree, fusion tree, FindAny structure.
  • Recent Discoveries: Buffer Trees, stronger version of filtering search, shallow cuttings for orthogonal problems.
  • Very High Dimensional Space: Matrix multiplication, ... new ideas needed.

slide-7
SLIDE 7

Philosophy of our research

slide-8
SLIDE 8

Design of geometric algorithms & data structures and their formal mathematical analysis.

slide-9
SLIDE 9

Quest for optimality...

How far can you push the space & query time bounds? (Curse of dimensionality) 1D vs 2D vs 3D vs ...

slide-10
SLIDE 10

Scope of the thesis

slide-11
SLIDE 11

GIQ

  • Point Location in 3D: Which box contains the query point?
  • Approximate Counting: approx. the number of objects/colors intersecting the query.
  • Rectangle Stabbing in 3D: report the rectangles containing the query point.
  • Top-K: report the K most important objects.

slide-12
SLIDE 12

GIQ

  • Point Location in 3D: Which box contains the query point? (Under submission)
  • Approximate Counting: approx. the number of objects/colors intersecting the query. (SoCG 2017)
  • Rectangle Stabbing in 3D: report the rectangles containing the query point. (SODA 2015)
  • Top-K: report the K most important objects. (TKDE’14, PODS’15, PODS’16, Manuscript)

slide-13
SLIDE 13

Rectangle Stabbing

(Almost) resolved a three-decade-old open problem.

Saladi Rahul. Improved bounds for orthogonal point enclosure query and point location in orthogonal subdivisions in $\mathbb{R}^3$. SODA 2015.

slide-14
SLIDE 14

Problem

[Figure: a set of rectangles and a query point q.]

slide-15
SLIDE 15

Optimality in 1d and 2d

Lower bound (comparison model and pointer machine model): Ω(log n + k) query time.

  • 1d: Space: O(n), Query Time: O(log n + k)
  • 2d: Space: O(n), Query Time: O(log n + k)

slide-16
SLIDE 16

Rectangle stabbing in 3d

BIG (THEORETICAL) GAP!

  • Lower Bound, Afshani, Arge, and Larsen [SoCG’10, SoCG’12]: space $O(n)$, query time $\Omega(\log^2 n + k)$
  • State of the art: space $O(n)$, query time $O(\log^4 n + k)$

slide-17
SLIDE 17

Almost Optimal Result in 3d

BIG GAP!

  • State of the art: space $O(n)$, query time $O(\log^4 n + k)$
  • Lower Bound, Afshani, Arge, and Larsen [SoCG’10, SoCG’12]: space $O(n)$, query time $\Omega(\log^2 n + k)$

GAP ALMOST CLOSED

  • Our Result: space $O(n \log^* n)$, query time $O(\log^2 n \cdot \log\log n + k)$

slide-18
SLIDE 18

Orthogonal Point Location

(Designed the first optimal solution in 3D) Under Submission.

slide-19
SLIDE 19

Problem in 2D

[Figure: an orthogonal subdivision in 2D with a query point q.]

slide-20
SLIDE 20

Problem in 3D

[Figure: shown in 2D for convenience, with a query point q.]

slide-21
SLIDE 21

History of point location in 3D

Reference: space, query time

  • Edelsbrunner et al.: $n$, $\log^3 n$
  • Afshani et al.: $n$, $\log^2 n \log\log n$
  • Rahul: $n$, $\log^{1.5} n$
  • Chan: $n$, $\log n \log\log n$
  • New: $n$, $\log_w n$
  • Nekrich: $n/B$, $\log_B^2 n$
  • New: $n/B$, $\log_B n$

slide-22
SLIDE 22

Top-k Geometric Intersection Queries (Top-k GIQ)

slide-23
SLIDE 23

Why Top-k?

  • Big Data. What happens if the database returns too many results?
  • Reduce Cognitive Overload. “Enough Already!” [Carey and Kossmann’97]
  • Smartphones. Limited screen size.

slide-24
SLIDE 24

1D Top-k Range Search

Find the k most-viewed YouTube videos that were published between 1st June 2000 and 1st June 2005.

[Figure: videos on a timeline labeled with view counts (5M, 6M, 100M, 22M, 10M, 7M, 13M, 99M); q marks the query interval.]

slide-25
SLIDE 25

Top-k Circular Range Search

Find the k best-rated nearby restaurants.

[Figure: restaurants labeled with ratings (3.2, 3.2, 3.2, 4.5, 3.8, 4.9, 2.2, 3.0, 4.7, 4.2, 3.2, 4.3) and a circular query range.]

slide-26
SLIDE 26

Top-k Interval Stabbing

Report the k best-rated hotels that have a vacancy on 13th Sept. 2016.

[Figure: hotel vacancy intervals on a timeline, labeled with ratings (4.5, 4.0, 4.2, 4.4, 3.6, 4.8); q marks the query date.]

slide-27
SLIDE 27

Our Contributions

Specific geometric settings.

  • Saladi Rahul and Yufei Tao. On top-k range reporting in 2d space. PODS 2015.
  • Yakov Nekrich, Saladi Rahul and Yufei Tao. Optimal top-k planar rectangle stabbing and halfplane reporting. Manuscript.

Generic reductions.

  • Saladi Rahul and Ravi Janardan. A general technique for top-k geometric intersection query problems. IEEE TKDE 2014.
  • Saladi Rahul and Yufei Tao. Efficient top-k indexing via general reductions. PODS 2016.

slide-28
SLIDE 28

Specific Geometric Settings

Optimal worst-case solutions.

  • Orthogonal range searching in 2D.
  • Rectangle stabbing in 2D.
  • Halfplane searching in 2D.

slide-29
SLIDE 29

Generic Reductions (Short and Sweet)

  • Short. Significantly simplifies the design of top-k structures. Very little effort required.
  • Sweet. Involves interesting and non-trivial theoretical analysis.

slide-30
SLIDE 30

Techniques

slide-31
SLIDE 31

Simple Approach-I (Naive Reporting)

  • Report all the objects intersecting the query, i.e., A ∩ q.
  • Find the top-k objects in A ∩ q.
  • Inefficient if |A ∩ q| ≫ k.

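The naive approach can be sketched in a few lines. This is a minimal illustration, assuming objects are `(position, weight)` pairs and the query is a 1D interval test; the function name `naive_top_k` and the data are made up for the example. It does work proportional to |A ∩ q| plus the selection of the k heaviest hits, which is exactly the cost the slide warns about.

```python
import heapq
from typing import Callable, List, Tuple

Obj = Tuple[int, float]  # (position, weight): hypothetical layout


def naive_top_k(objects: List[Obj], intersects: Callable[[Obj], bool],
                k: int) -> List[Obj]:
    """Report everything in A ∩ q, then keep the k heaviest objects."""
    hits = [o for o in objects if intersects(o)]          # all of A ∩ q
    return heapq.nlargest(k, hits, key=lambda o: o[1])    # heaviest first


# Made-up data: positions 1..8 with view counts in millions.
videos = [(1, 5), (2, 6), (3, 100), (4, 22), (5, 10), (6, 7), (7, 13), (8, 99)]
print(naive_top_k(videos, lambda o: 2 <= o[0] <= 6, 2))
```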

slide-32
SLIDE 32

Answering a Top-k Query

Two-Step Process

  • 1. Find the k-th largest weight in A ∩ q. Call it τ.
  • 2. Run a prioritized reporting query: report the objects with weight ≥ τ.

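The two steps can be written out as follows. Both steps are brute-force stand-ins here, so the sketch only shows how the threshold τ connects them; in the talk each step is answered by a dedicated data structure. The names `kth_largest_weight`, `prioritized_report`, and `top_k`, and the `(position, weight)` layout, are assumptions for illustration.

```python
from typing import Callable, List, Optional, Tuple

Obj = Tuple[int, float]  # (position, weight): hypothetical layout


def kth_largest_weight(objects: List[Obj], intersects: Callable[[int], bool],
                       k: int) -> Optional[float]:
    """Step 1: the k-th largest weight in A ∩ q, or None if |A ∩ q| < k."""
    weights = sorted((w for (pos, w) in objects if intersects(pos)), reverse=True)
    return weights[k - 1] if len(weights) >= k else None


def prioritized_report(objects: List[Obj], intersects: Callable[[int], bool],
                       tau: float) -> List[Obj]:
    """Step 2: report the objects in A ∩ q whose weight is at least tau."""
    return [(pos, w) for (pos, w) in objects if intersects(pos) and w >= tau]


def top_k(objects: List[Obj], intersects: Callable[[int], bool], k: int) -> List[Obj]:
    tau = kth_largest_weight(objects, intersects, k)
    if tau is None:                 # fewer than k hits: report all of them
        tau = float("-inf")
    out = prioritized_report(objects, intersects, tau)
    # Sort heaviest first and trim, in case ties at tau produced extra objects.
    return sorted(out, key=lambda o: o[1], reverse=True)[:k]


# Made-up ratings placed along a line.
ratings = [(1, 3.2), (2, 4.5), (3, 3.8), (4, 4.9), (5, 2.2), (6, 3.0), (7, 4.7)]
print(top_k(ratings, lambda pos: 2 <= pos <= 6, 3))
```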

slide-33
SLIDE 33

Our Approach (R & Janardan [TKDE’14])

[Figure: objects with weights 80, 70, 60, 50, 40, 30, 20, 10 and a tree with nodes v1 to v5; the query has k = 4.]

  • v1: A(v1) = 5, k′ = 4
  • v3: A(v3) = 2, k′ = 1
  • v2: A(v2) = 3, k′ = 4
  • v4: A(v4) = 1, k′ = 1
  • v5: A(v5) = 1, k′ = 1

1) Need to answer counting queries.
2) Only O(log n) nodes are visited.
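One plausible reading of the figure is a descent over the objects sorted by weight, where each step asks a counting query on the heavier half and updates the remaining rank k′. The sketch below follows that reading; it is an assumption about the idea, not the exact structure from the TKDE’14 paper. The inner `count` function is a brute-force stand-in for a real counting structure, and all names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Obj = Tuple[int, float]  # (position, weight): hypothetical layout


def kth_largest_by_counting(objects: List[Obj], intersects: Callable[[int], bool],
                            k: int) -> Optional[float]:
    """Find the k-th largest weight in A ∩ q using O(log n) counting queries."""
    by_weight = sorted(objects, key=lambda o: o[1], reverse=True)  # heaviest first

    def count(lo: int, hi: int) -> int:
        # Stand-in for a counting structure built on by_weight[lo:hi].
        return sum(1 for (pos, _) in by_weight[lo:hi] if intersects(pos))

    lo, hi = 0, len(by_weight)
    if count(lo, hi) < k:
        return None                      # fewer than k objects intersect q
    remaining = k                        # plays the role of k' on the slide
    while hi - lo > 1:
        mid = (lo + hi) // 2
        c = count(lo, mid)               # hits among the heavier half
        if c >= remaining:
            hi = mid                     # the answer lies in the heavier half
        else:
            remaining -= c               # skip those hits, continue in the lighter half
            lo = mid
    return by_weight[lo][1]


# Made-up data using the weights from the figure.
data = [(5, 80), (1, 70), (7, 60), (3, 50), (8, 40), (2, 30), (6, 20), (4, 10)]
print(kth_largest_by_counting(data, lambda pos: 2 <= pos <= 7, 4))
```

The loop halves the candidate range each time, so it issues a logarithmic number of counting queries, which matches point 2) above.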

slide-34
SLIDE 34

Our Approach (R & Janardan [TKDE’14])

[Figure: the example from the previous slide (k = 4, nodes v1 to v5), now with a counting structure.]

slide-35
SLIDE 35

General Reduction-I

Given

  • A prioritized structure of $S_{pri}(n)$ space that answers a query in $Q_{pri}(n) + O(t)$ time;
  • A counting structure of $S_{cnt}(n)$ space that answers a query in $Q_{cnt}(n)$ time.

Then there is a top-k structure with

  • $S_{top}(n) = O(S_{cnt}(n) \cdot \log^2 n + S_{pri}(n))$
  • $Q_{top}(n) = O(Q_{cnt}(n) \cdot \log^2 n + Q_{pri}(n) + k)$

Updates handled efficiently.

slide-36
SLIDE 36

Limitation

slide-37
SLIDE 37

Expensive Counting Structures


Space: O(n) Query time: O(√n)

slide-38
SLIDE 38

Can Other Aggregate Functions be Used to Solve Top-k GIQ?

slide-39
SLIDE 39

Another Companion Problem

Max Query: Report the object with the largest weight. Easiest special case of Top-k query. New Goal: Design a Top-k GIQ structure using the Max Structure.

slide-40
SLIDE 40

Answering a Top-k Query

Two-Step Process

  • 1. Find the approximate k-th largest weight in A ∩ q. Call it τ.
  • 2. Run a prioritized reporting query: report the objects with weight ≥ τ.


slide-41
SLIDE 41

Reducing top-k to top-1 (R & Tao [PODS’16])

Let S be a set of m elements, and let R be a (1/k)-sample of S. The rank-1 element of R has rank in S in the range [k, 4k] with probability at least 0.09.

[Figure: S ordered by rank with marks at k and 4k; regions labeled success (≥ 0.09), failure (≤ 0.87), failure (≤ 0.02), failure (≤ 0.02).]

$(1 - 1/k)^{4k} < e^{-4} \approx 0.02$
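A quick simulation can make the sampling statement tangible. This is a sketch under one assumption: a (1/k)-sample keeps each element independently with probability 1/k. Under that model, the largest sampled element is found by scanning S from rank 1 downward until the first kept element. The function names and parameters are made up for illustration, and the slide's 0.09 is a lower bound, so the observed rate may be noticeably higher.

```python
import random
from typing import Optional


def rank_of_sample_max(m: int, k: int, rng: random.Random) -> Optional[int]:
    """Rank in S (1 = largest) of the largest element of a (1/k)-sample.

    Returns None when the sample turns out to be empty.
    """
    for rank in range(1, m + 1):
        if rng.random() < 1.0 / k:   # this element is kept in the sample
            return rank
    return None


def success_rate(m: int, k: int, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the sample maximum has rank in [k, 4k]."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        r = rank_of_sample_max(m, k, rng)
        if r is not None and k <= r <= 4 * k:
            ok += 1
    return ok / trials


print(success_rate(m=1000, k=10, trials=5000))
```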

slide-42
SLIDE 42

Build several Top-1 structures

If you fail, go to the next structure. Intuition: very few structures will be visited.

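[Figure: a sequence of top-1 structures labeled $\log n$, $(1 + \sigma) \cdot \log n$, ..., $(1 + \sigma)^i \cdot \log n$, ..., $(1 + \sigma)^j \cdot \log n$.]

$k \sum_{h=i}^{j} (0.91)^{h-i} \cdot (1 + \sigma)^{h-i} \le k \sum_{h=i}^{j} (0.992)^{h-i} = O(k)$

This needs $0.91 \cdot (1 + \sigma) < 1$. Pick $\sigma = 0.09$, which gives $0.91 \cdot 1.09 \approx 0.992$.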
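The O(k) bound is a geometric series. The worked version below assumes, as the terms on the slide suggest, that structure h is reached with probability at most $(0.91)^{h-i}$ and costs about $(1+\sigma)^{h-i} \cdot k$ to query; the substitution $t = h - i$ is introduced here only for the derivation.

```latex
\sum_{h=i}^{j} (0.91)^{h-i}\,(1+\sigma)^{h-i}\,k
  \;=\; k \sum_{t=0}^{j-i} \bigl(0.91\,(1+\sigma)\bigr)^{t}
  \;\le\; \frac{k}{1 - 0.91\,(1+\sigma)}
  \;=\; \frac{k}{1 - 0.9919}
  \;\approx\; 123.5\,k
  \;=\; O(k)
  \qquad (\sigma = 0.09)
```

The constant is large because $0.91 \cdot 1.09$ is close to 1; a smaller $\sigma$ would shrink it, at the price of building more structures.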

slide-43
SLIDE 43

General Reduction-II: NO Deterioration!

Given

  • A max structure of $S_{max}(n)$ space, $Q_{max}(n)$ query time, and $U_{max}(n)$ update time.
  • A prioritized reporting structure of $S_{pri}(n)$ space, $Q_{pri}(n)$ query time, and $U_{pri}(n)$ update time.

[R & Tao, PODS’16]: In expectation, there is an optimal top-k structure with:

  • $S_{top}(n) = O(S_{max}(n) + S_{pri}(n))$
  • $U_{top}(n) = O(U_{max}(n) + U_{pri}(n))$
  • $Q_{top}(n) = O(Q_{max}(n) + Q_{pri}(n))$

slide-44
SLIDE 44

Approximate Counting

Saladi Rahul. Approximate Range Counting Revisited. SoCG 2017.

slide-45
SLIDE 45

Problem-I

K = # objects intersecting the query

Approximate range counting: Report a value in the range [(1 − ε)K, (1 + ε)K]

[Figure: a set of objects with a query rectangle.]
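As a baseline for intuition, approximate counting can be done with plain uniform sampling: keep each object with probability p at preprocessing time, count the sampled objects that intersect the query, and scale by 1/p. This is not the SoCG 2017 structure, and it only gives a good relative error when K is large compared to 1/p. The names `build_sample` and `approx_count` and the grid data are assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple

Point = Tuple[int, int]


def build_sample(points: List[Point], p: float, seed: int = 0) -> List[Point]:
    """Preprocessing: keep each point independently with probability p."""
    rng = random.Random(seed)
    return [pt for pt in points if rng.random() < p]


def approx_count(sample: List[Point], in_query: Callable[[Point], bool],
                 p: float) -> float:
    """Query: count sampled points inside the query and scale up by 1/p."""
    return sum(1 for pt in sample if in_query(pt)) / p


# Made-up data: a 200 x 200 grid and a query rectangle holding K = 10000 points.
grid = [(i % 200, i // 200) for i in range(40000)]
inside = lambda pt: 50 <= pt[0] < 150 and 50 <= pt[1] < 150
sample = build_sample(grid, p=0.1, seed=7)
print(approx_count(sample, inside, p=0.1))
```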

slide-46
SLIDE 46

Problem-II (Enter the Colors...)

[Figure: colored objects with a query rectangle.]

K = # colors intersecting the query

Colored approximate range counting: Report a value in the range [(1 − ε)K, (1 + ε)K]

slide-47
SLIDE 47

Previous Work

  • (1) ε-approximations: Vapnik and Chervonenkis [’71]
  • (2) Relative (p, ε)-approximations: Har-Peled and Sharir [’11], Aronov and Sharir [’10], Sharir and Shaul [’11]
  • (3) General Reductions via Sampling: Aronov & Har-Peled [’08], Kaplan, Ramos and Sharir [’11]
  • (4) Shallow Cuttings: Afshani and Chan [’09], Afshani, Hamilton and Zeh [’10]
  • (5) Word-RAM Model: Chan and Wilkinson [’13], Nekrich [’14]

slide-48
SLIDE 48

Why?

slide-49
SLIDE 49

(1) Colored Orthogonal Range Search in 2D

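Problem: space, query time

  • Exact: $O(n^2)$, $O(\log n)$
  • Reporting: $O(n \log n)$, $O(\log n + K)$
  • $(1 + \varepsilon)$-Approximation: $O(n \log n)$, $O\left(\frac{\log n}{\varepsilon^2}\right)$

[Figure: colored points with a query rectangle.]
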
slide-50
SLIDE 50

(2) 3-sided rectangle stabbing in 2D

Space: $O(n)$
Query Time: $O(\log\log U + (\log\log n)^2)$

slide-51
SLIDE 51

(2) Optimal 3-sided rectangle stabbing in 2D

Space: $O(n)$
Query Time: $O(\log\log U + (\log\log n)^2)$

O(1)

slide-52
SLIDE 52

A General Reduction

Previous reductions of approximate range counting:

  • Emptiness [Aronov & Har-Peled]
  • Range-min [Kaplan, Ramos & Sharir]
  • Reporting [Afshani & Chan]

slide-53
SLIDE 53

A General Reduction

Two Companion Queries:

  • (A) C-Approximation
  • (B) Reporting

Approximate range counting:

  • Space: $O(S_{capp}(n) + S_{rep}(n))$
  • Query time: $O(Q_{capp}(n) + Q_{rep}(n) + \varepsilon^{-2} \log n)$

slide-54
SLIDE 54

Approximate Counting via Random Sampling

slide-55
SLIDE 55

Reporting Structure to Count Colors

slide-56
SLIDE 56

Sample, Sample, Sample....Colors

Find a structure with output size in $[C_1 \log n, C_2 \log n]$.
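The color-sampling idea can be sketched as a stack of levels, where level i keeps each color with probability 2^(-i), and a query scans the levels until the number of reported colors is small. This is a simplified sketch, not the construction from the paper: the single `threshold` parameter stands in for the upper end of the window (C2 log n), the reporting query is a brute-force scan, and the levels are searched one by one. All names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

ColoredPoint = Tuple[float, float, object]  # (x, y, color)


def build_levels(points: List[ColoredPoint], seed: int = 0) -> List[List[ColoredPoint]]:
    """Level i keeps every point whose color survived i fair coin flips."""
    rng = random.Random(seed)
    top_level = {}
    for color in sorted({c for (_, _, c) in points}, key=repr):
        level = 0
        while rng.random() < 0.5:     # survive to the next level with probability 1/2
            level += 1
        top_level[color] = level
    depth = max(top_level.values(), default=0) + 1
    return [[p for p in points if top_level[p[2]] >= i] for i in range(depth)]


def approx_color_count(levels: List[List[ColoredPoint]],
                       in_query: Callable[[float, float], bool],
                       threshold: int) -> int:
    """Scan the levels; stop at the first one reporting at most `threshold` colors."""
    estimate = 0
    for i, pts in enumerate(levels):
        seen = {c for (x, y, c) in pts if in_query(x, y)}  # stand-in reporting query
        estimate = len(seen) * (2 ** i)                    # scale by 1 / sampling rate
        if len(seen) <= threshold:
            break
    return estimate


# Made-up data: 2000 points, each with its own color.
pts = [(i % 50, i // 50, i) for i in range(2000)]
print(approx_color_count(build_levels(pts, seed=3), lambda x, y: True, threshold=64))
```

When the query intersects at most `threshold` colors, level 0 already qualifies and the answer is exact; otherwise the estimate is only within a constant factor in this simplified form.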

slide-57
SLIDE 57

C-approximation is the saviour

Directly jump to the structure with output size in $[C_1 \log n, C_2 \log n]$.

slide-58
SLIDE 58

Final Comments

slide-59
SLIDE 59

Main techniques used

  • Van Emde Boas + K-d Tree
  • BITology
  • Filtering search
  • “Parallel” point location in 2D
  • Shallow Cuttings
  • Finding exact threshold
  • Random sampling on objects
  • Hardness result
  • Transformation from colored to uncolored
  • Random sampling on colors

slide-60
SLIDE 60

Two Open Problems

Orthogonal point location in d ≥ 4.

Is it affected by the curse of dimensionality?

Is top-k equivalent to prioritized reporting?

Conjecture: Yes.

slide-61
SLIDE 61

Thank You!!