Peer-to-Peer Result Dissemination in High-Volume Data Filtering - - PowerPoint PPT Presentation

SLIDE 1

Peer-to-Peer Result Dissemination in High-Volume Data Filtering

Shariq Rizvi and Paul Burstein

CS 294-4: Peer-to-Peer Systems

SLIDE 2

P2P: A Delivery Infrastructure

Overcast

  • Application-level multicasting
  • Build data distribution trees
  • Adapt to changing network conditions
  • Drawback: inner nodes are heavily loaded

SplitStream

  • Load-balancing across all peers
  • Split content into redundant streams
  • Redundancy offers resilience to failures

SLIDE 3

Our Focus

Dynamic Application-level Multicast

Single source, multiple receivers

High-volume data flow (“document streams”)

Dynamic: very large number of “groups”

IP multicast is a poor fit:

  • Rigid to deploy
  • Dynamic groups?
  • “Intelligent” trees on the fly?

SLIDE 4

Organization

Motivation

  • Data filtering
  • YFilter@Berkeley
  • Distributed YFilter

Dynamic multicast

  • Unstructured overlay network
  • Metrics
  • Experiments

Summary & future work

SLIDE 5

Data Filtering

Pub-sub systems

XML: the “wire format” for data

Web services

RDF Site Summary (RSS) data feeds

  • News
  • Stock ticks

Personalized content delivery

Message brokers

  • Filtering
  • Transformation
  • Delivery

SLIDE 6

YFilter: A Data Filtering Engine

Picture blatantly stolen from “Path Sharing and Predicate Evaluation for High-Performance XML Filtering”, Diao et al., TODS 2003
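As a rough illustration of the path-sharing idea named in the cited paper's title, the sketch below indexes simple child-axis path queries in a trie so that queries with a common prefix share work. It is a hypothetical simplification, not YFilter's engine: the class `PathIndex`, the helper `element_paths` and the sample queries are made up for illustration, and wildcards, descendant axes and value predicates are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


class PathIndex:
    """Hypothetical prefix-sharing index over simple child-axis paths such as
    /quote/symbol. Queries with a common prefix share trie nodes, so each step of
    a document path is examined once for all queries rather than once per query."""

    def __init__(self) -> None:
        self.children: Dict[str, "PathIndex"] = {}
        self.query_ids: Set[int] = set()  # queries whose path ends at this node

    def add(self, qid: int, path: str) -> None:
        node = self
        for step in path.strip("/").split("/"):
            node = node.children.setdefault(step, PathIndex())
        node.query_ids.add(qid)

    def match(self, doc_paths: List[str]) -> Set[int]:
        """Ids of queries whose path occurs in the document."""
        hits: Set[int] = set()
        for path in doc_paths:
            node = self
            for step in path.strip("/").split("/"):
                node = node.children.get(step)
                if node is None:
                    break  # no query continues along this path
                hits |= node.query_ids
        return hits


def element_paths(xml_text: str) -> List[str]:
    """All root-to-element paths of an XML document, e.g. /quote/price."""
    paths: List[str] = []

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        paths.append(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


index = PathIndex()
index.add(1, "/quote/symbol")
index.add(2, "/quote/price")
index.add(3, "/news/title")
doc = "<quote><symbol>IBM</symbol><price>90</price></quote>"
print(sorted(index.match(element_paths(doc))))  # [1, 2]
```

Queries 1 and 2 share the `quote` node, so the `/quote` step is matched once for both.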

SLIDE 7

YFilter: Some Numbers

Incoming document flow – 10-20 per second

Document sizes – 20 KB

Subscribers – lots!

Processing bottleneck

  • 50 ms per document with 100,000 simple XML path queries

Dissemination bottleneck

  • Thousands of recipients per document – bandwidth needed ~ Gbps

Solution: Distributed filtering
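The bottleneck figures above can be checked with back-of-envelope arithmetic. This sketch assumes "thousands of recipients" means 1,000 and that 1 KB = 1,000 bytes; the helper names are hypothetical.

```python
# Back-of-envelope check of the "YFilter: Some Numbers" slide.
DOC_RATES_PER_S = (10, 20)   # incoming documents per second
DOC_SIZE_KB = 20             # average document size
FILTER_MS_PER_DOC = 50       # with 100,000 simple XML path queries
RECIPIENTS_PER_DOC = 1000    # assumption: "thousands" taken as 1,000


def filter_utilization(rate_per_s: float, ms_per_doc: float) -> float:
    """Fraction of a single filter's time spent filtering at this arrival rate."""
    return rate_per_s * ms_per_doc / 1000.0


def egress_gbps(rate_per_s: float, size_kb: float, recipients: int) -> float:
    """Egress if the filter unicasts every copy itself (assumes 1 KB = 1,000 bytes)."""
    bytes_per_s = rate_per_s * size_kb * 1000 * recipients
    return bytes_per_s * 8 / 1e9


for rate in DOC_RATES_PER_S:
    print(f"{rate} docs/s: filter {filter_utilization(rate, FILTER_MS_PER_DOC):.0%} busy, "
          f"egress ~{egress_gbps(rate, DOC_SIZE_KB, RECIPIENTS_PER_DOC):.1f} Gbps")
# 10 docs/s: filter 50% busy, egress ~1.6 Gbps
# 20 docs/s: filter 100% busy, egress ~3.2 Gbps
```

At 50 ms per document a single filter saturates at 20 documents per second, and even the lower arrival rate implies more than a gigabit per second of unicast egress under these assumptions.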

SLIDE 8

Content-Based Routing

Embed filtering logic into the network

  • “XML routers”
  • Overlay topologies (e.g. mesh)
  • Parent routers hold the disjunction of their child routers’ queries
  • Streams filtered on the fly

Problems

  • Low network economy – scalability?
  • Query aggregation challenges
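A minimal sketch of the routing rule, assuming documents are flat key/value dictionaries and queries are boolean predicates rather than XML path expressions (both stand-ins, and the `Router` class, are hypothetical). A router forwards a document into a child subtree only if the disjunction of that subtree's queries matches.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Doc = Dict[str, str]            # hypothetical flat document instead of XML
Query = Callable[[Doc], bool]   # hypothetical query: a boolean predicate


@dataclass
class Router:
    name: str
    local_queries: List[Query] = field(default_factory=list)  # queries of clients attached here
    children: List["Router"] = field(default_factory=list)

    def matches(self, doc: Doc) -> bool:
        """Disjunction of this router's queries and those of every descendant."""
        return any(q(doc) for q in self.local_queries) or any(
            c.matches(doc) for c in self.children)

    def route(self, doc: Doc, delivered: List[str]) -> None:
        if any(q(doc) for q in self.local_queries):
            delivered.append(self.name)
        for child in self.children:
            if child.matches(doc):  # prune subtrees in which no query matches
                child.route(doc, delivered)


news = Router("news", [lambda d: d.get("type") == "news"])
ibm = Router("ibm", [lambda d: d.get("type") == "quote" and d.get("symbol") == "IBM"])
root = Router("root", children=[news, ibm])

delivered: List[str] = []
root.route({"type": "quote", "symbol": "IBM"}, delivered)
print(delivered)  # ['ibm']
```

For brevity `matches` recomputes the disjunction on every call; a router would normally keep the aggregated queries, and keeping that aggregate compact as subscriptions grow is one face of the query aggregation challenge listed above.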

SLIDE 9

Distributed Hierarchical Filtering

[Figure: filter core and clients]

Recurring theme: dynamic multicast

SLIDE 10

Peer-to-Peer Result Dissemination

[Figure: source and clients]

SLIDE 11

Application-Level Dynamic Multicast

Each document has a different receiver list

Exploit “peers” for dissemination

  • Build trees on the fly
  • Pass documents wrapped with receiver identities
  • Each peer contributes a fanout

Possibly high delivery delays

  • Heuristic: try to minimize tree height

Application-level approach: high traffic

  • Heuristic: exploit the geographical distribution of clients at the source
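The following sketch shows one way to build a per-document tree on the fly, under stated assumptions: the source sends the document with the full receiver list, each peer keeps up to its fanout of receivers as direct children and splits the remaining identities round-robin among them, and the height heuristic is approximated by sorting receivers by fanout so that high-fanout peers sit near the root. `disseminate`, `send` and the sample fanouts are hypothetical.

```python
from typing import Dict, List, Optional


def disseminate(node: str, receivers: List[str], fanout: Dict[str, int],
                depth: int = 0, depths: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """`node` holds one document wrapped with the receivers it is responsible for.
    It forwards to up to fanout[node] of them and hands each child a share of the
    rest. Returns the hop count at which each receiver gets the document."""
    if depths is None:
        depths = {}
    k = fanout[node]
    assert k >= 1 or not receivers, f"{node} cannot forward"
    children, rest = receivers[:k], receivers[k:]
    for i, child in enumerate(children):
        depths[child] = depth + 1
        # Round-robin split keeps each child's sub-list in the source's order.
        disseminate(child, rest[i::len(children)], fanout, depth + 1, depths)
    return depths


def send(source: str, receivers: List[str], fanout: Dict[str, int],
         shallow: bool = True) -> Dict[str, int]:
    # Assumed height heuristic: high-fanout peers first, so they end up near the root.
    if shallow:
        receivers = sorted(receivers, key=lambda r: fanout[r], reverse=True)
    return disseminate(source, receivers, fanout)


fanout = {"F": 2, "a": 4, "b": 1, "c": 2, "d": 1, "e": 4, "f": 1, "g": 2}
receivers = ["a", "b", "c", "d", "e", "f", "g"]
print(max(send("F", receivers, fanout, shallow=False).values()))  # 3
print(max(send("F", receivers, fanout).values()))                 # 2
```

In this toy example, sorting by fanout reduces the tree height from 3 hops to 2. The source is given a fanout of 2 to mirror the filter fanout used in the experiments.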

SLIDE 12

Possible Evaluation Metrics

  • Delivery delay
  • Network economy
  • Document loss
  • Out-of-order delivery
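Three of these metrics can be computed from a per-client receive log, as sketched below. The log format, the synchronized-clock assumption and the function name `client_metrics` are hypothetical; network economy is left out because the slide does not define how it is measured.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def client_metrics(sent_at: Dict[int, float],
                   received: List[Tuple[int, float]]) -> Dict[str, Optional[float]]:
    """sent_at: sequence number -> source send time (ms) for the documents this
    client should receive. received: (sequence number, arrival time in ms) in the
    order the client saw them. Assumes synchronized clocks."""
    delays = [t - sent_at[seq] for seq, t in received]
    got = {seq for seq, _ in received}
    lost = [seq for seq in sent_at if seq not in got]
    out_of_order, highest = 0, -1
    for seq, _ in received:
        if seq < highest:
            out_of_order += 1  # arrived after a later-numbered document
        highest = max(highest, seq)
    return {
        "median_delay_ms": median(delays) if delays else None,
        "loss_rate": len(lost) / len(sent_at) if sent_at else 0.0,
        "out_of_order": out_of_order,
    }


print(client_metrics({0: 0, 1: 1000, 2: 2000, 3: 3000},
                     [(0, 400), (2, 2500), (1, 2600)]))
# {'median_delay_ms': 500, 'loss_rate': 0.25, 'out_of_order': 1}
```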

SLIDE 13

Experimental Setup

PlanetLab testbed

  • Over 200 nodes
  • 1-10 clients per node

Document size: 20 KB

Generation rate: 1 document/second

Query selectivity: 10%

Filter fanout: 2

Filter host: planetlab1.lcs.mit.edu

Client fanout:

  • Modem: fanout 1 (20% of clients)
  • DSL: fanout 2 (40% of clients)
  • Cable: fanout 4 (40% of clients)
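The fanout mix implies an average client fanout of 2.6. Assuming that 10% query selectivity means each document matches about 10% of the clients, the sketch below also estimates receivers per document at each experiment size; the constants come from the setup above, while the function names and that reading of selectivity are assumptions.

```python
import random
from typing import List, Tuple

# (fanout, share of clients, access link) from the setup slide.
FANOUT_MIX: List[Tuple[int, float, str]] = [(1, 0.20, "Modem"), (2, 0.40, "DSL"), (4, 0.40, "Cable")]
SELECTIVITY = 0.10  # assumed: fraction of clients whose query matches a document


def mean_fanout(mix: List[Tuple[int, float, str]]) -> float:
    return sum(f * share for f, share, _ in mix)


def sample_fanouts(n: int, mix: List[Tuple[int, float, str]], seed: int = 0) -> List[int]:
    """Draw a fanout for each of n simulated clients according to the mix."""
    rng = random.Random(seed)
    return rng.choices([f for f, _, _ in mix], weights=[s for _, s, _ in mix], k=n)


def receivers_per_doc(n_clients: int, selectivity: float = SELECTIVITY) -> float:
    return n_clients * selectivity


print(round(mean_fanout(FANOUT_MIX), 2))  # 2.6
for n in (200, 400, 1000, 2000):
    print(n, "clients ->", round(receivers_per_doc(n)), "receivers per document")
```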

SLIDE 14

Result 1: Distribution of Delays

[Figure: Delivery Delay Distribution - 200 Clients. x-axis: Delivery Delay (ms); y-axis: % Clients]

SLIDE 15

Result 2: Scalability

[Figure: Delivery Delay Distribution for 200, 400, 1000 and 2000 clients. x-axis: Delivery Delay (ms); y-axis: % Clients]

SLIDE 16

Result 3: Bandwidth Requirements

[Figure: Outgoing Bandwidth for 200, 400, 1000 and 2000 clients. x-axis: Outgoing Bandwidth (KBps); y-axis: % Clients]

SLIDE 17

Exploiting Geographical Distribution of Clients
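One hypothetical way the source could exploit client geography: group the receivers by region, send one copy per region to that region's highest-fanout peer, and fill each regional subtree breadth-first, so that wide-area hops are limited to one per region. Region labels, fanouts and function names are illustrative assumptions, and the sketch ignores the source's own fanout limit.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


def regional_tree(source: str, receivers: List[str], region: Dict[str, str],
                  fanout: Dict[str, int]) -> List[Tuple[str, str]]:
    """Hypothetical regional heuristic: the source groups receivers by region and
    builds one subtree per region, rooted at that region's highest-fanout peer.
    Parents are filled breadth-first; assumes every fanout is at least 1."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for r in receivers:
        groups[region[r]].append(r)
    edges: List[Tuple[str, str]] = []
    for peers in groups.values():
        peers.sort(key=lambda p: fanout[p], reverse=True)
        edges.append((source, peers[0]))  # the only wide-area hop for this region
        slots = {p: fanout[p] for p in peers}  # unused forwarding capacity
        open_parents = deque([peers[0]])
        for p in peers[1:]:
            while slots[open_parents[0]] == 0:
                open_parents.popleft()
            parent = open_parents[0]
            edges.append((parent, p))
            slots[parent] -= 1
            open_parents.append(p)
    return edges


def cross_region_edges(edges: List[Tuple[str, str]], region: Dict[str, str]) -> int:
    """Edges whose endpoints lie in different regions (the source has no region)."""
    return sum(1 for a, b in edges if region.get(a) != region.get(b))


region = {"a": "us-east", "b": "us-east", "c": "eu", "d": "eu", "e": "us-east"}
fanout = {"a": 1, "b": 4, "c": 2, "d": 1, "e": 2}
edges = regional_tree("S", list(region), region, fanout)
print(edges)  # [('S', 'b'), ('b', 'e'), ('b', 'a'), ('S', 'c'), ('c', 'd')]
print(cross_region_edges(edges, region))  # 2
```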

SLIDE 18

Result 4: With the optimization

[Figure: Regional Optimization. Delivery delay for 2000 clients, without and with the optimization (legend: “2000 Clients”, “2000 Clients OP”). x-axis: Delivery Delay (ms); y-axis: % Clients]

SLIDE 19

Summary

Current filtering engines – processing and bandwidth bottlenecks

A possible scheme for distributed filtering

Recurring theme: highly dynamic multicast

Application-level multicast

  • Peer-to-peer delivery
  • Tree construction on the fly

PlanetLab is crazy

SLIDE 20

Future Work

Reliable, dedicated delivery nodes

Exploiting query similarity for discovering multicast groups