David Choffnes, EECS, Northwestern U.

SLIDE 1

http://aqualab.cs.northwestern.edu

David Choffnes

EECS, Northwestern U.

http://aqualab.cs.northwestern.edu/projects/EdgeScope.html

SLIDE 2

Internet scale data

– BGP is useful, but has serious limitations
– PlanetLab offers more control, but skewed view
– End system monitors need widespread adoption

EdgeScope

– Make our collection of edge traces available
– What do we have?
– When can you have it?

EdgeScope: Exposing the view from the Edge

SLIDE 3

[Figure: labels not recoverable; values shown: 231,000, 547,000, 35,000, 1,096]

SLIDE 4

Ono

– Uses CDN redirections to inform peer biasing for BitTorrent
– Installed more than 800,000 times
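
The slide does not say how redirections are turned into a peer preference. A minimal sketch, assuming each peer summarizes its CDN redirections as a map from replica server to the fraction of lookups that returned it, and that two peers are compared by the cosine similarity of those maps. The function names and replica labels are hypothetical.

```python
import math

def ratio_map(redirections):
    """Summarize observed CDN replica names as {replica: fraction of observations}."""
    total = len(redirections)
    counts = {}
    for replica in redirections:
        counts[replica] = counts.get(replica, 0) + 1
    return {replica: n / total for replica, n in counts.items()}

def cosine_similarity(a, b):
    """Cosine similarity of two ratio maps; 0.0 if either map is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical observations: two peers that mostly see the same replica
# score close to 1, peers with no replica in common score 0.
peer_a = ratio_map(["replica-1", "replica-1", "replica-2"])
peer_b = ratio_map(["replica-1", "replica-1", "replica-1"])
peer_c = ratio_map(["replica-9"])
print(cosine_similarity(peer_a, peer_b))
print(cosine_similarity(peer_a, peer_c))
```

Under this assumption, a client would prefer peers whose similarity is high, on the reasoning that peers sent to the same replicas are likely close to each other in the network.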

NEWS

– Uses passively gathered BT data to detect, confirm and isolate network events
– More than 40,000 users

Coverage

– ~8k ASNs
– 54k routable prefixes
– 200 countries

SLIDE 5

Data types (see website for details)

– Per-download stats

  • Transfer rates, file-size estimates, state

– Per-connection stats

  • Transfer rates, cumulative data transferred, seed/leech

– Global stats

  • Overall transfer rates, session times

– … other interesting/necessary stuff

  • IP changes

All of this is sampled every 30 seconds

– Per-session data sampled every hour and at end

Traceroutes/pings

  • Uses the OS built-in commands; we are playing with IPv6 traceroutes/pings
  • Limited to a maximum number of measurements in parallel
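
The cap on parallel measurements can be sketched with a bounded worker pool. This is an illustration only: the slide does not give the actual limit or implementation, so `run_measurements`, `max_parallel`, and its default of 4 are hypothetical, and the `traceroute` helper assumes a Unix-like `traceroute` command on the PATH.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

def traceroute(target):
    """Run the system traceroute command (assumed to be on PATH) and return its output."""
    result = subprocess.run(
        ["traceroute", target], capture_output=True, text=True, timeout=120
    )
    return result.stdout

def run_measurements(targets, measure=traceroute, max_parallel=4):
    """Measure every target, never running more than max_parallel at once."""
    targets = list(targets)  # allow generators; we iterate twice below
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map yields results in the same order as targets.
        return dict(zip(targets, pool.map(measure, targets)))
```

Passing the measurement function as a parameter keeps the concurrency limit separate from the command being run, so the same wrapper would serve pings as well.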

SLIDE 6

A platform for controlled experiments

– Why?

  • Security implications!

A topology measurement tool

– But we do have loads of traceroute data

An arbitrarily extensible data collection system

– Everything we collect relates to Ono/NEWS performance
– If it fits, we can add it fairly easily

  • Needs to go through a beta process (usually about a week)
  • Once mainlined, near full adoption within about 4 days

SLIDE 7

Started (proper) collection in December 2007

Daily stats (approximate)

– 3 to 4 GB of compressed data
– About 10 to 20 GB raw data
– 2.5-3M traceroutes
– 100-150M connection samples
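
As a back-of-the-envelope check on what these daily figures imply, the short calculation below derives the compression ratio and the yearly volume. It assumes the daily ranges hold steady over a 365-day year, which the slide does not claim; the helper names are illustrative.

```python
DAYS_PER_YEAR = 365

def compression_ratio_range(raw_gb, compressed_gb):
    """Return (lowest, highest) raw/compressed ratio given (min, max) ranges in GB."""
    return raw_gb[0] / compressed_gb[1], raw_gb[1] / compressed_gb[0]

def yearly_gb(daily_gb):
    """Scale a (min, max) daily volume in GB to a year."""
    return daily_gb[0] * DAYS_PER_YEAR, daily_gb[1] * DAYS_PER_YEAR

raw = (10, 20)        # GB per day, from the slide
compressed = (3, 4)   # GB per day, from the slide

print(compression_ratio_range(raw, compressed))  # (2.5, 6.666666666666667)
print(yearly_gb(compressed))                     # (1095, 1460)
print(yearly_gb(raw))                            # (3650, 7300)
```

Under that assumption, a year of data is roughly 1.1 to 1.5 TB compressed and 3.7 to 7.3 TB raw.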

[Figure panels: Per-connection Samples (axis up to 250,000,000); Per-Download Samples (axis up to 160,000,000); Traceroutes (axis up to 7,000,000)]

SLIDE 8

NEWS

SwarmScreen

Network positioning (cool? cold? you decide.)

Topology studies

Fabian’s talk

SLIDE 9

Preliminaries

– CAIDA-style agreement

Anonymization

– AS-level detail
– Prefix-preserving
– (Maybe) User ids, without location info

If you need IPs, you have to work with us
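
To make "prefix-preserving" concrete: two addresses that share their first k bits before anonymization still share exactly their first k bits afterwards. The slides do not say which scheme EdgeScope uses. The sketch below follows the general Crypto-PAn construction, with HMAC-SHA256 swapped in for the block cipher; the function name and key handling are hypothetical.

```python
import hashlib
import hmac
import ipaddress

def anonymize_ipv4(address, key):
    """Map an IPv4 address to another one so that shared prefixes stay shared."""
    original = int(ipaddress.IPv4Address(address))
    result = 0
    for i in range(32):
        # The i most significant bits of the address (0 when i == 0).
        prefix = original >> (32 - i)
        # Include the prefix length so prefixes "0" and "00" get different pads.
        message = i.to_bytes(1, "big") + prefix.to_bytes(4, "big")
        pad_bit = hmac.new(key, message, hashlib.sha256).digest()[0] & 1
        original_bit = (original >> (31 - i)) & 1
        # Each output bit depends only on the input bit and the bits before it.
        result = (result << 1) | (original_bit ^ pad_bit)
    return str(ipaddress.IPv4Address(result))
```

Because each output bit is the input bit XORed with a keyed function of the preceding bits, the mapping is one-to-one and routing structure (which addresses fall under the same prefix) survives, while recovering the original addresses requires the key.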

SLIDE 10

Ono dataset

– Now

AS links (CoNEXT work)

– Now

Everything else

– On-demand anonymization (takes time)
– Hardware on order
– Quarantine period (6-12 months)

SLIDE 11

Before you ask for data

– Be sure you know what you want
– Make sure you have space for it
– Give us time to get it to you

Working with data at this scale

– Throwing hardware at the analysis doesn’t work

  • Good data structures do work

– MapReduce isn’t always the best fit

  • Especially if you don’t have a giant cluster

– Dynamic languages are a bad idea

  • Seriously, perl is not your friend here
  • Thank me later

SLIDE 12

Privacy is hard

– No really, this is serious

  • Messing this up will ruin it for everyone

– We invite new proposals/research in this area

Scale

– Cannot do this without good tech support, net ops cooperation, elbow grease
– MTTFs in mirror(ed array) are closer than they appear

Other fun stuff

– Timestamp synchronization
– UUIDs
– Users can come and go at any time

  • Sessions and install/uninstall

SLIDE 13

http://aqualab.cs.northwestern.edu/projects/EdgeScope.html
