A Longitudinal Characterization of Local and Global BitTorrent Workload Dynamics

SLIDE 1

March 14, 2012

A Longitudinal Characterization of Local and Global BitTorrent Workload Dynamics

Niklas Carlsson, Linköping University
György Dan, KTH Royal Institute of Technology
Anirban Mahanti, NICTA
Martin Arlitt, HP Labs and University of Calgary

SLIDE 2

Motivation

• Use of Internet for content delivery is massive … and becoming more so
• How to make scalable and efficient?
  • Server-based and peer-to-peer
• Chunk-based approach proven scalable
  • Files split into smaller chunks
  • Clients can download from both servers and other clients (peers)
• How to best manage large-scale content replication systems
  • E.g., where to place chunks?
  • Must first understand workload dynamics ...

SLIDE 3

Background: BitTorrent

Single file download

• File split into many smaller chunks
• Downloaded from both seeds and downloaders
• Distribution paths are dynamically determined
  • Based on data availability

[Figure: a torrent (downloaders and seeds) with peer arrivals and departures; annotated intervals: download time, seed residence time]
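
The chunking step above can be sketched in Python. This is a minimal illustration, assuming fixed-size pieces that are each identified by a SHA-1 digest, as in the original BitTorrent protocol. The function name `split_into_pieces` and the tiny piece length are hypothetical and not taken from the slides.

```python
import hashlib


def split_into_pieces(data, piece_length):
    """Split data into fixed-size pieces and pair each with its SHA-1 hex digest."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    pieces = []
    for offset in range(0, len(data), piece_length):
        piece = data[offset:offset + piece_length]  # the last piece may be shorter
        pieces.append((piece, hashlib.sha1(piece).hexdigest()))
    return pieces


# Illustrative input only; real torrents use piece lengths of many kilobytes.
for piece, digest in split_into_pieces(b"abcdefghij", piece_length=4):
    print(piece, digest)
```

Because every piece carries its own digest, a peer can verify a piece no matter which seed or downloader supplied it.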

SLIDE 4

Background: BitTorrent

Multi-tracked torrents

• Torrent file
  • “announce-list” URLs
• Trackers
  • Register torrent file
  • Maintain state information
• Peers
  • Obtain torrent file
  • Choose one tracker at random
  • Announce
  • Report status
  • Peer exchange (PEX)

[Figure: two swarms]
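
The peer-side steps can be sketched as follows. This is a hedged illustration, assuming the announce-list is a list of tiers (each a list of tracker URLs) and that announces use the standard HTTP GET parameters. The helper names `choose_tracker` and `build_announce_url` and the example URLs are hypothetical. Real clients typically walk the tiers in order; the sketch follows the slide's simplification of picking one tracker at random.

```python
import random
from urllib.parse import urlencode


def choose_tracker(announce_list, rng=random):
    """Flatten the tiered announce-list and pick one tracker URL at random."""
    urls = [url for tier in announce_list for url in tier]
    if not urls:
        raise ValueError("announce-list is empty")
    return rng.choice(urls)


def build_announce_url(tracker_url, info_hash, peer_id, port,
                       downloaded, uploaded, left, event=None):
    """Build the HTTP GET URL a peer uses to announce and report its status."""
    params = {
        "info_hash": info_hash,    # 20-byte SHA-1 of the torrent's info dictionary
        "peer_id": peer_id,        # 20-byte identifier chosen by the client
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,              # bytes still missing; 0 for a seed
    }
    if event is not None:
        params["event"] = event    # "started", "completed", or "stopped"
    separator = "&" if "?" in tracker_url else "?"
    return tracker_url + separator + urlencode(params)


# Hypothetical trackers and identifiers, for illustration only.
announce_list = [["http://tracker-a.example/announce"],
                 ["http://tracker-b.example/announce"]]
tracker = choose_tracker(announce_list)
print(build_announce_url(tracker, b"\x12" * 20, b"-XX0001-abcdefghijkl",
                         6881, 0, 0, 1024, "started"))
```

Since each peer announces to only one tracker, a single torrent can end up with several partly disjoint swarms, which is why the global measurements query many trackers.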

SLIDE 6

Contributions

• Longitudinal multi-torrent analysis
  • 48 weeks from two vantage points
• Capturing differences in dynamics observed locally and globally
  • University campus vs. global tracker-based
• Example observations
  • Campus users download larger files
  • Campus users are early adopters (except music)
  • High popularity churn
  • Most popular content peaks later

SLIDE 7

Measurement overview

Active + passive measurements

• Longitudinal data
• Two vantage points
  • University campus (ingress/egress)
  • Global trackers

[Figure: swarm; label: popularity dynamics]

SLIDE 8

University: tracker communication

Passive measurements

Extract HTTP peer-to-tracker traffic at campus ingress/egress
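
One way to pull torrent identifiers out of captured peer-to-tracker traffic is sketched below. It assumes the capture has already been reduced to HTTP request lines and that announce requests use a path ending in `/announce`. Both are assumptions for illustration, as is the function name `parse_announce_request`; the slides do not describe the actual extraction tooling.

```python
from urllib.parse import urlsplit, unquote_to_bytes


def parse_announce_request(request_line):
    """Parse 'GET /announce?... HTTP/1.1'; return (info_hash_hex, fields) or None."""
    parts = request_line.split()
    if len(parts) < 2 or parts[0] != "GET":
        return None
    target = urlsplit(parts[1])
    if not target.path.endswith("/announce"):   # assumed path convention
        return None
    fields = {}
    for pair in target.query.split("&"):
        key, sep, value = pair.partition("=")
        if sep:
            # info_hash and peer_id are raw bytes, so decode to bytes, not text
            fields[key] = unquote_to_bytes(value.replace("+", " "))
    info_hash = fields.get("info_hash")
    if info_hash is None or len(info_hash) != 20:
        return None
    return info_hash.hex(), fields


line = ("GET /announce?info_hash=" + "%AB" * 20 +
        "&port=6881&left=0&event=completed HTTP/1.1")
print(parse_announce_request(line))
```

Grouping the parsed requests by `info_hash` gives a per-torrent view of the campus traffic.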

SLIDE 10

Global: Tracker scrapes

Active measurements

Periodically request the current state as observed at a large set of trackers
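
A scrape can be sketched as two steps: derive the scrape URL from the announce URL, then decode the bencoded reply. The sketch below assumes the common convention that the last path segment starts with `announce` and is replaced by `scrape`, and that the reply holds a `files` dictionary keyed by info-hash. The helper names are hypothetical, and no network request is made here.

```python
def scrape_url(announce_url):
    """Derive a scrape URL from an announce URL, or None if the convention does not apply."""
    head, slash, last = announce_url.rpartition("/")
    if not slash or not last.startswith("announce"):
        return None
    return head + "/scrape" + last[len("announce"):]


def bdecode(data, pos=0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                        # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                        # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                        # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    colon = data.index(b":", pos)           # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[pos:colon])
    return data[start:start + length], start + length


def swarm_sizes(scrape_response):
    """Map info-hash hex to (seeds, downloaders, completed downloads)."""
    decoded, _ = bdecode(scrape_response)
    sizes = {}
    for info_hash, stats in decoded.get(b"files", {}).items():
        sizes[info_hash.hex()] = (
            stats.get(b"complete", 0),      # peers holding the whole file
            stats.get(b"incomplete", 0),    # peers still downloading
            stats.get(b"downloaded", 0),    # completed downloads seen by the tracker
        )
    return sizes


print(scrape_url("http://tracker.example/announce"))
```

Repeating such scrapes at a fixed interval yields one time series of swarm sizes per torrent and tracker.
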

SLIDE 12

Measurement overview

Active + passive measurements

Popularity dynamics

SLIDE 13

Previous work

Popularity distribution

• Popularity distribution statistics
  • Over lifetime
  • Over different time periods
  • Different sampling methods

[Figure: popularity vs. rank on log-log axes, divided into head, trunk, and tail; curves: Zipf(1e+007,1), MZipf(1e+007,50,1), GZipf(2e+005,0.02,1e-005,1). E.g., Dan & Carlsson [IPTPS 2010]]
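
The first two curve families named in the figure legend can be sketched as follows. The slides do not define the parameterization, so the sketch assumes that Zipf(c, α) means popularity c / rank^α and that MZipf(c, q, α) is the Zipf-Mandelbrot form c / (rank + q)^α. The GZipf curve is not sketched because its form is not recoverable from the slides.

```python
def zipf(rank, scale, alpha):
    """Assumed form of Zipf(scale, alpha): popularity = scale / rank**alpha."""
    return scale / rank ** alpha


def mzipf(rank, scale, plateau, alpha):
    """Assumed form of MZipf(scale, plateau, alpha): scale / (rank + plateau)**alpha."""
    return scale / (rank + plateau) ** alpha


# Compare the two curves in the head, trunk, and tail of the ranking.
for rank in (1, 10, 100, 10_000):
    print(rank, zipf(rank, 1e7, 1), mzipf(rank, 1e7, 50, 1))
```

Under these assumptions the plateau parameter flattens the head of the distribution, while both curves nearly coincide at high ranks.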

SLIDE 14

Summary of datasets

Property        University      Global          Mininova
Trackers        2,371           721             1,690
Torrents        56,963          11.2 M          911,687
Downloads       1.73 M          37.0 B          -
HTTP requests   249 M           -               -
Start date      Sep. 15, 2008   Sep. 15, 2008   Sep., 2008
End date        Aug. 17, 2009   Aug. 17, 2009   Aug., 2009
Frequency       All requests    Weekly scrapes  Twice

SLIDE 15

Summary of datasets

• 48 weeks of overlapping longitudinal data
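
The 48-week figure can be checked against the start and end dates listed for the University and Global datasets (Sep. 15, 2008 and Aug. 17, 2009). A small sketch, assuming the overlap is counted from the start date to the end date:

```python
from datetime import date

start = date(2008, 9, 15)   # start date of the University and Global datasets
end = date(2009, 8, 17)     # end date of the University and Global datasets

days = (end - start).days
weeks, remainder = divmod(days, 7)
print(days, weeks, remainder)  # 336 48 0
```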

SLIDE 16

Summary of datasets

• Many torrents (and downloads) …

SLIDE 17

Dataset summary

Torrents observed

[Figure: torrents observed; University: 56,963, Global: 11.2 M]

SLIDE 18

Dataset summary

Torrents observed

[Figure: torrents observed; University: 56,963, Global: 11.2 M; overlap label: 90%]

SLIDE 19

Dataset summary

Torrents observed

• Most of the files observed locally are also observed in the global dataset

[Figure: torrents observed; University: 56,963, Global: 11.2 M; overlap label: 90%]
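
An overlap figure of this kind can be computed as a set intersection over torrent identifiers. The sketch below assumes each dataset is reduced to a collection of info-hashes and that the percentage is taken relative to the locally observed torrents. The function name and the toy identifiers are hypothetical.

```python
def fraction_also_seen(local_hashes, global_hashes):
    """Fraction of locally observed torrents that also appear in the global dataset."""
    local = set(local_hashes)
    if not local:
        return 0.0
    return len(local & set(global_hashes)) / len(local)


# Toy identifiers for illustration only.
print(fraction_also_seen(["a", "b", "c", "d"], ["b", "c", "d", "e"]))  # 0.75
```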

SLIDE 20

Dataset summary

Torrents observed

[Figure: torrents observed; Global: 11.2 M, University: 56,963]

SLIDE 21

Dataset summary

Torrents observed

[Figure: torrents observed; Global: 11.2 M, University: 56,963, Mininova: 911,687]

SLIDE 22

Dataset summary

Torrents observed

• Mininova screen scrapes also provide us with size and category information for some of these files

[Figure: torrents observed; Global: 11.2 M, University: 56,963, Mininova: 911,687]

SLIDE 24

Dataset summary

Torrents observed

• Mininova screen scrapes also provide us with size and category information for some of these files

[Figure: torrents observed; Global: 11.2 M, University: 56,963, Mininova: 911,687; label: 33%]

SLIDE 25

Content download characteristics

File size distribution, per download

Campus users download larger files

SLIDE 26

Content download characteristics

File size distribution, per download

Campus users download larger files

[Figure annotation: size difference]

SLIDE 27

Content download characteristics

Breakdown per category

Campus users download
• More movies and TV shows
• Less music

SLIDE 28

Content download characteristics

Breakdown per category

Campus users download
• More movies and TV shows
• Less music

[Figure annotation: More]

SLIDE 29

Content download characteristics

Breakdown per category

Campus users download
• More movies and TV shows
• Less music

[Figure annotation: Less]

SLIDE 30

Content download characteristics

Breakdown per category

Campus users download
• More movies and TV shows
• Less music

Again, biased towards larger contents ...

SLIDE 31

Early adopters

Terminology

[Figure: downloads over time]

SLIDE 32

Early adopters

Terminology

[Figure: downloads over time; label: local peak]

SLIDE 33

Early adopters

Terminology

[Figure: downloads over time; labels: local peak, time until peak]

SLIDE 34

Early adopters

Terminology

[Figure: downloads over time; labels: global peak, local peak]

SLIDE 35

Early adopters

Terminology

[Figure: downloads over time; labels: global peak, local peak, difference in peak times]

SLIDE 36

Early adopters

Terminology

[Figure: downloads over time; labels: global peak, local downloads before global peak]

SLIDE 37

Early adopters

Downloads relative to global peak

Campus users are generally early adopters of content
• 70% of downloads before global peak
• 40% of downloads at least 10 weeks before global peak

SLIDE 38

Early adopters

Downloads relative to global peak

Campus users are generally early adopters of content
• 70% of downloads before global peak
• 40% of downloads at least 10 weeks before global peak

[Figure annotation: early downloads]

SLIDE 39

Early adopters

Downloads relative to global peak

Campus users are generally early adopters of content
• 70% of downloads before global peak
• 40% of downloads at least 10 weeks before global peak

[Figure annotations: 40%, 70%]
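
Statistics such as "downloads before global peak" can be computed per torrent from two series: weekly global download counts and the weeks in which local downloads occurred. The sketch below is an assumed formulation for a single torrent with hypothetical function names and toy numbers. The aggregation across torrents used for the reported percentages is not described in the slides.

```python
def global_peak_week(weekly_global_downloads):
    """Index of the week with the most global downloads (the first one on ties)."""
    if not weekly_global_downloads:
        raise ValueError("need at least one week of data")
    return max(range(len(weekly_global_downloads)),
               key=weekly_global_downloads.__getitem__)


def fraction_before_peak(local_download_weeks, peak_week, min_lead=1):
    """Fraction of local downloads at least min_lead weeks before the global peak."""
    if not local_download_weeks:
        return 0.0
    early = sum(1 for week in local_download_weeks if peak_week - week >= min_lead)
    return early / len(local_download_weeks)


weekly_global = [5, 20, 80, 40, 10]   # toy global downloads per week
local_weeks = [0, 0, 1, 2, 3]         # toy weeks of individual campus downloads

peak = global_peak_week(weekly_global)
print(peak, fraction_before_peak(local_weeks, peak),
      fraction_before_peak(local_weeks, peak, min_lead=2))
```

Setting `min_lead=10` corresponds to the "at least 10 weeks before global peak" statistic.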

SLIDE 40

Early adopters

Downloads relative to global peak

Campus users are generally early adopters of content
• Except for music

Perhaps campus users can be used to predict some future popularity ...
• And used for seeding such content

SLIDE 41

Early adopters

Downloads relative to global peak

Campus users are generally early adopters of content
• Except for music

Perhaps campus users can be used to predict some future popularity ...
• And used for seeding such content

[Figure annotation: exception]

SLIDE 42

Early adopters

Downloads relative to global peak

Better predictor the more popular the content becomes
• As well as for some niche content ...

SLIDE 43

Early adopters

Downloads relative to global peak

Better predictor the more popular the content becomes
• As well as for some niche content ...

[Figure annotation: Early local peaks!!]

SLIDE 44

Time until peak

Global popularity peaks ...

The global popularity often peaks late for popular content
• Early flash crowds do not dominate the popularity
• Perhaps a sign that rich-gets-richer is a better model ...

SLIDE 45

Time until peak

Global popularity peaks ...

The global popularity often peaks later for popular content
• Early flash crowds do not dominate the popularity
• Perhaps a sign that rich-gets-richer is a better model ...

[Figure annotation: correlation]

SLIDE 46

Time until peak

Global popularity peaks ...

The more popular the content, the later it peaks ...

SLIDE 47

Time until peak

Global popularity peaks ...

The more popular the content, the later it peaks ...

[Figure annotation: later until peak]

SLIDE 48

Time until peak

Global popularity peaks ...

Rich-gets-richer
• Close to linear from week-to-week
• Cumulative total downloads show weaker (sub-linear) rich-gets-richer behavior

SLIDE 49

Time until peak

Global popularity peaks ...

Rich-gets-richer
• Close to linear from week-to-week
• Cumulative total downloads show weaker (sub-linear) rich-gets-richer behavior

[Figure annotation: linear]
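
One way to quantify "linear" versus "sub-linear" rich-gets-richer behavior is the slope of a least-squares fit in log-log space between a torrent's downloads in one period and its downloads in the next. A slope near 1 suggests linear growth with current popularity, and a slope below 1 suggests sub-linear growth. This is an assumed formulation, not necessarily the analysis behind the slides, and `loglog_slope` is a hypothetical helper.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), skipping non-positive pairs."""
    points = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive pairs")
    mean_x = sum(px for px, _ in points) / len(points)
    mean_y = sum(py for _, py in points) / len(points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    return sxy / sxx


this_week = [1, 10, 100, 1000]                 # toy downloads per torrent in week t
linear_next = [2 * x for x in this_week]       # next week proportional to this week
sublinear_next = [x ** 0.5 for x in this_week]

print(loglog_slope(this_week, linear_next))     # close to 1
print(loglog_slope(this_week, sublinear_next))  # close to 0.5
```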

SLIDE 50

Hotset analysis

Popularity churn

High popularity churn
• Roughly 50-60% new videos each week

Some files reoccur
• Some videos reoccur in the hotset

SLIDE 51

Hotset analysis

Popularity churn

High popularity churn
• Roughly 50-60% new videos each week

Some files reoccur
• Some videos reoccur in the hotset

[Figure panels: similarity between weeks; similarity with week 20]
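
Hotset churn and week-to-week similarity can be sketched as set operations on each week's most-downloaded files. The slides do not define the hotset size, what counts as "new", or the similarity measure. The sketch assumes a top-k hotset, counts a file as new if it was absent from the previous week's hotset, and uses Jaccard similarity. All names and numbers are illustrative.

```python
from collections import Counter


def hotset(downloads, size):
    """The `size` most-downloaded items of one week, given a mapping item -> count."""
    return {item for item, _ in Counter(downloads).most_common(size)}


def fraction_new(previous_hotset, current_hotset):
    """Fraction of the current hotset that was absent from the previous one."""
    if not current_hotset:
        return 0.0
    return len(current_hotset - previous_hotset) / len(current_hotset)


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


week1 = {"a": 9, "b": 7, "c": 5, "d": 1}   # toy download counts
week2 = {"a": 8, "e": 6, "f": 4, "b": 2}

hot1, hot2 = hotset(week1, 3), hotset(week2, 3)
print(fraction_new(hot1, hot2), jaccard(hot1, hot2))
```

Comparing every week's hotset with one fixed reference week gives a curve of the kind shown in the "similarity with week 20" panel.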

SLIDE 52

Conclusions

• Large-scale longitudinal multi-torrent analysis
  • University campus
  • Global trackers
• Campus users download more large files (TV shows and movies) and a smaller fraction of music
• Campus users are “early adopters”
  • Except for music
• High weekly churn in set of popular files
• Most of the popular files peak well after their initial use
  • Signs of rich-gets-richer behavior

SLIDE 53

Thank you!

Niklas Carlsson, Linköping University
György Dan, KTH Royal Institute of Technology
Anirban Mahanti, NICTA
Martin Arlitt, HP Labs and University of Calgary