Internet traffic measurements, Renata Teixeira (Inria)



slide-1
SLIDE 1

Internet traffic measurements

Renata Teixeira (Inria)

slide-2
SLIDE 2

Why measure traffic?

  • Performance analysis
  • Anomaly and intrusion detection
  • Network engineering
slide-3
SLIDE 3

Traffic at different granularities

  • IP-level packets

– Capture per-packet information

  • Flows

– Statistics of packets grouped into flows

  • Network interface

– Statistics of packets that traverse a network interface

slide-4
SLIDE 4

Outline

  • Motivation and definitions
  • Tools for measuring traffic

– Packet capture
– Interface counts
– Flow capture

  • Traffic matrix
  • Trace anonymization
  • Summary
slide-5
SLIDE 5

Packet capture on end systems

  • Basic method

– Capture and record packets passing through an interface

[Figure: packets passing through the interface are recorded into a packet trace]

slide-6
SLIDE 6

Tools

  • tcpdump

– Command-line packet capture

  • libpcap

– C/C++ library for packet capture

  • Wireshark

– Packet capture and analysis
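
As an illustration of what these tools produce, the sketch below reads the classic pcap file format that tcpdump and libpcap write: a 24-byte global header followed by records, each with a 16-byte header (seconds, microseconds, captured length, original length). The function name `read_pcap` is hypothetical, and the sketch assumes the microsecond-resolution variant of the format only (not pcapng or nanosecond pcap).

```python
import struct
from typing import BinaryIO, Iterator, Tuple

PCAP_MAGIC = 0xA1B2C3D4          # classic pcap, microsecond timestamps
PCAP_MAGIC_SWAPPED = 0xD4C3B2A1  # same format, written with the other byte order


def read_pcap(stream: BinaryIO) -> Iterator[Tuple[float, int, bytes]]:
    """Yield (timestamp, original_length, captured_bytes) for each record."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == PCAP_MAGIC:
        endian = "<"
    elif magic == PCAP_MAGIC_SWAPPED:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    while True:
        record = stream.read(16)
        if len(record) < 16:
            return  # end of file (or a truncated final record)
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated packet data
        yield ts_sec + ts_usec / 1e6, orig_len, data
```

Comparing the captured length with the original length shows whether the capture truncated each packet (for example, when only headers were kept).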

slide-7
SLIDE 7

Possible measurement artifacts

  • Dropped packets are common under high utilization

– Inspect report of dropped packets

  • Other less frequent artifacts

– Fail to report drops
– Falsely report drops
– Duplicate packets
– Re-ordered packets
– Misfilter

slide-8
SLIDE 8

How to capture packets on point-to-point links?


slide-9
SLIDE 9

Port mirroring

  • Basic method

– Copies packets from one or more ports to a mirroring port
– Run packet capturing tool on host connected to mirroring port

[Figure: monitored ports copied to a mirroring port]

slide-10
SLIDE 10

Network Tap

  • Basic method

– Electrical or optical splitter on monitored link
– Monitoring host with specialized network interface and interface driver

[Figure: network tap on the monitored link]

slide-11
SLIDE 11

Comparison

Port mirroring

  • Pros

– Easy to set up
– Low cost

  • Cons

– Hardware and media errors are dropped
– Packets may be dropped at high utilization

Tap

  • Pros

– Monitor all packets
– Eliminates risk of dropped packets

  • Cons

– Expensive

slide-12
SLIDE 12

High-speed capture with commodity hardware

  • Key idea

– Direct access to NIC (i.e., bypass kernel)
– Parallelism

  • Tools

– TStat
– ntop
– WAND

slide-13
SLIDE 13

Outline

  • Motivation and definitions
  • Tools for measuring traffic

– Packet capture
– Interface counts
– Flow capture

  • Traffic matrix
  • Trace anonymization
  • Summary
slide-14
SLIDE 14

Interface counts

  • Basic method

– Routers log simple statistics (bytes/packets)

  • Total values since interface initialized

– Request statistics using SNMP (MIB-II MIB)

[Figure: router keeping #packets In and #packets Out counters for interfaces 1 and 2]
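
Because the counters hold totals since the interface was initialized, a rate has to be derived from the difference between two polls. The sketch below shows one way to do that, assuming a 32-bit counter (as with MIB-II's ifInOctets) that wraps at most once between polls. The function names and the sample values are hypothetical.

```python
def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Difference between two readings of a wrapping counter.

    Assumes the counter wrapped at most once between the two polls.
    """
    return (current - previous) % (1 << bits)


def average_rate_bps(prev_octets: int, curr_octets: int,
                     interval_s: float, bits: int = 32) -> float:
    """Average bit rate over the polling interval, from two octet counts."""
    if interval_s <= 0:
        raise ValueError("polling interval must be positive")
    return 8 * counter_delta(prev_octets, curr_octets, bits) / interval_s
```

With a 5-minute polling interval, `average_rate_bps(1000, 376000, 300)` averages 375,000 bytes over 300 seconds. Anything shorter than the polling interval is invisible, which is why these measurements are coarse-grained.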

slide-15
SLIDE 15

Example properties

  • Number of In/Out bytes (total, unicast, non-unicast)
  • Number of In/Out packets (total, unicast, non-unicast)
  • Number of In/Out discarded/corrupted packets
slide-16
SLIDE 16

Interface counts: Pros and Cons

  • Pros

– Supported on all networking equipment
– Little performance impact on routers
– Little storage needs

  • Cons

– Missing data (SNMP uses UDP)
– Polling makes it hard to synchronize data from multiple interfaces
– Coarse-grained measurements

slide-17
SLIDE 17

Outline

  • Motivation and definitions
  • Tools for measuring traffic

– Packet capture
– Interface counts
– Flow capture

  • Traffic matrix
  • Trace anonymization
  • Summary
slide-18
SLIDE 18

IP Flows

  • Set of packets with common properties

– Definition can vary

  • Traditional 5-tuple: src IP, dst IP, src port, dst port, protocol
  • Packets from one ingress to an egress point
  • Packets that are “close” together in time

– Maximum spacing between packets (e.g., 15 sec, 30 sec)

[Figure: packets grouped into flow 1, flow 2, and flow 3]
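
One possible reading of this definition, combining the 5-tuple with a maximum spacing between packets, is sketched below. The `Packet` tuple and `group_into_flows` are hypothetical names, and the input is assumed to be sorted by timestamp.

```python
from collections import namedtuple

# Hypothetical per-packet summary; times are in seconds.
Packet = namedtuple("Packet", "time src dst sport dport proto size")


def group_into_flows(packets, idle_timeout=30.0):
    """Group time-sorted packets into flows.

    A flow is a run of packets with the same 5-tuple in which no two
    successive packets are more than idle_timeout seconds apart.
    """
    open_flow = {}  # 5-tuple -> index in `flows` of the flow currently open
    flows = []
    for pkt in packets:
        key = (pkt.src, pkt.dst, pkt.sport, pkt.dport, pkt.proto)
        index = open_flow.get(key)
        if index is None or pkt.time - flows[index][-1].time > idle_timeout:
            # First packet for this key, or the gap is too large: new flow.
            flows.append([])
            index = len(flows) - 1
            open_flow[key] = index
        flows[index].append(pkt)
    return flows
```

With a 30-second timeout, packets of the same 5-tuple at times 0, 10, and 50 form two flows, since the 40-second gap exceeds the maximum spacing.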

slide-19
SLIDE 19

Flow ≠ application session

  • Application session may be composed of multiple flows
  • Packets in application session may not follow same links
  • Hard to measure application session inside the network
slide-20
SLIDE 20

Capturing flow statistics in routers

  • Basic method

– Specify set of properties that define a flow
– Routers log statistics per flow (flow records)
– Push flow records to collecting process (IPFIX)

[Figure: router flow table with columns flow id and #packets]

slide-21
SLIDE 21

Flow records: Flow identifier

  • Packet header information

– Source and destination IP addresses
– Source and destination TCP/UDP port numbers
– Other IP & TCP/UDP header fields (e.g., protocol, ToS bits)

  • Routing information

– Input and output interfaces
– Source and destination IP prefix (mask length)
– Source and destination autonomous system numbers

slide-22
SLIDE 22

Flow records: Flow properties

  • Aggregate traffic information

– Start and finish time of the flow (time of first & last packet)
– Total number of bytes and number of packets in the flow
– TCP flags (e.g., logical OR over the sequence of packets)
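
The aggregation described above can be sketched as follows. `FlowRecord` and `summarize` are hypothetical names, not the layout of any particular export format. The flag constants are the standard TCP header bit values.

```python
from dataclasses import dataclass

# Standard TCP flag bits.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


@dataclass
class FlowRecord:
    first_seen: float    # time of first packet
    last_seen: float     # time of last packet
    packets: int = 0
    byte_count: int = 0
    tcp_flags: int = 0   # logical OR of the flags of all packets


def summarize(packets):
    """Build one record from (timestamp, size, tcp_flags) tuples of a flow."""
    packets = list(packets)
    if not packets:
        raise ValueError("a flow needs at least one packet")
    record = FlowRecord(first_seen=packets[0][0], last_seen=packets[0][0])
    for timestamp, size, flags in packets:
        record.first_seen = min(record.first_seen, timestamp)
        record.last_seen = max(record.last_seen, timestamp)
        record.packets += 1
        record.byte_count += size
        record.tcp_flags |= flags
    return record
```

The OR-ed flags keep a compact trace of the connection: a record with SYN set but no FIN or RST, for example, describes a flow whose end was not observed.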

slide-23
SLIDE 23

Packet Sampling

  • Packet sampling before flow creation

– 1-out-of-m sampling of individual packets (e.g., m=100)
– Creation of flow records over the sampled packets

  • Reducing overhead

– Avoid per-packet overhead on (m-1)/m packets
– Avoid creating records for a large number of small flows

  • Increasing overhead (in some cases)

– May split some long transfers into multiple flow records, due to larger time gaps between successive packets
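
A minimal sketch of 1-out-of-m sampling followed by per-flow accounting is shown below. It assumes deterministic sampling (every m-th packet); routers may instead sample at random. Both function names are hypothetical. Totals are estimated by scaling the sampled counts by m.

```python
from collections import defaultdict


def sample_one_in_m(packets, m):
    """Keep every m-th packet (deterministic 1-out-of-m sampling)."""
    return [pkt for index, pkt in enumerate(packets, start=1) if index % m == 0]


def estimate_per_flow(sampled, m):
    """Estimate (packets, bytes) per flow from sampled (flow_id, size) pairs."""
    counts = defaultdict(lambda: [0, 0])
    for flow_id, size in sampled:
        counts[flow_id][0] += 1
        counts[flow_id][1] += size
    # Each sampled packet stands for m packets of the original traffic.
    return {flow_id: (m * n, m * b) for flow_id, (n, b) in counts.items()}
```

Small flows are likely to have no sampled packet at all, so no record is created for them. The estimates for large flows are approximate rather than exact.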

slide-24
SLIDE 24

Tools

  • In-router capture

– Cisco NetFlow
– Juniper JFlow

  • Collection and post-processing

– Flow-tools
– ntop

slide-25
SLIDE 25

Flow monitoring: Pros and Cons

Pros

  • More details about traffic compared to counters
  • Lower measurement volume than full packet traces
  • Available on high-end line cards (NetFlow, JFlow)
  • Control over overhead via aggregation and sampling

Cons

  • Less detail than packet capture

– No individual packet arrival times
– No information on packet content

  • Not uniformly supported (getting better with IPFIX)
  • Computation/memory requirements for the flow cache

slide-26
SLIDE 26

Using the traffic data in network operations

  • Interface counts: everywhere

– Tracking link utilizations and detecting anomalies
– Generating bills for traffic on customer links
– Inference of the offered load (i.e., traffic matrix)

  • Packet monitoring: selected locations

– Analyzing the small time-scale behavior of traffic
– Troubleshooting specific problems on demand

  • Flow monitoring: selective, e.g., network edge

– Tracking the application mix
– Direct computation of the traffic matrix
– Input to denial-of-service attack detection

slide-27
SLIDE 27

Outline

  • Motivation and definitions
  • Tools for measuring traffic

– Packet capture
– Interface counts
– Flow capture

  • Traffic matrix
  • Trace anonymization
  • Summary
slide-28
SLIDE 28

Traffic matrix: Definition

– Representation of traffic volume flowing from sources to destinations

  • Bytes
  • Packets
  • Flows, etc.
  • Links
  • Routers
  • Points of Presence (PoPs)
  • Networks
slide-29
SLIDE 29

Usage

  • Capacity planning
  • Traffic engineering (IGP and BGP)
  • Billing
  • Peering analysis
  • Anomaly detection
  • Design of new protocols
slide-30
SLIDE 30

Ingress router to egress router matrix

[Figure: network AS1 with routers AR1 to AR3 and CR1 to CR8 grouped into PoPs, neighbors AS2 and AS3, and a matrix whose rows and columns are CR1 … CR8]

slide-31
SLIDE 31

Measuring the traffic matrix

  • Packet capture

– Gives the most detailed view of traffic
– But, expensive and high collection overhead

  • Flow capture

– Enough to build traffic matrix
– Lower collection overhead (in particular with sampling)

  • Interface counts

– Cannot directly measure traffic matrix, must estimate
– Lowest overhead, widely available
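
To illustrate why flow capture is enough, the sketch below builds an ingress-to-egress matrix by summing flow-record byte counts per router pair. It assumes each record has already been tagged with its ingress and egress router (in practice the egress point typically has to be derived from routing information). The function name and the `sampling_rate` parameter are hypothetical.

```python
from collections import defaultdict


def traffic_matrix(flow_records, sampling_rate=1):
    """Sum bytes per (ingress, egress) pair.

    flow_records: iterable of (ingress_router, egress_router, byte_count).
    sampling_rate: m if records were built from 1-out-of-m sampled packets,
    in which case byte counts are scaled up by m.
    """
    matrix = defaultdict(int)
    for ingress, egress, byte_count in flow_records:
        matrix[(ingress, egress)] += sampling_rate * byte_count
    return dict(matrix)
```

Interface counts cannot be used this way: a per-link total mixes traffic from many ingress-egress pairs, so the matrix must be estimated instead.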

slide-32
SLIDE 32

Outline

  • Motivation and definitions
  • Tools for measuring traffic

– Packet capture
– Interface counts
– Flow capture

  • Traffic matrix
  • Trace anonymization
  • Summary
slide-33
SLIDE 33

Benefits of sharing data

  • Good scientific practice
  • Get others to work on relevant problems
  • Learn from analysis of others
  • Get broader view
slide-34
SLIDE 34

But, packet traces contain lots of sensitive information

  • Headers

– Connection endpoints: who is talking to whom; sites visited
– Protocol, ports: applications used

  • Payload

– Visited content
– Passwords, etc.

slide-35
SLIDE 35

Solution: Anonymization

  • Process to sanitize data to ensure anonymity

– Absence of identity
– Prevent others from linking identity to actions of an individual

  • Packet trace anonymization tools

– tcpdpriv, ipsumdump, ip2anonip, Crypto-PAn, PktAnon

slide-36
SLIDE 36

Anonymizing payload

  • Payload contains most sensitive information

– Better if removed completely
– If not possible, get minimum necessary

  • E.g., HTTP host better than full URL
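
As a small example of keeping only the minimum necessary, the sketch below reduces a logged URL to its host, discarding path, query string, credentials, and port. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def reduce_url_to_host(url: str) -> str:
    """Return only the (lowercased) host of a URL, or "" if there is none."""
    return urlsplit(url).hostname or ""
```

For instance, `reduce_url_to_host("http://example.org/account?user=alice")` keeps `example.org` and drops the part of the URL that names the user. The host alone can still reveal which sites were visited, so this reduces the exposure without removing it.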
slide-37
SLIDE 37

Anonymizing packet headers

  • Packet headers can be shared with care

– MAC addresses

  • Potential to link records with the same MAC across datasets

– IP addresses often need to be anonymized
– IP addresses appear in other parts of the packet

  • IP options (e.g., record route)
  • ICMP/DNS packets
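
One common goal when anonymizing IP addresses is to preserve prefix relationships, so that two addresses sharing a k-bit prefix still share a k-bit prefix after anonymization. The sketch below illustrates that idea only. It is not the construction used by Crypto-PAn or any of the tools listed earlier; it substitutes HMAC-SHA256 as the keyed function, and the function name and key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(address: str, key: bytes) -> str:
    """Prefix-preserving anonymization of an IPv4 address (illustrative).

    Bit i of the output is bit i of the input, flipped or not depending on
    a keyed function of the i bits that precede it.
    """
    original = int(ipaddress.IPv4Address(address))
    result = 0
    for i in range(32):
        prefix = original >> (32 - i)  # the i most significant bits (0 when i == 0)
        digest = hmac.new(key, f"{i}:{prefix}".encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))
```

Because the mapping is deterministic for a given key, the same address is always replaced by the same pseudonym. The same replacement would have to be applied wherever addresses appear in the packet, not only in the IP header.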
slide-38
SLIDE 38

Outline

  • Motivation and definitions
  • Tools for measuring traffic

– Packet capture
– Interface counts
– Flow capture

  • Traffic matrix
  • Trace anonymization
  • Summary
slide-39
SLIDE 39

Summary

  • Packet capture

– Detailed per-packet measurements; high overhead

  • Interface counts

– Coarse measurements per link; low overhead

  • Flow capture

– More details than link counts, less than packet captures
– Medium collection overhead controlled with sampling

  • Traffic matrix

– Measured from flow capture

  • Trace anonymization is key for data sharing
slide-40
SLIDE 40

Questions?