Inferring Internet Server IPv4 and IPv6 Address Relationships



SLIDE 1

Inferring Internet Server IPv4 and IPv6 Address Relationships

Robert Beverly, Arthur Berger∗, Nicholas Weaver†, Larry Campbell∗

Naval Postgraduate School

∗Akamai †ICSI/UCSD

rbeverly@nps.edu, awberger@mit.edu

February 7, 2013

CAIDA Active Internet Measurement 2013

Beverly, et al. (NPS) CAIDA AIMS-5 1 / 18

SLIDE 2

Sibling Resolution Intro

Sibling Resolution

New Problem We Term “Sibling Resolution”: Given a candidate (IPv4, IPv6) address pair, determine if these addresses are assigned to the same cluster, device, or interface.

Lots of prior work on passive sibling associations, e.g. web bugs, JavaScript, Flash, etc.

Prior work focuses on clients (adoption, performance).

This work:

  • Targeted, active test: on-demand for any given pair
  • Infrastructure: finding server siblings

Eventual goal: router siblings (not there yet)

SLIDE 3

Sibling Resolution Intro

Motivation

Why?

Adoption (non-adoption):

  • IPv4 and IPv6 expected to co-exist (for a long while?) → dual-stacked devices
  • Track IPv6 evolution

Security:

  • IPv6 is largely unsecured!
  • Inter-dependence of IPv6 on IPv4 (and vice-versa), e.g. attack on IPv6 resource affecting IPv4 service
  • Correlating geolocation, reputation, etc. with IPv4 host counterpart

Performance:

  • Getting measurements of IPv4 vs. IPv6 performance correct: isolate path vs. host performance

Operationally deployed today in Akamai, informing Edgescape geolocation.

SLIDE 4

Methodology

Techniques

3 Techniques:

1. (Passive) Induce DNS resolvers to use both v4 and v6 during natural resolution of Akamai resources (deployed, large set of measurements).

2. (Active) Force DNS to use a chain of v4 and v6 addresses to perform resolution. Allows us to validate a subset of the passively collected results.

3. (Active) Probe the potentially in-common TCP stack of a candidate v4, v6 sibling pair to obtain a timestamp fingerprint.

SLIDE 5

Methodology

Passive DNS

  • Encode IPv4 address of querying resolver into a AAAA record returned for the next-level NS
  • Subsequent query to the IPv6 authority nameserver permits linking v4 and v6 resolver addresses

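[Figure: a DNS resolver with addresses (IPv4, IPv6) sends "A? www.a.example.com" to the first-level authoritative DNS, which answers with NS=2001:428::IPv4. The resolver then sends "A? www.a.example.com" to the second-level authoritative DNS (src: IPv6, dst: 2001:428::IPv4), which records the (IPv4, IPv6) pairs.]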
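The encoding step above can be sketched in a few lines of Python. This is only an illustration, not the deployed Akamai logic: it assumes the resolver's IPv4 address is placed in the low 32 bits under the 2001:428:: prefix shown on the slide (a /96 is assumed), and the function names are hypothetical.

```python
import ipaddress

# Assumption: the IPv4 address fills the low 32 bits under the slide's prefix.
PREFIX = ipaddress.IPv6Network("2001:428::/96")


def encode_resolver_v4(v4: str) -> str:
    """Build the AAAA value handed out for the next-level NS."""
    v4_int = int(ipaddress.IPv4Address(v4))
    return str(ipaddress.IPv6Address(int(PREFIX.network_address) | v4_int))


def decode_pair(query_src_v6: str, query_dst_v6: str) -> tuple[str, str]:
    """Recover the (IPv4, IPv6) resolver pair from a query seen at the v6 nameserver."""
    dst = ipaddress.IPv6Address(query_dst_v6)
    if dst not in PREFIX:
        raise ValueError("destination does not carry an encoded IPv4 address")
    v4 = ipaddress.IPv4Address(int(dst) & 0xFFFFFFFF)
    return str(v4), str(ipaddress.IPv6Address(query_src_v6))
```

For example, a resolver that first queried from 192.0.2.53 would be handed an NS address whose low 32 bits spell out that IPv4 address; when the follow-up query arrives from the resolver's IPv6 source, decode_pair links the two.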

SLIDE 6

Methodology

Active DNS

  • Custom DNS server as authority for special domain
  • Chain of alternating v6, v4 CNAME records, only available via v6 or v4, that maintain state within the dynamic name

[Figure: message exchange between the prober, the resolver (w/ IPv6=A1,A3; IPv4=A2,A4), and the domain's authoritative DNS]

  Prober to resolver:              c1.N.v6.domain
  v6 Q? c1.N.v6.domain             CNAME=c2.N.A1.v4.domain
  v4 Q? c2.N.A1.v4.domain          CNAME=c3.N.A1.A2.v6.domain
  v6 Q? c3.N.A1.A2.v6.domain       CNAME=txt.N.A1.A2.A3.v4.domain
  v4 Q? txt.N.A1.A2.A3.v4.domain   TXT="A1 A2 A3 A4"
  Resolver to prober:              TXT="A1 A2 A3 A4"
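The way the CNAME chain carries state in the name can be sketched as a pure function that maps a query name and its source address to the next answer. This is a toy model of the exchange in the figure, not the authors' server: the label encoding in addr_label, the single-label placeholder DOMAIN, and the function names are all assumptions made for illustration.

```python
import ipaddress

DOMAIN = "domain"  # placeholder zone name, as on the slide (assumed single label)

# Next step in the chain and the address family that zone is reachable over.
NEXT_STEP = {"c1": ("c2", "v4"), "c2": ("c3", "v6"), "c3": ("txt", "v4")}


def addr_label(addr: str) -> str:
    """Hypothetical encoding of an address as one DNS-safe label."""
    ip = ipaddress.ip_address(addr)
    return f"a{ip.version}-{int(ip):x}"


def answer(qname: str, src_addr: str) -> tuple[str, str]:
    """Return (record type, record data) for a query seen from src_addr."""
    labels = qname.split(".")
    # Layout: step . nonce . [addresses seen so far] . family . DOMAIN
    step, nonce, seen = labels[0], labels[1], labels[2:-2]
    seen = seen + [addr_label(src_addr)]
    if step == "txt":
        # End of chain: report every resolver address observed along the way.
        return "TXT", " ".join(seen)
    nxt, family = NEXT_STEP[step]
    return "CNAME", ".".join([nxt, nonce, *seen, family, DOMAIN])
```

Because every answer embeds the addresses seen so far, the server itself stays stateless; the nonce N only keeps concurrent probes and cached answers apart.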

SLIDE 7

Methodology

DNS Results

  • Deployed on Akamai; gathered ≃ 675,000 v4,v6 pairs
  • Importance: directing users to content in a CDN relies on properties of DNS resolution. Improves IPv6 geolocation.
  • 77% of v4,v6 pairs are 1-1, the rest is messy.
  • Most complexity due to large cluster resolvers (e.g. Nominum, Google DNS, OpenDNS, Comcast, etc.).

[Figure: distribution of equivalence-class sizes. Axes: number of v4 addresses in equiv. class (2^-1 to 2^10) and number of v6 addresses in equiv. class (2^-1 to 2^11); scale from 0.1% to 75%.]

SLIDE 8

Methodology

Targeted, Active Technique

  • Intuition: IPv4 and IPv6 share a common transport-layer (TCP) stack
  • Leverage prior work on physical device fingerprinting using TCP timestamp clock skew [Kohno 2005]
  • TCP timestamp option: “TCP Extensions for High Performance” [RFC1323, May 1992]. Universally supported, enabled by default.
  • Note: TS clock ≠ system clock
  • Note: TS clock frequently unaffected by system clock adjustments (e.g. NTP)
  • Basic Idea: Probe over time. Fingerprint is clock skew (and remote clock resolution).
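A minimal sketch of the basic idea, assuming each probe yields a pair (local receive time in seconds, remote TSval) and that the remote timestamp frequency hz is already known or estimated. Kohno et al. fit the line with a linear-programming bound; ordinary least squares is swapped in here purely for brevity, and 32-bit TSval wraparound is ignored. Function names are hypothetical.

```python
def observed_offsets(samples, hz):
    """Convert (recv_time_sec, tsval) samples to (elapsed_sec, offset_msec) points."""
    t0, ts0 = samples[0]
    points = []
    for t, ts in samples:
        local_elapsed = t - t0             # seconds on the prober's clock
        remote_elapsed = (ts - ts0) / hz   # seconds on the remote TS clock
        points.append((local_elapsed, (remote_elapsed - local_elapsed) * 1000.0))
    return points


def fit_line(points):
    """Least-squares fit y = alpha * x + beta; alpha is the skew in msec/sec."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    alpha = sxy / sxx
    return alpha, mean_y - alpha * mean_x
```

Running this once over the IPv4 series and once over the IPv6 series gives two slopes; similar slopes are consistent with a shared clock, while clearly different slopes argue against the pair being siblings.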

SLIDE 9

Methodology Examples

Example

Gather 4 timestamp series:

  • www.caida.org (v4 and v6)
  • www.ripe.net (v4 and v6)

SLIDE 10

Methodology Examples

Example

[Figure: observed offset (msec) vs. measurement time (sec) for Host A (IPv6) and Host B (IPv4). Fitted lines: α=0.029938, β=-3.519 and α=-0.058276, β=-1.139.]

CAIDA IPv6 vs. RIPE IPv4

  • Observe different skew slopes (one negative)
  • Different timestamp granularity
  • y = 0.029938x equates to skew of ≈ 1.8 ms/minute, or ≈ 15 minutes per year
  • False siblings!
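The unit conversion behind the skew figures can be checked directly. The sketch below assumes the fitted slope is in msec of offset per second of measurement time (matching the plot axes) and a 365-day year; the variable names are illustrative.

```python
alpha_ms_per_s = 0.029938  # fitted slope from the slide: msec of offset per second

# Drift accumulated over one minute, in msec.
drift_ms_per_minute = alpha_ms_per_s * 60

# Drift accumulated over a 365-day year, converted from msec to minutes.
seconds_per_year = 365 * 24 * 3600
drift_min_per_year = alpha_ms_per_s * seconds_per_year / 1000 / 60

print(f"{drift_ms_per_minute:.2f} ms/minute, {drift_min_per_year:.1f} minutes/year")
```

The per-minute drift rounds to the 1.8 ms quoted on the slide, and the yearly drift lands between 15 and 16 minutes.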

SLIDE 11

Methodology Examples

Example

[Figure, "False Siblings": observed offset (msec) vs. measurement time (sec) for Host A (IPv6) and Host B (IPv4). Fitted lines: α=0.029938, β=-3.519 and α=-0.058276, β=-1.139.]

[Figure, "True Siblings": observed offset (msec) vs. measurement time (sec) for Host A (IPv6) and Host A (IPv4). Fitted lines: α=-0.058253, β=-1.178 and α=-0.058276, β=-1.139.]

  • CAIDA IPv4 vs. CAIDA IPv6: nearly identical slopes (θ = 0.0098)
  • CAIDA IPv6 vs. RIPE IPv4: different slopes (θ = 31.947)

SLIDE 12

Methodology Examples

Complications

[Figure: observed offset (msec) vs. measurement time (sec) for 193.110.128.199 and 2001:67c:2294:1000::f199]

  • www.marca.com (#6 on Alexa IPv6)
  • The difference is not always so distinct!
  • Slope angle difference: θ = 2.046

SLIDE 13

Methodology Examples

Complications

[Figure: raw TCP timestamp vs. TCP packet sample for apache.org V4 and apache.org V6]

  • www.apache.org
  • Raw TCP timestamps
  • Deterministically random and monotonic for a single connection
  • Random across connections
  • Looks like noise to us

SLIDE 14

Methodology Examples

Complications

[Figure: observed offset (msec) vs. measurement time (sec) for 203.5.76.12 and 2001:388:1:5062::cb05:4c0c]

What’s going on here?

SLIDE 15

Methodology Examples

Complications

[Figure: observed offset (msec) vs. measurement time (sec) for 209.85.225.160 and 2001:4860:b007::a0]

  • Also detects load balancing among servers
  • But how to deal with it?

SLIDE 16

Results

Machine Sibling Inference

Machine Sibling Inference Methodology:

  • Analyze Alexa top 100,000 websites
  • Pull A and AAAA records
  • 1398 (≈ 1.4%) have IPv6 DNS
  • Repeatedly fetch root HTML page via IPv4 and IPv6 via deterministic IP address
  • Record all packets
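The record-gathering step might look roughly like the following. This is a sketch under assumptions, not the measurement code used for the study: it goes through the system resolver via socket.getaddrinfo rather than issuing A and AAAA queries directly, and it makes the address choice deterministic by simply taking the first address in sorted order. The function names are hypothetical.

```python
import socket


def split_by_family(addrinfo):
    """Separate getaddrinfo() results into sorted IPv4 and IPv6 address lists."""
    v4 = sorted({ai[4][0] for ai in addrinfo if ai[0] == socket.AF_INET})
    v6 = sorted({ai[4][0] for ai in addrinfo if ai[0] == socket.AF_INET6})
    return v4, v6


def candidate_pair(hostname):
    """Return one (IPv4, IPv6) candidate pair, or None if the site is not dual-stacked."""
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    v4, v6 = split_by_family(infos)
    if not v4 or not v6:
        return None
    # Assumption: pin the lowest-sorting address of each family so that
    # repeated fetches always hit the same pair.
    return v4[0], v6[0]
```

Each fetch would then connect to the pinned address rather than the hostname, so that every timestamp in a series comes from the same candidate address.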

SLIDE 17

Results

Machine Sibling Inference

Alexa 100K Targeted Machine-Sibling Inference

  Case                                           Count
  v4 and v6 non-monotonic (possible siblings)    109 (7.8%)
  v4 or v6 non-monotonic (non-siblings)          140 (10.0%)
  v4 and v6 no timestamps (possible siblings)    94 (6.7%)
  v4 or v6 no timestamps (non-sibling)           101 (7.2%)

  • Our technique fails when timestamps are not monotonic across TCP flows (e.g. load-balancer or BSD OS)
  • Or, when timestamps are not supported (e.g. middlebox)
  • Note, can disambiguate non-siblings
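The four cases in the table can be expressed as a small triage step that runs before any skew comparison. The sketch below assumes each input is the list of TSvals observed across successive TCP flows in arrival order, with an empty list meaning the host returned no timestamps; the order of the checks and the function names are assumptions, not taken from the slides.

```python
def is_monotonic(tsvals):
    """True if timestamps never decrease across successive flows."""
    return all(b >= a for a, b in zip(tsvals, tsvals[1:]))


def triage(v4_tsvals, v6_tsvals):
    """Map a candidate pair onto the table's cases, or defer to the skew test."""
    has4, has6 = bool(v4_tsvals), bool(v6_tsvals)
    if not has4 and not has6:
        return "no timestamps (possible siblings)"
    if has4 != has6:
        return "no timestamps (non-siblings)"
    mono4, mono6 = is_monotonic(v4_tsvals), is_monotonic(v6_tsvals)
    if not mono4 and not mono6:
        return "non-monotonic (possible siblings)"
    if mono4 != mono6:
        return "non-monotonic (non-siblings)"
    return "compare clock skew"
```

A mismatch in behavior (only one side monotonic, or only one side sending timestamps) is itself evidence of different machines, which is why those cases can be labeled non-siblings even though the skew test cannot run.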

SLIDE 18

Results

Machine Sibling Inference

Alexa 100K Targeted Machine-Sibling Inference

  Case                                           Count
  v4 and v6 non-monotonic (possible siblings)    109 (7.8%)
  v4 or v6 non-monotonic (non-siblings)          140 (10.0%)
  v4 and v6 no timestamps (possible siblings)    94 (6.7%)
  v4 or v6 no timestamps (non-sibling)           101 (7.2%)
  Skew-based siblings                            839 (60.0%)
  Skew-based non-siblings                        115 (8.3%)
  Total                                          1398 (100%)

  • 25.5% (356) non-siblings
  • 57% of skew-based non-siblings are in same AS
  • 12.6% of skew-based siblings are in different ASes

SLIDE 19

Results

Feedback

Thanks!

Viz: Awesome scatter plot!

Data-Sharing: None so far (Akamai data off-limits, web-probing can be released)

Feedback:

  • Do you believe our motivation story!?!?
  • Operational experience with large DNS resolvers?
  • Thoughts on router v4,v6 sibling resolution?
