Cashmere: Resilient Anonymous Routing Li Zhuang (U.C. Berkeley) - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Cashmere: Resilient Anonymous Routing

Andrzej Skalski

• Li Zhuang (U.C. Berkeley)

• Feng Zhou (U.C. Berkeley)

• Ben Y. Zhao (U.C. Santa Barbara)

• Antony Rowstron (Microsoft Research UK)

slide-2
SLIDE 2

Plan

What is anonymous routing

Traditional Approach

Cashmere – design goals

Idea of Structured Overlay Networks and Key-Based Routing

How Cashmere uses SON and KBR

Cashmere's Transmission protocol details

Attack model

Anonymity Measurements

Briefly on resilience and fault tolerance

Briefly on performance in terms of computation and communication

Notice on omitted information

slide-3
SLIDE 3

What is anonymous routing?

• Anonymous routing is a set of communication techniques that protect users from being identified by third-party observers.

• Typical uses are military or anti-censorship activity.

• This paper does not cover the legal or ethical issues of the described techniques.

slide-4
SLIDE 4

Traditional Approach: Chaum-Mixes (and its extensions)

• Each message from A to B is passed through a sequence of relays R1, R2, ..., RL, each holding a pair of asymmetric encryption keys.

• After discovering the route (selecting relays), A encrypts the message with the series of public keys corresponding to the chosen sequence, so that R1's layer is outermost.

• Each relay then applies its private-key decryption to the received message and passes the result to the next relay, until the fully decrypted message reaches B.
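The layering described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the scheme of any real mix network: `toy_cipher`, `wrap` and `peel` are hypothetical names, and a hash-based XOR keystream stands in for public-key encryption, so the same key both adds and removes a layer (a real mix would encrypt with the relay's public key and decrypt with its private key).

```python
import hashlib
import json
from typing import List, Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Toy stand-in, NOT real encryption."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(message: bytes, destination: str, relays: List[Tuple[str, bytes]]) -> bytes:
    """Build the layered packet at the source, innermost layer first."""
    packet, next_hop = message, destination
    for name, key in reversed(relays):
        # Each layer tells exactly one relay where to forward the inner packet.
        layer = json.dumps({"next": next_hop, "body": packet.hex()}).encode()
        packet = toy_cipher(key, layer)
        next_hop = name
    return packet


def peel(key: bytes, packet: bytes) -> Tuple[str, bytes]:
    """What a single relay does: strip its own layer, learn only the next hop."""
    layer = json.loads(toy_cipher(key, packet))
    return layer["next"], bytes.fromhex(layer["body"])


relays = [("R1", b"key-1"), ("R2", b"key-2"), ("R3", b"key-3")]
packet = wrap(b"hello B", "B", relays)
for _, key in relays:
    hop, packet = peel(key, packet)
# After the last relay, hop is "B" and packet is the original message.
```

Each relay sees only its predecessor and its successor, which is also why a single failed relay breaks the whole chain.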

slide-5
SLIDE 5

Traditional Approach: Chaum-Mixes (and its extensions), ctd.

• If any Ri fails to decrypt and forward for any reason, the entire process fails.

• Because no Ri knows the source of the message, it is impossible to give A specific information about the failure.

• The source needs to discover the failure by itself and create a new path from live nodes. This implies either broadcast or the existence of some special tracking nodes.

• If node failures are frequent, this approach suffers huge performance losses.

slide-6
SLIDE 6

Cashmere – design goals

• Source anonymity: the identity of the source is hidden from all other nodes, including the destination node.

• Unlinkability: even if the source and the destination are each known to be participating in some communication, they cannot be singled out among the other participating nodes as the pair A and B communicating with each other.

• Resilience: improved tolerance to node failures in terms of performance.

slide-7
SLIDE 7

Cashmere – architecture

• Instead of single-node relays, Cashmere uses virtual relay groups made up of multiple nodes.

• Membership of a group can change dynamically.

• All members of one group share a common public/private key pair.

• The forwarding path is a sequence of relay groups.

• The destination is a member of one of the relay groups on the path.

slide-8
SLIDE 8

Idea of Structured Overlay Networks and Key-Based Routing

• A structured overlay network (SON) is a set of nodes (the participants of the network).

• Each SON has its own k-bit identifier space.

• Each node of a SON has its own random nodeID, assigned from this space by an off-line CA (certificate authority).

• Each node maintains a routing table that (usually) contains O(log N) records of the form <nodeID of v, IP of v>, where v is some participant of the overlay and N is the number of nodes in the overlay.

• The algorithm that constructs the routing tables guarantees that the path from A to B is [bound not preserved in this transcript]

slide-9
SLIDE 9

Idea of Structured Overlay Networks and Key-Based Routing, ctd.

The destination's nodeID is used as the key (or address). Each relay node selects from its routing table the neighbour whose nodeID shares the longest prefix with the key, and forwards the message to it. The picture shows an example of routing from 5230 to 8954 (for some reason the authors write all numbers reversed in their paper; just read them backwards).

This is only an illustration of the idea, not part of the Cashmere design.
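The forwarding rule can be sketched in a few lines of Python. This is a minimal illustration, assuming nodeIDs are digit strings and each node's routing table is a plain list of neighbour IDs; the tables below are invented for the 5230 to 8954 example and are not taken from the paper. Real overlays such as Pastry add further rules that are ignored here.

```python
from typing import Dict, List, Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, key: str, table: List[str]) -> Optional[str]:
    """Neighbour sharing the longest prefix with key, if it improves on current."""
    best = max(table, key=lambda n: shared_prefix_len(n, key), default=None)
    if best is None or shared_prefix_len(best, key) <= shared_prefix_len(current, key):
        return None  # no neighbour is closer to the key than we are
    return best


def route(source: str, key: str, tables: Dict[str, List[str]]) -> List[str]:
    """Follow next_hop until the key's owner is reached or no progress is possible."""
    path = [source]
    while path[-1] != key:
        hop = next_hop(path[-1], key, tables.get(path[-1], []))
        if hop is None:
            break
        path.append(hop)
    return path


# Hypothetical routing tables, for illustration only.
tables = {
    "5230": ["8122", "5671"],
    "8122": ["8930", "8455"],
    "8930": ["8957", "8912"],
    "8957": ["8954"],
}
print(route("5230", "8954", tables))
# ['5230', '8122', '8930', '8957', '8954']
```

Every hop extends the matched prefix by at least one digit, which is what bounds the path length.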

slide-10
SLIDE 10

How Cashmere uses it:

Instead of relay nodes, Cashmere relies on relay groups.

A relay group is a subset of nodes that share the same m-bit prefix of their nodeIDs (1 < m < k), further called the groupID.

For every prefix (of any length) of every existing nodeID, a public/private key pair is generated.

A user wishing to join the network gets the following data from the CA (all signed by the CA):

• A unique k-bit nodeID, and the number m (the length of the groupID prefix).

• k private/public key pairs, one for each prefix of the nodeID (prefix keys).

• All public prefix keys.
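A short Python sketch may help make the relation between nodeIDs, prefixes and relay groups concrete. It assumes nodeIDs are written as bit strings; the function names and the sample IDs are hypothetical, and no key material is generated.

```python
from typing import Dict, List


def group_id(node_id: str, m: int) -> str:
    """The groupID is the first m bits of the nodeID."""
    return node_id[:m]


def prefixes(node_id: str) -> List[str]:
    """All non-empty prefixes of a nodeID; a node holds one key pair per prefix."""
    return [node_id[:i] for i in range(1, len(node_id) + 1)]


def relay_groups(node_ids: List[str], m: int) -> Dict[str, List[str]]:
    """Partition nodes into relay groups by their m-bit prefix."""
    groups: Dict[str, List[str]] = {}
    for node in node_ids:
        groups.setdefault(group_id(node, m), []).append(node)
    return groups


print(prefixes("1011"))                             # ['1', '10', '101', '1011']
print(relay_groups(["1011", "1000", "0110"], m=2))  # {'10': ['1011', '1000'], '01': ['0110']}
```

A k-bit nodeID has exactly k non-empty prefixes, which matches the k key pairs handed out by the CA.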

slide-11
SLIDE 11

Key-Based Routing with relay groups

• The groupID, instead of a nodeID, is used as the key in the Key-Based Routing procedure.

• As the message is routed through the network, the first node that receives it and shares the groupID prefix with the message key acts as the group representative and processes the message on behalf of the relay group (processing is described later).

• Passing a message through relay groups can therefore be seen as a sequence of anycasts, each to the members of the next group.
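The "first matching node wins" rule can be written down directly. The sketch below assumes the overlay route is already known as a list of nodeIDs; the IDs are made up, apart from 12302 and the prefix 123, which echo the example used later in these slides.

```python
from typing import List, Optional


def representative(overlay_route: List[str], gid: str) -> Optional[str]:
    """First node on the route whose nodeID starts with the groupID."""
    for node in overlay_route:
        if node.startswith(gid):
            return node
    return None  # the route never entered the relay group


# Hypothetical overlay route toward the key 123**.
print(representative(["40112", "10233", "12302", "12377"], "123"))  # 12302
```

Which member ends up as representative depends on where the message enters the group, so any live member can serve.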

slide-12
SLIDE 12

Transmission protocol:

A transmits message M to B.

• A generates a random sequence of m-bit groupIDs that contains the groupID of B (at any position!). The sequence identifies L relay groups, further called P1, P2, ..., PL.

• A encrypts the forwarding path in multiple layers. A then (independently) encrypts M's payload (details later) and anycasts the resulting package to P1.

• When a node N matching the current prefix Pi receives a package, it acts on behalf of its relay group: it decrypts the outer layer using the group's private key, revealing the ID of Pi+1, and anycasts the package onward. N also multicasts the current payload to all nodes in its group.
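The first step, choosing the relay groups, could look roughly like this in Python. It is a sketch under assumptions: groupIDs are m-bit strings, `random_group_path` is a hypothetical helper, and nothing here checks that a randomly drawn group actually has live members.

```python
import secrets
from typing import List


def random_group_path(dest_group: str, m: int, length: int) -> List[str]:
    """Random sequence of m-bit groupIDs with B's group at a random position."""
    rng = secrets.SystemRandom()
    path = [format(rng.getrandbits(m), f"0{m}b") for _ in range(length)]
    # The destination's group may sit anywhere, not necessarily at the end.
    path[rng.randrange(length)] = dest_group
    return path


path = random_group_path("0101", m=4, length=5)
```

Hiding the destination group at an arbitrary position means that seeing the last hop of a path does not reveal B's group.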

slide-13
SLIDE 13

Transmission protocol:

how A encrypts the message payload

Only B must be able to decrypt M successfully, and B must receive it no matter which node acts as the representative of B's relay group.

Furthermore, the payload needs to change at every hop, because otherwise it could be used to trace the routing path.

Cashmere therefore derives a separate payload, Payload_i, for each relay group P_i, using R_i, a symmetric key generated by the source for relay group P_i. [Formula not preserved in this transcript.]

For B in relay group P_dst, Payload_dst is generated first, and the payloads for the other indices are derived from it.
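Since the formula itself is missing, the following Python sketch shows only one plausible reading of the text, and every detail of it is an assumption: each relay group i transforms the payload with its symmetric key R_i before forwarding, the source pre-applies the layers for the groups before the destination, and a hash-based XOR keystream (`keystream_xor`, a toy, not real encryption) stands in for the symmetric cipher. The input is taken to be the message already encrypted for B.

```python
import hashlib
from typing import List


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 keystream (self-inverse)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def payloads(ciphertext_for_b: bytes, keys: List[bytes], dst: int) -> List[bytes]:
    """Payload seen by each relay group (0-based); dst is the index of B's group."""
    out = [b""] * len(keys)
    out[dst] = ciphertext_for_b  # generated first, as the slide says
    for i in range(dst - 1, -1, -1):
        out[i] = keystream_xor(keys[i], out[i + 1])  # source adds layer R_i
    for i in range(dst + 1, len(keys)):
        out[i] = keystream_xor(keys[i - 1], out[i - 1])  # later hops keep transforming
    return out


keys = [b"R1", b"R2", b"R3", b"R4"]
per_hop = payloads(b"ciphertext-for-B", keys, dst=1)  # B is in P2, L = 4
```

Under this reading every hop applies the same rule (transform with R_i, then forward), so the representative of B's group behaves like any other, and the payload differs at every hop.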

slide-14
SLIDE 14

Transmission protocol:

how A encrypts forwarding path

• The forwarding path carries the symmetric keys R_i generated by the source, the groupID of the succeeding relay group, and either the path suffix or a termination symbol.

• For the optimisations, the definition of Path_i will be extended (details later).

slide-15
SLIDE 15

Transmission protocol:

example: B is in P2, L = 4, 12302 acts as the representative of 123**

slide-16
SLIDE 16

Transmission protocol:

Optimisations

Because Path_i and Payload_i are decoupled, nodes can store Path_i for later use. Each node caches the mapping Path_i ↔ <Path_{i+1}, P_{i+1}, R_i>.

To avoid asymmetric encryption and decryption on every data packet, the source A, while computing Path, creates a series of additional values K_1, K_2, ..., K_L. [Formula for K_d not preserved in this transcript.] The formula applies to B in relay group P_d; all other K_i are random. "|" denotes concatenation, and FLAG means "yes, you are the destination node, and the prefix is a symmetric key". [Extended formula for Path_i not preserved in this transcript.] The representative of each relay group multicasts K_i along with Payload_i.
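The caching idea can be sketched as a small lookup table in Python. Everything named here is hypothetical: `PathCache` is not from the paper, `slow_decrypt` is a placeholder for the expensive private-key operation, and the cache key is simply a hash of the encrypted Path_i.

```python
import hashlib
from typing import Callable, Dict, Tuple

# (Path_{i+1}, P_{i+1}, R_i) as recovered from one decrypted path layer
Entry = Tuple[bytes, str, bytes]


class PathCache:
    """Remember the result of decrypting Path_i so later packets skip it."""

    def __init__(self, slow_decrypt: Callable[[bytes], Entry]) -> None:
        self._slow_decrypt = slow_decrypt
        self._cache: Dict[bytes, Entry] = {}
        self.slow_calls = 0  # counts how often the expensive path was taken

    def resolve(self, path_i: bytes) -> Entry:
        tag = hashlib.sha256(path_i).digest()
        if tag not in self._cache:
            self.slow_calls += 1
            self._cache[tag] = self._slow_decrypt(path_i)
        return self._cache[tag]


# Stand-in for the asymmetric decryption, for illustration only.
cache = PathCache(lambda p: (p[1:], "123", b"R-i"))
cache.resolve(b"encrypted-path")
cache.resolve(b"encrypted-path")
# cache.slow_calls is 1: the second packet of the session hit the cache.
```

Only the first packet of a session pays for the asymmetric operation; later packets on the same path cost one hash and one dictionary lookup.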

slide-17
SLIDE 17

Transmission protocol:

the big picture

slide-18
SLIDE 18

Quickly about Reply Address

(no new ideas here)

If sender A wishes to receive feedback from B (either an ACK or a real message with its own payload), it constructs a reply path in the same manner as the forwarding path: A generates a sequence P1', P2', ..., PL' and ensures that its own relay group is on the list. [Formula not preserved in this transcript.] Here Ki' and Ri' are selected at random. A sends the result to B along with the original message.
slide-19
SLIDE 19

Attack model

• Let f be the fraction of all nodes that the attacker controls. Controlled nodes immediately leak all their information (including private/public keys) to the attacker.

• Furthermore, the attacker can listen to either all or some part of the traffic in the network. Whenever estimates are given, the assumed share of observed traffic is stated.

slide-20
SLIDE 20

Anonymity Measurement:

entropy

Let:

f: fraction of compromised nodes

N: number of all nodes

q: average relay group size (q = N/2^m)

L: length of the path

Ω: set of nodes

p_u: probability that node u in Ω is the source or destination of a message.

Then entropy is defined as follows:
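The formula itself did not survive in this transcript. Assuming the slide uses the standard Shannon entropy over the distribution p_u (the base of the logarithm is a guess), it would read:

```latex
H(\Omega) = -\sum_{u \in \Omega} p_u \log_2 p_u
```

With this definition, H is 0 when the attacker is certain about one node and grows as the probability mass spreads over more nodes.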

slide-21
SLIDE 21

Anonymity Measurement:

anonymity of the system

• Intuitively, anonymity is ideal when all nodes appear equally likely to be the source or destination of a message.

From this the slide derives the optimal entropy and, from it, the measure of the anonymity of a system. [Formulas not preserved in this transcript.]
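As a hedged illustration, the sketch below assumes Shannon entropy in bits, a uniform ideal distribution (p_u = 1/N, so the optimum is log2 N), and an anonymity measure equal to the ratio of actual to optimal entropy. The ratio is a common choice for entropy-based metrics, but it is an assumption here, not something the transcript confirms.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability nodes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def anonymity_degree(probs: Sequence[float]) -> float:
    """Entropy normalised by its maximum log2(N); 1.0 means ideal anonymity."""
    return entropy(probs) / math.log2(len(probs))


print(anonymity_degree([0.25, 0.25, 0.25, 0.25]))  # 1.0
print(anonymity_degree([0.5, 0.5, 0.0, 0.0]))      # 0.5
```

In the second example the attacker has ruled out half of the four nodes, which halves the normalised measure.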

slide-22
SLIDE 22

Anonymity Measurement:

unlinkability in terms of entropy, details

In this measurement the authors assumed that:

• the attacker knows the exact number L

• the attacker sees only the traffic of compromised nodes

Results are compared to traditional Chaum-Mixes.

slide-23
SLIDE 23

Anonymity Measurement:

unlinkability in terms of entropy, image

slide-24
SLIDE 24

Anonymity Measurement:

unlinkability in terms of entropy, conclusions

From this (and from some calculations omitted in this presentation) the authors conclude that Cashmere has a level of unlinkability similar to Chaum-Mixes, despite the need to multicast data among the members of relay groups. The authors also claim that the anonymity level is independent of network size: increasing the number of nodes from 20K to 2M resulted in less than 3% variation in (entropy-based) unlinkability. Furthermore, they claim that reducing the size of the network to 64 while f, L (path length) and q (average group size) remain the same provides a similar level of anonymity. This sounds reasonable, considering that L and q determine the number of nodes taking part in a communication, and f is the fraction of compromised nodes.

slide-25
SLIDE 25

Anonymity Measurement:

source anonymity, anonymous messages

In this measurement the authors assume that:

• The destination is controlled by the attacker.

• Cashmere uses one-way communication (no ReplyAddrInfo).

• The attacker does not analyse traffic between non-colluding nodes.

slide-26
SLIDE 26

Anonymity Measurement:

source anonymity, anonymous messages, images

slide-27
SLIDE 27

Anonymity Measurement:

source anonymity, anonymous messages, images ctd.

And with two-way communication (ReplyAddrInfo present)

slide-28
SLIDE 28

Anonymity Measurement:

source anonymity, anonymous messages

The authors conclude that two-way communication decreases the level of anonymity. Nevertheless, this effect can be balanced by increasing the length of paths, trading performance for anonymity. The authors also claim that they analysed the impact of network size on source anonymity and found it to be insignificant. Unfortunately, they provide no specific test results or calculations to support this claim.

slide-29
SLIDE 29

Anonymity Measurement:

traffic analysis

The assumptions are as before, except that the attacker can now intercept a fraction of the traffic (labelled T.A. in the figure). Note that traffic analysis covering more than 90% of the traffic is highly unrealistic, and yet the system still provides some level of anonymity in that case.

slide-30
SLIDE 30

Anonymity Measurement:

traffic analysis, ctd

The authors argue that intercepting a high percentage of the traffic in a distributed system is unrealistic, and that Cashmere's anonymity is therefore quite satisfactory. They also propose a modification of the protocol: if each node exchanged symmetric keys with all of its neighbours (the members of its routing table) and encrypted all messages, traffic analysis would become inefficient. However, the paper contains no test results or calculations supporting this claim.

slide-31
SLIDE 31

Briefly on resilience and fault tolerance

Compared to the classical approach, Cashmere takes advantage of sharing responsibility among the nodes of a relay group. A path remains active as long as each relay group on it contains at least one live node. The test methodology is as follows: the number of nodes is constant, and the number of nodes joining equals the number failing. Node session time is exponentially distributed, meaning that both node failures and arrivals are Poisson processes. The authors assumed a mean time between failures of 200 minutes and a mean time to repair of 5 minutes. The measurements were made using the Kazaa P2P file-sharing program, simulating 100K exchanges of files between 10 and 100 MB in size.
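A back-of-the-envelope calculation shows why relay groups help. The sketch below is not the authors' methodology (they simulate churn over time): it takes a steady-state snapshot, assumes nodes fail independently, and uses the standard availability formula MTBF / (MTBF + MTTR) with the figures quoted above. The function names are made up for this example.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Long-run fraction of time a single node is alive."""
    return mtbf / (mtbf + mttr)


def node_path_alive(a: float, length: int) -> float:
    """Node-based relays: every one of the L relays must be alive."""
    return a ** length


def group_path_alive(a: float, q: int, length: int) -> float:
    """Relay groups: each of the L groups needs at least one of q live nodes."""
    return (1 - (1 - a) ** q) ** length


a = availability(200, 5)  # about 0.976
print(f"node-based path, L=4:        {node_path_alive(a, 4):.4f}")
print(f"relay-group path, q=4, L=4:  {group_path_alive(a, 4, 4):.6f}")
```

With q = 1 the two expressions coincide, so the gain comes entirely from having several candidates per hop.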

slide-32
SLIDE 32

Briefly on resilience and fault tolerance

slide-33
SLIDE 33

Briefly on performance in terms of communication overhead

Communication costs of maintaining knowledge of candidate relay nodes: the authors compare Cashmere with node-based relays and state that, because of the size of Cashmere's routing table and its dependence on the underlying SON (structured overlay network), a Cashmere node uses O(N log N) traffic to keep its local routing information up to date. They contrast this with the O(N^2) cost of maintaining full information, however unlikely it is that any anonymizing network would even attempt to maintain it. Furthermore, they ignore the cost of running the SON itself, so the entire comparison is debatable at best.

The cost of delivering a message is O(qL), compared to O(L) for node-based relays, because of the multicast within relay groups.

slide-34
SLIDE 34

Briefly on performance in terms of computation time overhead

The authors compare their implementation with a program called Pastry, on top of which Cashmere was implemented for testing.

The authors argue that, because paths last longer, the average per-session overhead in Cashmere is significantly lower than in the node-based approach. For q = 4, they found that the calculations necessary to maintain a live path take 5.37% of the time taken in the classical, node-based approach. The authors claim that the average aggregated cost of encryption within a relay group is 46.83% of the cost at intermediate nodes in node-based solutions. The savings are due to the optimizations mentioned earlier, namely caching partial path information in nodes and using symmetric instead of asymmetric keys for payload encryption.

slide-35
SLIDE 35

Notice on omitted information

Due to its time limit, this presentation omits a good deal of information on test results and methodology, the hardware used in testing, and the authors' plans for further experiments and development. This information is available in the original paper.
slide-36
SLIDE 36

Questions?