SLIDE 1 Cashmere: Resilient Anonymous Routing
Andrzej Skalski
Li Zhuang (U.C. Berkeley) Feng Zhou (U.C. Berkeley) Ben Y. Zhao (U. C. Santa Barbara) Antony Rowstron (Microsoft Research UK)
SLIDE 2 Plan
What is anonymous routing
Traditional Approach
Cashmere – design goals
Idea of Structured Overlay Networks and Key-Based Routing
How Cashmere uses SON and KBR
Cashmere's Transmission protocol details
Attack model
Anonymity Measurements
Briefly on resilience and fault tolerance
Briefly on performance in terms of computation and communication
Notice on omitted information
SLIDE 3 What is anonymous routing?
Anonymous routing is a set of communication
techniques that protect users from identification by third-party observers.
Typical usage is either military or anti-censorship
activity.
This paper does not cover legal or ethical issues of
the described techniques.
SLIDE 4 Traditional Approach: Chaum-Mixes (and its extensions)
Each message from A to B is passed through a
sequence of relays R1, R2... RL, each having a pair
of asymmetric encryption keys.
After discovering the route (selecting relays), A
encrypts the message with a series of public keys corresponding to the chosen sequence.
Then each relay applies its private-key decryption
to the received message and passes it to the next one, until the fully decrypted message reaches B.
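The layering described above can be sketched in Python. This is a toy illustration only: the slide describes asymmetric key pairs, whereas the sketch substitutes a hash-based XOR keystream (the hypothetical helper `toy_cipher`) so that it runs with the standard library alone. The relay names, the keys and the `next_hop|inner` framing are assumptions made for the example, not part of any real mix implementation.

```python
import hashlib

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for real encryption (NOT secure): XOR with a SHA-256 keystream.
    Applying it twice with the same key restores the input."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def wrap(message: bytes, relays: list, destination: str):
    """Source A: build the layers from the inside out.
    `relays` is a list of (name, key) pairs in forwarding order R1..RL."""
    packet, next_hop = message, destination
    for name, key in reversed(relays):
        # Each layer hides the next hop and everything inside it.
        packet = toy_cipher(key, next_hop.encode() + b"|" + packet)
        next_hop = name
    return next_hop, packet  # name of the first relay and the fully wrapped packet

def peel(packet: bytes, key: bytes):
    """Relay Ri: remove one layer, learning only the next hop."""
    next_hop, _, inner = toy_cipher(key, packet).partition(b"|")
    return next_hop.decode(), inner

relays = [("R1", b"key-1"), ("R2", b"key-2"), ("R3", b"key-3")]
first_hop, packet = wrap(b"hello B", relays, "B")
hop = first_hop
for _, key in relays:
    hop, packet = peel(packet, key)
print(hop, packet)
```

After each of the three relays has peeled one layer, `hop` names the destination and `packet` is the original message. No single relay sees both the source and the plaintext route.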
SLIDE 5 Traditional Approach: Chaum-Mixes (and its extensions), ctd.
If any Ri fails to decrypt and forward for any reason, the entire
process fails.
Because no Ri knows the source of the message,
it is impossible to give A specific information about the failure.
The source needs to discover the failure by itself and create
a new path from live nodes. This implies either broadcast or the existence of some special tracking nodes.
If node failures are frequent, this approach suffers huge
performance losses.
SLIDE 6 Cashmere – design goals
Source anonymity – the identity of the source is hidden from
all other nodes, including the destination node.
Unlinkability – even if source and destination are
known to be participating in communication, they cannot be distinguished from other participating nodes as A and B.
Resilience – improved tolerance to node failures in
terms of performance.
SLIDE 7 Cashmere – architecture
Instead of single-node relays, Cashmere uses virtual
relay groups of multiple nodes.
Membership of the group can change dynamically. All members of one group share a common
public/private key pair.
Forwarding path is a sequence of relay groups. Destination is a member of final relay group.
SLIDE 8 Idea of Structured Overlay Networks and Key-Based Routing
A structured overlay network (SON) is a set of nodes (participants
Each SON has its own k-bit identifier space. Each node of the SON has its own random nodeID assigned from
this space by an off-line CA (central authority).
Each node maintains a routing table that contains (usually)
O(log N) records of the form <nodeID of v, IP of v>, where v is some participant of the overlay and N is the number of nodes in the overlay.
It is enforced by the algorithm that constructs the routing table
that path from A to B is
SLIDE 9 Idea of Structured Overlay Networks and Key-Based Routing, ctd.
The destination's nodeID is used as a key (or address). Each relay node selects from its routing table the neighbour whose nodeID shares the longest prefix with the key and forwards the message to it. The picture shows an example of routing from 5230 to 8954 (for some reason the authors decided to write all numbers reversed in their paper; just read them backwards).
This is just an example of the idea, not the Cashmere design.
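A minimal Python sketch of the forwarding rule, assuming nodeIDs are digit strings read left to right and that a node forwards only when a neighbour matches the key strictly better than it does itself. The routing tables and the intermediate nodeIDs below are made up for illustration. Only the endpoints 5230 and 8954 come from the slide.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(current: str, key: str, neighbours: list) -> str:
    """Pick the neighbour sharing the longest prefix with the key.
    Returns `current` if no neighbour is a strictly better match."""
    best = max(neighbours, key=lambda n: shared_prefix_len(n, key), default=current)
    if shared_prefix_len(best, key) > shared_prefix_len(current, key):
        return best
    return current

def route(source: str, key: str, tables: dict) -> list:
    """Follow next_hop until the key is reached or no progress is possible."""
    path = [source]
    while path[-1] != key:
        nxt = next_hop(path[-1], key, tables.get(path[-1], []))
        if nxt == path[-1]:
            break  # stuck: no neighbour improves the prefix match
        path.append(nxt)
    return path

# Hypothetical routing tables: nodeID -> list of known neighbour nodeIDs.
tables = {
    "5230": ["8123", "5011"],
    "8123": ["8901", "8455"],
    "8901": ["8957", "8123"],
    "8957": ["8954", "8901"],
}
print(route("5230", "8954", tables))
```

Each hop matches at least one more digit of the key than the previous one, which bounds the number of hops by the length of the identifier.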
SLIDE 10 How Cashmere uses it:
Instead of relay nodes, Cashmere relies on relay groups.
A relay group is a subset of nodes that share the same m-bit prefix of their nodeID (1 < m < k), further called the groupID.
For each prefix of any length of any existing nodeID, a pair of public/private keys is generated.
A user wishing to join the network gets the following data from the CA (all signed by the CA):
A unique k-bit nodeID and the number m, the length of the groupID prefix.
k pairs of private/public keys, one for each prefix of the nodeID (prefix keys).
All public prefix keys.
SLIDE 11 Key-Based Routing with relay groups
groupID instead of nodeID is used as the key in
the Key-Based Routing procedure.
As the message is routed through the network, the first
node that receives the message and shares the groupID prefix with the message key acts as the group representative and processes the message on behalf
of the relay group (processing is described later).
Therefore we can consider passing a message
through relay groups as a sequence of anycasts, each to the members of the next group.
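The anycast step can be sketched as prefix routing that stops at the first node whose nodeID starts with the groupID. This is a hedged illustration, not the Cashmere implementation: nodeIDs are written as digit strings, the routing tables are invented, and the helper names `shared_prefix_len` and `anycast` are assumptions. The representative 12302 for group 123** mirrors the example used later in the slides.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def anycast(source: str, group_id: str, tables: dict):
    """Route towards group_id; the first node on the way whose nodeID starts
    with group_id becomes the group representative.
    Returns (path, representative), with representative None if routing gets stuck."""
    current = source
    path = [current]
    while not current.startswith(group_id):
        neighbours = tables.get(current, [])
        best = max(neighbours, key=lambda n: shared_prefix_len(n, group_id), default=None)
        if best is None or shared_prefix_len(best, group_id) <= shared_prefix_len(current, group_id):
            return path, None  # no neighbour is closer to the group
        current = best
        path.append(current)
    return path, current

# Hypothetical routing tables: nodeID -> known neighbour nodeIDs.
tables = {
    "50217": ["10433", "57120"],
    "10433": ["12011", "10877"],
    "12011": ["12302", "12077"],
}
path, representative = anycast("50217", "123", tables)
print(path, representative)
```

Any member of the group could have ended up as the representative. Which one does depends only on where the route happens to enter the group's prefix region.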
SLIDE 12 Transmission protocol:
A transmits message M to B.
A generates a random sequence of m-bit groupIDs that
contains the groupID of B (at any position!). The sequence identifies L relay groups, which we will call P1, P2, ..., PL.
A encrypts the forwarding path in multiple layers. Then A
encrypts (independently) M's payload (details later) and anycasts the resulting package to P1.
When a node N matching the current prefix Pi receives a
package, it acts on behalf of its relay group: it decrypts
the outer layer using the group's private key, revealing Pi+1's ID,
and anycasts the package further. N also multicasts the current payload to all nodes in its group.
SLIDE 13 Transmission protocol:
how A encrypts the message payload
It is required that only B can successfully decrypt M, and that B receives it no matter which node acts as the representative of B's relay group.
Furthermore, the current payload needs to vary at each hop, because
otherwise it could mark the routing path.
Therefore Cashmere creates relay group Pi's payload as follows: Ri is a symmetric key generated by the source for each relay group Pi.
For B in relay group Pdst, Payloaddst is generated first, and the payloads for the other indices are derived from it.
SLIDE 14 Transmission protocol:
how A encrypts forwarding path
The forwarding path carries the symmetric keys Ri
generated by the source, the succeeding groupID, and the path suffix or a termination symbol.
For optimisation purposes, the definition of Pathi will be
extended; details follow later.
SLIDE 15
Transmission protocol:
example: B is in P2, L = 4, 12302 acts as the 123** representative
SLIDE 16 Transmission protocol:
Optimisations
Because Pathi and Payloadi are decoupled, Pathi can be stored by the nodes for further use. Each node caches the mapping Pathi ↔ <Pathi+1, Pi+1, Ri>.
In order to avoid asymmetric encrypt/decrypt operations on each piece of data, source A, while computing Path, creates a series of additional values K1, K2, …, KL, where: for B in relay group Pd, and all other Ki are random. ”|” is concatenation, and FLAG means ”yes, you're a destination node, and prefix is a symmetric key”. So now, recipe for Pathi is: and the representative of the relay group multicasts Ki along with Payloadi.
SLIDE 17
Transmission protocol:
the big picture
SLIDE 18 Quickly about Reply Address
(no new ideas here)
If sender A wishes to receive feedback from B (either an ACK or a real message with its own payload), it constructs a reply path in the same manner as the forwarding path: A generates a sequence P1', P2', ..., PL' and ensures that its own relay group is on the list. Then: where ki', Ri' are selected at random, and sends it to B along with
SLIDE 19 Attack model
Let f be the fraction of all nodes the attacker
controls. Controlled nodes leak all information
(including private/public keys) immediately to the attacker.
Furthermore, the attacker can listen to either all or
some part of the traffic in the network. Whenever an estimate is given, this figure will be specified.
SLIDE 20 Anonymity Measurement:
entropy
Let:
f – fraction of compromised nodes
N – number of all nodes
q – average relay group size (q = N/2^m)
L – length of the path
Ω – set of nodes
pu – probability that node u from Ω is a source or destination of a message. Then, entropy is defined as follows:
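Assuming the standard Shannon definition, which is the usual basis for entropy-based anonymity metrics, the entropy of the distribution pu over Ω can be written as:

```latex
H(\Omega) = -\sum_{u \in \Omega} p_u \log_2 p_u
```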
SLIDE 21 Anonymity Measurement:
anonymity of the system
Intuitively, ideal anonymity is the situation in which
all nodes seem equally probable as the source
or destination of a message:
Then, the optimal entropy is: Therefore, the measure of anonymity of a system is:
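Under the usual normalisation (an assumption here, following the standard entropy-based metric), the ideal case pu = 1/N gives the maximum entropy log2 N, and the anonymity of the system is the ratio of the actual entropy to that maximum. A short Python sketch of this calculation, with hypothetical function names:

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability nodes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def anonymity_degree(probs):
    """Entropy normalised by its maximum log2(N); 1.0 means ideal anonymity."""
    n = len(probs)
    if n < 2:
        return 0.0  # a single candidate offers no anonymity
    return entropy(probs) / math.log2(n)

uniform = [1 / 8] * 8             # every node equally likely
skewed = [0.65] + [0.05] * 7      # the attacker strongly suspects one node
print(anonymity_degree(uniform), anonymity_degree(skewed))
```

The uniform case yields a degree of 1, and any information the attacker gains about likely sources or destinations pushes the value towards 0.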
SLIDE 22 Anonymity Measurement:
unlinkability in terms of entropy, details
In this measurement, the authors assumed that:
the attacker knows the exact number L
the attacker knows only the compromised nodes' traffic
Results are compared to traditional Chaum-Mixes.
SLIDE 23
Anonymity Measurement:
unlinkability in terms of entropy, image
SLIDE 24
Anonymity Measurement:
unlinkability in terms of entropy, conclusions
From that (and some calculations omitted in this presentation) the authors conclude that Cashmere has a similar unlinkability level to Chaum-Mixes, in spite of the need to multicast data among members of relay groups. The authors also claim that the anonymity level is independent of network size: increasing the number of nodes from 20K to 2M resulted in less than 3% variation in (entropy-based) unlinkability. Furthermore, they claim that reducing the size of the network to 64 while f, L (path length) and q (average group size) remain the same provides a similar level of anonymity. This sounds reasonable considering that L and q determine the number of nodes taking part in a communication, and f is the fraction of compromised nodes.
SLIDE 25 Anonymity Measurement:
source anonymity, anonymous messages
In this measurement authors assume that:
The destination is controlled by the attacker.
Cashmere uses one-way communication (no ReplyAddrInfo).
The attacker does not analyse traffic between non-colluding nodes.
SLIDE 26
Anonymity Measurement:
source anonymity, anonymous messages, images
SLIDE 27
Anonymity Measurement:
source anonymity, anonymous messages, images ctd.
And with two-way communication (ReplyAddrInfo present)
SLIDE 28
Anonymity Measurement:
source anonymity, anonymous messages
The authors conclude that two-way communication decreases the level of anonymity. Nevertheless, this effect can be balanced by increasing the length of paths, trading performance for anonymity. The authors also claim that they analysed the impact of network size on source anonymity and found it has no significant impact. Unfortunately, they provide no specific test results or calculations to support this claim.
SLIDE 29 Anonymity Measurement:
traffic analysis
Assumptions are as before, except that now the attacker can intercept a fraction of the traffic (denoted T.A.). Please note that T.A. over 90% is highly unrealistic, and yet Cashmere still provides some level of anonymity.
SLIDE 30 Anonymity Measurement:
traffic analysis, ctd
The authors argue that intercepting a high percentage of traffic in a distributed system is unrealistic, and that Cashmere's anonymity is therefore quite satisfactory. Furthermore, they propose the following modification of the protocol: if each node exchanges symmetric keys with all its neighbours (the members of its routing table) and encrypts all messages, traffic analysis would become
inefficient. There are, however, no test results or
calculations supporting this claim in the paper.
SLIDE 31 Briefly on resilience and fault tolerance
Compared to the classical approach, Cashmere takes advantage of sharing responsibility between the nodes in a relay group. A path remains active as long as each of the relay groups on it contains at least one live node. The test methodology is as follows: the number of nodes is constant, and the number
of nodes joining and failing is equal. The session time of a node is
exponentially distributed, meaning that both node failures and arrivals are Poisson processes. The authors assumed a mean time between failures of 200 minutes and a mean time to repair of 5 minutes. The measurements were made using the Kazaa p2p exchange program, simulating 100K exchanges of files between 10 and 100 MB in size.
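A back-of-the-envelope Python sketch shows why relay groups help. It is not the authors' simulation. It assumes independent node failures, fixed group membership, and reads the 200-minute figure as mean up time, so that steady-state availability is 200 / (200 + 5). The values L = 4 and q = 4 are taken from examples elsewhere in the slides, and the function names are hypothetical.

```python
def node_availability(mean_up_min: float, mean_repair_min: float) -> float:
    """Steady-state probability that a single node is alive."""
    return mean_up_min / (mean_up_min + mean_repair_min)

def path_alive_probability(availability: float, path_length: int, group_size: int = 1) -> float:
    """Probability that every hop has at least one live node.
    group_size=1 models classical single-node relays."""
    group_alive = 1 - (1 - availability) ** group_size
    return group_alive ** path_length

a = node_availability(200, 5)
node_based = path_alive_probability(a, path_length=4)                  # single-node relays
group_based = path_alive_probability(a, path_length=4, group_size=4)   # relay groups, q = 4
print(f"node-based: {node_based:.4f}, group-based: {group_based:.6f}")
```

Under these simplifying assumptions a node-based path of four relays is intact roughly 90.6% of the time, while a path of four relay groups with four members each is intact more than 99.999% of the time.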
SLIDE 32
Briefly on resilience and fault tolerance
SLIDE 33 Briefly on performance in terms of communication overhead
Communication costs to maintain knowledge of candidate relay nodes: the authors compare Cashmere with node-based relays, and state that because
of the size of Cashmere's routing table and its dependence on the underlying
SON (structured overlay network), a Cashmere node uses O(N log N) traffic to keep its local routing information up to date. They contrast it with the O(N^2) cost of full information, no matter how unlikely it is for any anonymizing network to even attempt to maintain it. Furthermore, they ignore the costs of maintaining the SON, so the entire comparison is at least controversial.
The cost of delivering a message is O(qL), compared to O(L) for node-based relays, because of the multicast within relay groups.
SLIDE 34 Briefly on performance in terms of computation time overhead
The authors compare their implementation with a program called Pastry, on top
of which Cashmere was implemented for testing.
The authors argue that, because of increased path durations, the average
overhead per session in Cashmere is significantly lower than in the node-based
approach. For q = 4, they found that the calculations necessary to maintain a live path take 5.37% of the time taken in the classical, node-based approach. The authors claim that the average aggregated cost of encryption within a relay group is 46.83% of the cost at intermediate nodes in node-based
solutions. The savings are due to the optimizations mentioned earlier,
namely caching partial information about paths in nodes and using symmetric instead of asymmetric keys in payload encryption.
SLIDE 35 Notice on omitted information
This presentation, due to its time limit, omits a good deal of information on test results and methodology and the hardware used in testing, as well as the authors' plans for further experiments and
development. This information is available in the
original paper.
SLIDE 36
Questions?