Information Replication Strategy in Unstructured Peer-to-Peer Networks Using Thematic Agents



SLIDE 1

Introduction System design Preliminary results Conclusion

Information Replication Strategy in Unstructured Peer-to-Peer Networks Using Thematic Agents

Nicolas Bonnel, Gildas Ménier, Pierre-François Marteau
Laboratoire Valoria - Université de Bretagne Sud
October 24, 2007

1 / 20

SLIDE 2

1. Introduction
   - Overview
   - P2P architecture
2. System design
3. Preliminary results
4. Conclusion

SLIDE 3

Overview

Context
- Indexing very large databases
- The system constrains the location and replication of data

Peer-to-peer architecture
- Fault tolerance
- Scalability
- Resource scavenging: allows the use of more computers at a low cost (e.g. SETI)

SLIDE 4

Structured p2p network

Chord, CAN, Tapestry, ...

Characteristic
- Constraint on data location (distributed hash function)

Features
- Rare items are easy to retrieve
- Approximate and range queries are very costly
- Load balancing problems

SLIDE 5

Unstructured p2p network

Gnutella [Clip2, 2002], ...

Characteristic
- No constraint on data location

Features
- Highly replicated items can be retrieved at a low cost
- Data placement can be controlled
- Rare items are very costly to retrieve

SLIDE 7

System design

Architecture
- Documents are indexed in a distributed index database (each node hosts a part of it)
- Unstructured peer-to-peer architecture
- Each node keeps a summary of the keywords it hosts (Bloom filter)
- This summary speeds up query forwarding
- Replication takes place on nodes with a similar summary

SLIDE 8

Bloom Filters [Bloom, 70]

Definition
- A: array of m bits
- h_i, 0 <= i < k: k hash functions
- insert(x): ∀i, A[h_i(x)] = 1
- query(x): true if ∀i, A[h_i(x)] == 1

False positives
- False positives are possible, but false negatives are not
- Probability of a false positive, with n the number of inserted elements:
  $\left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}$
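
The definition above can be sketched in a few lines of Python. This is only an illustration: the class name `BloomFilter`, the helper `false_positive_probability`, and the way the k hash functions are derived (SHA-256 salted with the index i) are assumptions, not details given in the slides.

```python
import hashlib


class BloomFilter:
    """Array of m bits queried through k hash functions (hypothetical helper)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, x):
        # Assumption: the i-th hash function is SHA-256 salted with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{x}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x):
        # insert(x): set A[h_i(x)] = 1 for every i
        for pos in self._positions(x):
            self.bits[pos] = 1

    def query(self, x):
        # query(x): true only if A[h_i(x)] == 1 for every i
        return all(self.bits[pos] for pos in self._positions(x))


def false_positive_probability(m, k, n):
    """Evaluate (1 - (1 - 1/m)^(k*n))^k for n inserted elements."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k
```

An inserted keyword is always reported as present, which is why false negatives cannot occur; only bits set by other keywords can make an absent keyword look present.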

SLIDE 9

Replication strategy

Agent behavior
- Agents control the number of replicas of each data item in the network
- An agent carries a keyword k (its theme) and the related indexed information
- Agents move randomly through the network
- An agent can create or delete replicas according to its local knowledge
- At each step, an agent has a small probability of taking a new theme
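
One step of such an agent might look like the sketch below. The `Agent` class, the dictionaries describing the network, the default value of `p_new_theme`, and the choice of drawing the new theme uniformly from the keywords of the current node are all assumptions made for illustration; the slides only state that the walk is random and that the probability of a theme change is small.

```python
import random


class Agent:
    """Thematic agent carrying one keyword (hypothetical structure)."""

    def __init__(self, node, theme, rng=None):
        self.node = node      # node currently visited
        self.theme = theme    # keyword k carried by the agent
        self.origin = node    # node where the theme was taken
        self.rng = rng if rng is not None else random.Random()

    def step(self, neighbours, keywords, p_new_theme=0.01):
        """Move to a random neighbour, then possibly take a new theme there.

        neighbours: dict mapping a node to the list of its adjacent nodes
        keywords:   dict mapping a node to the list of keywords it hosts
        p_new_theme: assumed value for the "small probability" of the slides
        """
        self.node = self.rng.choice(neighbours[self.node])
        if keywords[self.node] and self.rng.random() < p_new_theme:
            # Assumption: the new theme is drawn from the current node.
            self.theme = self.rng.choice(keywords[self.node])
            self.origin = self.node
        return self.node
```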

SLIDE 10

Replication strategy

Agent behavior
- Each time it visits a node Nc, the agent computes a score
  $\phi(k, N_c) = \frac{S(k, N_l)}{S(k, N_c)} \times \frac{f(k)}{\alpha}$
- Nl is the node where the agent took its theme
- S(k, N): scoring function of a node N for the keyword k. It measures a trade-off between the space available and how well k matches the Bloom filter of the node
- f(k): frequency of nodes hosting k among the last nodes visited
- α: constant that tunes the amount of replication to achieve
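
Reading the score as φ(k, Nc) = S(k, Nl) / S(k, Nc) × f(k) / α, it can be computed as in the sketch below. The slides do not define S, so its values are passed in as plain numbers. The `VisitMemory` helper and the interpretation of f(k) as a count over a sliding window of recently visited nodes are assumptions.

```python
from collections import deque


class VisitMemory:
    """Sliding window over the last visited nodes (hypothetical helper)."""

    def __init__(self, size=100):
        # Only the `size` most recent observations are kept.
        self.window = deque(maxlen=size)

    def record(self, hosts_k):
        # hosts_k: True if the node just visited hosts the keyword k
        self.window.append(bool(hosts_k))

    def f(self):
        # Assumption: f(k) counts the recorded nodes that host k.
        return sum(self.window)


def phi(s_origin, s_current, f_k, alpha):
    """Score phi(k, Nc) = S(k, Nl) / S(k, Nc) * f(k) / alpha."""
    if s_current <= 0 or alpha <= 0:
        raise ValueError("S(k, Nc) and alpha must be positive")
    return (s_origin / s_current) * (f_k / alpha)
```

With this reading, the score drops when few recently visited nodes host k or when the current node scores better than the origin node for k.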

SLIDE 11

Replication strategy

Agent behavior
- Replicating bound τinf and deleting bound τsup, with $\frac{\tau_{inf} + \tau_{sup}}{2} = 1$
- If φ(k, Nc) ≤ τinf, a replica of k is created on the local node Nc
- If φ(k, Nc) ≥ τsup, all indexed information for k is removed from the local node Nc
- In a network with m nodes, each data item has $\frac{m}{100} \times \alpha$ replicas on average
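
The two bounds turn the score into a three-way decision, sketched below. The function names and the returned labels are hypothetical; the default bounds 0.8 and 1.2 are the ones listed in the experiment settings of this talk.

```python
def decide(phi, tau_inf=0.8, tau_sup=1.2):
    """Map the score phi(k, Nc) to an action on the local node Nc."""
    if phi <= tau_inf:
        return "replicate"   # create a replica of k on Nc
    if phi >= tau_sup:
        return "delete"      # remove all indexed information for k from Nc
    return "keep"            # between the bounds: leave Nc unchanged


def expected_replicas(m, alpha):
    """Average number of replicas per data item: m / 100 * alpha."""
    return m / 100 * alpha
```

For example, `decide(0.5)` returns `"replicate"` and `decide(1.0)` returns `"keep"`, since 1.0 lies strictly between the two bounds.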

SLIDE 13

Experiments settings

General settings
- 400 nodes, random-graph-like topology
- Node degree between 2 and 8
- 30,000 documents from Wikipedia
- Bloom filter size: 8192 (2^13)
- Number of hash functions: 32
- 1000 queries generated at random

Agents settings
- 2000 agents
- 100 nodes recorded
- Replicating constant: α = 2
- Bounds: τinf = 0.8, τsup = 1.2

SLIDE 14

Preliminary results

Evolution of the number of replicas and of filter occupation.
- Number of replicas: normal distribution centered around 13
- Filter occupation increases from 43.5% to 70.9%
- Filter occupation is stable from 5 replicas onward
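
To get a feel for what these occupation levels mean, a common approximation is that a filter with a fraction p of its bits set answers true for an absent keyword with probability about p^k. The sketch below applies it with k = 32 hash functions, as in the experiment settings. The function name is hypothetical, and the approximation assumes independent, uniformly distributed hash positions, which the slides do not state.

```python
def false_positive_from_occupation(occupation, k=32):
    """Approximate false positive rate of a filter with the given fill ratio."""
    if not 0.0 <= occupation <= 1.0:
        raise ValueError("occupation must be a fraction between 0 and 1")
    # All k probed bits must already be set for a false positive.
    return occupation ** k


before = false_positive_from_occupation(0.435)  # occupation before replication
after = false_positive_from_occupation(0.709)   # occupation after replication
```

Under these assumptions the estimate grows with occupation but stays small, on the order of 10^-5 at 70.9% occupation.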

SLIDE 15

Preliminary results

Random walk in unreplicated and replicated environments.
- Half of the queries are answered within 1000 hops without replication
- Half of the queries are answered within 50 hops with 13 replicas
- Results remain good even when half of the nodes fail
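
The hop counts above come from random-walk searches. A minimal sketch of such a search is given below; the function name, the adjacency dictionary, and the hop limit are assumptions, and the sketch forwards the query purely at random without using the Bloom filter summaries.

```python
import random


def random_walk_search(neighbours, hosts, start, max_hops, rng=None):
    """Forward a query at random until a node hosting the keyword is found.

    neighbours: dict mapping a node to the list of its adjacent nodes
    hosts:      set of nodes holding a replica of the searched information
    Returns the number of hops used, or None if max_hops is exceeded.
    """
    rng = rng if rng is not None else random.Random()
    node = start
    for hops in range(max_hops + 1):
        if node in hosts:
            return hops
        node = rng.choice(neighbours[node])
    return None
```

The more nodes appear in `hosts`, the sooner a walk is likely to stop, which is the effect replication is meant to produce.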

SLIDE 16

Preliminary results

Ratio between unreplicated and replicated environments.
- On average, answering between 5% and 50% of the queries is 22 times faster
- The distribution of replicas is homogeneous: wherever a query is forwarded at random, it still finds a replica of the requested information

SLIDE 17

Preliminary results

Self-healing capacities
- Failure of half of the nodes (i.e. the memory of those nodes is reset)
- Average number of replicas drops to 6.5
- Information lost: 0.036%
- The number of replicas then grows back as in the first figure
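
A rough back-of-the-envelope check of these figures is sketched below. It assumes that each node fails independently with probability 1/2 and that the replicas of an item sit on distinct nodes; neither assumption is stated in the slides, and the function names are hypothetical.

```python
def expected_survivors(replicas, p_fail=0.5):
    """Expected number of replicas left after the failure."""
    return replicas * (1.0 - p_fail)


def loss_probability(replicas, p_fail=0.5):
    """Probability that every replica of an item is on a failed node."""
    return p_fail ** replicas
```

With 13 replicas, `expected_survivors(13)` gives 6.5, matching the reported average. `loss_probability(13)` is about 0.012%, the same order of magnitude as the reported 0.036%; items with fewer than 13 replicas would have a higher loss probability under the same assumptions.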

SLIDE 19

Conclusion

Conclusion
- Information replication with agents
- Fully decentralized algorithm that scales very well
- Self-healing properties
- Resilient to hard failures

Future work
- Larger networks
- More dynamic environments

SLIDE 20

References

[Clip2, 2002] Clip2. The Gnutella Protocol Specification v0.4, 2002.
[Bloom, 70] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
