Case study: d60 Raptor smartAdvisor
Jan Neerbek Alexandra Institute
Agenda
  d60: A cloud/data mining case
  Cloud
  Data Mining
  Market Basket Analysis
  Large data sets
  Our solution
Alexandra Institute
The Alexandra Institute is …
[Figure: Internet, Webshops, Log shopping patterns, Do data mining, Product Recommendations]
[Figure: seven nodes]
[Figure: seven nodes with a data layer service and a messaging service]
Data layer service
Messaging Service
[about.com/wikipedia.org]
Customer   Items
Customer1  Avocado, Milk, Butter, Potatoes
Customer2  Milk, Diapers, Avocado, Beer
Customer3  Beef, Lemons, Beer, Chips
Customer4  Cereal, Beer, Beef, Diapers
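A minimal sketch of the first step of market basket analysis on the four baskets above: counting in how many baskets each item occurs (its support). The names `baskets` and `item_support` are illustrative and do not come from the slides.

```python
from collections import Counter

# The four example baskets from the slide.
baskets = {
    "Customer1": ["Avocado", "Milk", "Butter", "Potatoes"],
    "Customer2": ["Milk", "Diapers", "Avocado", "Beer"],
    "Customer3": ["Beef", "Lemons", "Beer", "Chips"],
    "Customer4": ["Cereal", "Beer", "Beef", "Diapers"],
}


def item_support(baskets):
    """Count the number of baskets that contain each item."""
    counts = Counter()
    for items in baskets.values():
        counts.update(set(items))  # set(): count an item once per basket
    return counts


support = item_support(baskets)
for item, count in sorted(support.items()):
    print(item, count)
```

Items that reach a chosen minimum support (for example 2 baskets) are the ones worth extending into larger itemsets.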
Customer1: Avocado, Milk, Butter, Potatoes
[Figure: tree with nodes Avocado, Butter, Milk, Potatoes]
Customer2: Milk, Diapers, Avocado, Beer
[Figure: tree with nodes Avocado, Butter, Milk, Potatoes, Beer, Diapers, Milk]
[Figure: tree with nodes Avocado, Butter, Milk, Potatoes, Beer, Diapers, Milk, Beef, Beer, Chips, Lemon, Cereal, Diapers]
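The tree construction shown on these slides can be sketched as a prefix tree with a counter per node. This is an assumption-laden sketch: the node labels on the slides suggest items are inserted in alphabetical order, so that is what the code does, whereas textbook FP-trees usually order items by descending frequency. `FPNode`, `build_tree` and `leaf_paths` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FPNode:
    item: Optional[str] = None  # None marks the root
    count: int = 0              # number of baskets passing through this node
    children: Dict[str, "FPNode"] = field(default_factory=dict)


def build_tree(transactions):
    """Insert every basket as a path; shared prefixes share nodes."""
    root = FPNode()
    for transaction in transactions:
        node = root
        # Assumption: alphabetical item order, as the slide figures suggest.
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root


def leaf_paths(node, prefix=()):
    """Return all root-to-leaf paths as tuples of item names."""
    if not node.children:
        return [prefix]
    paths = []
    for item in sorted(node.children):
        paths.extend(leaf_paths(node.children[item], prefix + (item,)))
    return paths


baskets = [
    ["Avocado", "Milk", "Butter", "Potatoes"],
    ["Milk", "Diapers", "Avocado", "Beer"],
    ["Beef", "Lemons", "Beer", "Chips"],
    ["Cereal", "Beer", "Beef", "Diapers"],
]
tree = build_tree(baskets)
for path in leaf_paths(tree):
    print(" > ".join(path))
```

Customer1 and Customer2 share the Avocado node, and Customer3 and Customer4 share the Beef and Beer nodes, which is where the tree saves space on large data sets.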
Grows the frequent itemsets recursively:
  FP-growth(FP-tree tree) {
    …
    for-each (item in tree) {
      count = CountOccur(tree, item);
      if (IsFrequent(count)) {
        OutputSet(item);
        sub = tree.GetTree(tree, item);
        FP-growth(sub);
      }
    }
  }
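A runnable sketch of the same recursion. To stay short it works on plain lists of transactions (a projected database) instead of a compressed FP-tree, so it illustrates the control flow of the pseudocode rather than the actual data structure. `pattern_growth` and `min_support` are illustrative names, not taken from the slides.

```python
from collections import Counter


def pattern_growth(transactions, min_support, suffix=()):
    """Yield (itemset, count) for every frequent itemset ending in `suffix`."""
    # CountOccur: in how many transactions does each item occur?
    counts = Counter(item for t in transactions for item in set(t))
    for item in sorted(counts):
        if counts[item] >= min_support:          # IsFrequent
            itemset = (item,) + suffix
            yield itemset, counts[item]          # OutputSet
            # GetTree stand-in: keep transactions containing `item`,
            # restricted to items that sort before it, so that each
            # itemset is generated exactly once.
            sub = [[i for i in t if i < item] for t in transactions if item in t]
            yield from pattern_growth(sub, min_support, itemset)


baskets = [
    ["Avocado", "Milk", "Butter", "Potatoes"],
    ["Milk", "Diapers", "Avocado", "Beer"],
    ["Beef", "Lemons", "Beer", "Chips"],
    ["Cereal", "Beer", "Beef", "Diapers"],
]
frequent = dict(pattern_growth(baskets, min_support=2))
for itemset, count in sorted(frequent.items()):
    print(itemset, count)
```

With a minimum support of 2 baskets, the example data yields five frequent single items and three frequent pairs.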
[Figure: tree with nodes Avocado, Butter, Milk, Potatoes, Beer, Diapers, Milk, Beef, Beer, Chips, Lemon, Cereal, Diapers; additional labels Avocado, Butter, Beer, Diapers]
[Figure: five CPU/Memory pairs; labels: Shared Memory, Network]
[Figure: tree nodes Avocado, Butter, Beer, Diapers, Avocado, Avocado, Beer, annotated with the paths "Milk", "Butter, Milk" and "Diapers, Milk"]
These are postfix paths
[Figure: labels Transactions, Items]
  FP-growth(FP-tree tree) {
    …
    for-each (item in tree) {
      count = CountOccur(tree, item);
      if (IsFrequent(count)) {
        OutputSet(item);
        sub = tree.GetTree(tree, item);
        FP-growth(sub);
      }
    }
  }
Slide annotations on the pseudocode: "Replaced with postfix" (once), "Done in parallel" (three times)
[Figure: four nodes and a data layer]
[Figure: four nodes and a data layer, with a message queue (MQ)]
Distribute buckets
Count items (with postfix size=n)
Collect counts (per postfix)
Call recursive
Standard FP-growth
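The steps above can be sketched as follows, under several assumptions: a thread pool stands in for the nodes and the message queue, a loop over growing postfix sizes stands in for the recursive call, and the final hand-off to standard FP-growth is left out. `count_bucket` and `mine` are hypothetical names, and this is not the smartAdvisor implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_bucket(bucket, postfixes):
    """Worker step: count, within one bucket, the items that extend each postfix."""
    counts = Counter()
    for transaction in bucket:
        items = set(transaction)
        for postfix in postfixes:
            if items.issuperset(postfix):
                for item in items:
                    # Only extend with items sorting before the postfix,
                    # so every itemset is counted under one postfix only.
                    if not postfix or item < postfix[0]:
                        counts[(item,) + postfix] += 1
    return counts


def mine(transactions, min_support, workers=2):
    # Distribute buckets: round-robin split of the transactions.
    buckets = [transactions[i::workers] for i in range(workers)]
    frequent = {}
    postfixes = [()]  # postfix size n = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while postfixes:
            # Count items (with postfix size = n), one task per bucket.
            partials = pool.map(count_bucket, buckets, [postfixes] * len(buckets))
            # Collect counts (per postfix).
            total = Counter()
            for partial in partials:
                total.update(partial)
            # Frequent extensions become the postfixes of size n + 1.
            postfixes = sorted(s for s, c in total.items() if c >= min_support)
            frequent.update({s: total[s] for s in postfixes})
    return frequent


baskets = [
    ["Avocado", "Milk", "Butter", "Potatoes"],
    ["Milk", "Diapers", "Avocado", "Beer"],
    ["Beef", "Lemons", "Beer", "Chips"],
    ["Cereal", "Beer", "Beef", "Diapers"],
]
result = mine(baskets, min_support=2, workers=2)
for itemset, count in sorted(result.items()):
    print(itemset, count)
```

Because each worker only returns counts keyed by postfix, the coordinator never needs the transactions themselves, only the merged counters.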
[Figure: chart with a time axis from 00:00:00 to 04:30:00 and the values 1, 2, 4, 8 on the other axis; labels: Message-driven FP-growth, FP-growth, Total node time]