Student: Manuel Martín Salvador
Supervisors: Luis M. de Campos and Silvia Acid
Master in Soft Computing and Intelligent Systems
Department of Computer Science and Artificial Intelligence
University of Granada
Handling concept drift in data stream mining
$(a_1, a_2, \ldots, a_n, c)$
Image: I. Žliobaitė thesis
S1 to another П
Image: I. Žliobaitė thesis
Image: D. Brzeziński thesis
Image: D. Brzeziński thesis
Class=true if:
color=red and size=small
color=green or shape=circle
size=medium or size=large
Image: Kolter & Maloof
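The concepts above can be written as boolean predicates. This is only a sketch: it assumes the three target concepts are "red and small", "green or circle" and "medium or large", and the function names `concept_1` to `concept_3` are hypothetical.

```python
# Hypothetical predicates for the three concepts; each returns the class label.
def concept_1(color, shape, size):
    # Assumed first concept: color=red and size=small
    return color == "red" and size == "small"


def concept_2(color, shape, size):
    # Assumed second concept: color=green or shape=circle
    return color == "green" or shape == "circle"


def concept_3(color, shape, size):
    # Assumed third concept: size=medium or size=large
    return size in ("medium", "large")


# The same instance changes class when the active concept changes.
instance = ("green", "circle", "small")
labels = [concept(*instance) for concept in (concept_1, concept_2, concept_3)]
```

A stream with concept drift can then be simulated by labelling instances with one predicate and switching to another at a chosen position.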
Advantages: All instances are used for training. Useful for data streams with concept drifts.
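Error rate: $\dfrac{\text{errors}}{\text{processed instances}}$
Sliding window: $\dfrac{\text{errors inside window}}{\text{window size}}$
Fading factor: $\dfrac{\text{currentError} + \alpha \cdot \text{errors}}{1 + \alpha \cdot \text{processed instances}}$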
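The three error estimates can be sketched as small incremental classes. This is an illustration under assumptions, not the implementation used in the slides: the class names are hypothetical, `alpha` is an assumed name for the fading factor, and the sliding-window version divides by the number of instances seen so far until the window is full.

```python
from collections import deque


class OverallError:
    """Error rate over every instance processed so far."""

    def __init__(self):
        self.errors = 0
        self.processed = 0

    def update(self, is_error):
        self.errors += int(is_error)
        self.processed += 1
        return self.errors / self.processed


class SlidingWindowError:
    """Error rate over the most recent `window_size` instances."""

    def __init__(self, window_size):
        # deque drops the oldest outcome automatically once it is full
        self.window = deque(maxlen=window_size)

    def update(self, is_error):
        self.window.append(int(is_error))
        return sum(self.window) / len(self.window)


class FadingFactorError:
    """Error rate where older outcomes are discounted by `alpha` (assumed name)."""

    def __init__(self, alpha=0.99):
        self.alpha = alpha
        self.errors = 0.0
        self.processed = 0.0

    def update(self, is_error):
        # numerator:   currentError + alpha * errors
        # denominator: 1 + alpha * processed instances
        self.errors = int(is_error) + self.alpha * self.errors
        self.processed = 1.0 + self.alpha * self.processed
        return self.errors / self.processed


stream = [0, 0, 1, 1]  # 1 = misclassified instance
overall, window, fading = OverallError(), SlidingWindowError(2), FadingFactorError(0.5)
for outcome in stream:
    results = (overall.update(outcome), window.update(outcome), fading.update(outcome))
```

On this short stream the overall estimate ends at 0.5, while the windowed and faded estimates react more strongly to the two recent errors.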
✔ Advantages: can be used with any classification algorithm.
✗ Disadvantages: once a change is detected, they usually discard the old model and learn a new one.
✔ Advantages: they continually adapt the model over time.
✗ Disadvantages: they do not detect changes.
– Heuristic 1
– Heuristic 2
– Hybrid heuristic: 1+2
$\{e_i, e_{i+1}, \ldots, e_{i+n}\}$ (e.g. 0,0,1,1)
$c_{i-1} < c_i$ (more errors) → declines++
$c_{i-1} > c_i$ (fewer errors) → declines = 0
$c_{i-1} = c_i$ (same) → declines unchanged
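The update rule for `declines` can be sketched as follows. Several details are assumptions rather than facts from the slides: $c_i$ is taken to be the number of errors in the current history window, a level is raised when `declines` reaches the threshold (`>=`), and the class name `DeclineCounter` is hypothetical.

```python
from collections import deque


class DeclineCounter:
    """Counts consecutive increases of the error count in a history window."""

    def __init__(self, history=8, warning=2, drift=4):
        self.errors = deque(maxlen=history)  # last `history` outcomes (0 or 1)
        self.warning = warning
        self.drift = drift
        self.previous_count = None
        self.declines = 0

    def update(self, is_error):
        self.errors.append(int(is_error))
        count = sum(self.errors)  # assumed meaning of c_i
        if self.previous_count is not None:
            if self.previous_count < count:    # more errors
                self.declines += 1
            elif self.previous_count > count:  # fewer errors
                self.declines = 0
            # same count: declines stays unchanged
        self.previous_count = count
        if self.declines >= self.drift:
            return "drift"
        if self.declines >= self.warning:
            return "warning"
        return "stable"


detector = DeclineCounter(history=4, warning=2, drift=4)
states = [detector.update(e) for e in [0, 0, 0, 0, 1, 1, 1, 1, 0]]
```

In this run the four consecutive errors move the detector from stable through warning to drift, and the following correct prediction resets it. What happens to the model after a drift signal is left out of the sketch.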
History = 8, Warning = 2, Drift = 4
Detected drifts: 46 and 88
Distance to real drifts: 46 - 40 = 6, 88 - 80 = 8
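The distance figures are plain differences between each detected position and the corresponding real drift position. A minimal sketch, assuming detections are matched to real drifts in order (the helper name `detection_delays` is hypothetical):

```python
def detection_delays(detected, real):
    """Return detected position minus real position for each matched pair."""
    return [d - r for d, r in zip(detected, real)]


delays = detection_delays([46, 88], [40, 80])  # [6, 8]
```

Smaller values mean the detector reacted sooner after the real change.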
History = 4, Warning = 4, Drift = 8
Detected drifts: 52 and 90
Distance to real drifts: 52 - 40 = 12, 90 - 80 = 10
Goal: to smooth accuracy rates for better detection.
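One common way to smooth a series of accuracy rates is a moving average. The slides do not show the exact smoothing function, so the following is only a sketch under that assumption; `smooth` and its `window` parameter are hypothetical names.

```python
from collections import deque


def smooth(values, window):
    """Yield the mean of the last `window` values seen so far."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)


# A noisy accuracy series becomes flat after smoothing with window=2.
smoothed = list(smooth([1.0, 0.0, 1.0, 0.0], window=2))
```

A larger window removes more noise but delays the reaction to a real change.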
Smooth = 32, Warning = 4, Drift = 8
Detected drifts: 49 and 91
Distance to real drifts: 49 - 40 = 9, 91 - 80 = 11
Smooth = 4, History = 32, Warning = 2%, Drift = 4%
Detected drifts: 44 and 87
Distance to real drifts: 44 - 40 = 4, 87 - 80 = 7
Warning 1 or Warning 2 → enable Warning
Drift 1 or Drift 2 → enable Drift
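The OR combination can be sketched as a function that merges the states reported by two detectors. The state names and the function `combine` are hypothetical, and giving drift priority over warning when both levels are raised is an assumption.

```python
def combine(state_1, state_2):
    """Merge two detector states ("stable", "warning" or "drift")."""
    states = (state_1, state_2)
    if "drift" in states:    # Drift 1 or Drift 2
        return "drift"
    if "warning" in states:  # Warning 1 or Warning 2
        return "warning"
    return "stable"


merged = combine("stable", "warning")
```

With this rule the hybrid raises a level as soon as either heuristic does, so it can only signal earlier than, or at the same time as, each heuristic alone.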
http://moa.cs.waikato.ac.nz
No detection, MovingAverage1, MoreErrorsMoving, MovingAverage2, MaxMoving, MovingAverageH, DDM, EDDM
changes