SLIDE 11 Impurity
[Figure: three groups of examples labeled "Very impure group", "Less impure", and "Minimum impurity".]
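Entropy, the impurity measure used in the following example, can be computed directly from class counts. Below is a minimal Python sketch. The function name `entropy` and the three sets of counts are hypothetical. They were chosen only to mimic the three groups in the figure and were not read from it.

```python
import math


def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    # p * log2(1/p) summed over non-empty classes; empty classes contribute 0.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


# Hypothetical groups of 10 examples each, as [yes, no] counts (illustration only).
groups = [("very impure", [5, 5]), ("less impure", [8, 2]), ("minimum impurity", [10, 0])]
for name, counts in groups:
    print(f"{name:>16}: {entropy(counts):.3f} bits")
```

An even mix gives the maximum of 1 bit for two classes, and a single-class group gives 0.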
An example of entropy discretization
Test split temp < 71.5

Temp.  Play?
64     Yes
65     No
68     Yes
69     Yes
70     Yes
71     No
72     No
72     Yes
75     Yes
75     Yes
80     No
81     Yes
83     Yes
85     No

          yes  no
< 71.5     4    2
> 71.5     5    3

$$\mathrm{Ent}(\text{split } 71.5) = -\frac{6}{14}\left(\frac{4}{6}\log_2\frac{4}{6} + \frac{2}{6}\log_2\frac{2}{6}\right) - \frac{8}{14}\left(\frac{5}{8}\log_2\frac{5}{8} + \frac{3}{8}\log_2\frac{3}{8}\right) = 0.939$$

Test split temp < 77

          yes  no
< 77       7    3
> 77       2    2

$$\mathrm{Ent}(\text{split } 77) = -\frac{10}{14}\left(\frac{7}{10}\log_2\frac{7}{10} + \frac{3}{10}\log_2\frac{3}{10}\right) - \frac{4}{14}\left(\frac{2}{4}\log_2\frac{2}{4} + \frac{2}{4}\log_2\frac{2}{4}\right) = 0.915$$
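The two split entropies above can be checked with a short Python sketch. The helper `split_entropy` is a hypothetical name. It weights each interval's entropy by that interval's share of the 14 instances, the same weighting used in the two formulas. The yes/no counts come from this slide's tables.

```python
import math


def entropy(counts):
    """Entropy in bits of a list of class counts."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


def split_entropy(left, right):
    """Weighted average entropy of the two intervals of a binary split.

    left and right are [yes, no] class counts below and above the threshold.
    """
    n = sum(left) + sum(right)
    return sum(left) / n * entropy(left) + sum(right) / n * entropy(right)


print(round(split_entropy([4, 2], [5, 3]), 3))  # temp < 71.5  -> 0.939
print(round(split_entropy([7, 3], [2, 2]), 3))  # temp < 77    -> 0.915
```

The split at 77 has the lower entropy, so of these two candidates it would be preferred.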
An example (cont.)
[Figure: the Temp./Play? table from the previous slide, with the positions of the 1st through 6th splits marked.]
The method tests all split possibilities and chooses the split with the smallest entropy.
In the first iteration a split at 84 (between 83 and 85) is chosen.
The two resulting branches are processed recursively.
The fact that recursion only occurs in the first interval in this example is an artifact. In general both intervals have to be split.
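Below is a Python sketch of the exhaustive search described above. The names `best_split`, `TEMPS`, and `PLAY` are hypothetical. Placing candidate thresholds at midpoints between adjacent distinct values is an assumption, although it agrees with the thresholds 71.5 and 84 used in these slides.

```python
import math
from collections import Counter

# Temperature / Play? data from the example slide.
TEMPS = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
PLAY = ["Yes", "No", "Yes", "Yes", "Yes", "No", "No",
        "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No"]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, split entropy) for the binary split with the smallest entropy.

    Thresholds are midpoints between adjacent distinct values (assumption).
    Returns None if no split is possible, e.g. when all values are equal.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        e = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or e < best[1]:
            best = ((lo + hi) / 2, e)
    return best


threshold, ent = best_split(TEMPS, PLAY)
print(threshold, round(ent, 3))  # -> 84.0 0.827
```

Calling `best_split` again on the values below and above 84 would perform the recursive step. This sketch applies no stopping rule.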
The stopping criterion
The previous slide did not take the stopping criterion into account. A split of S at threshold T is accepted only if

$$\mathrm{Ent}(S) - E(T,S) > \frac{\log_2(N-1)}{N} + \frac{\Delta(T,S)}{N}$$

where

$$\Delta(T,S) = \log_2(3^c - 2) - \left[c\,\mathrm{Ent}(S) - c_1\,\mathrm{Ent}(S_1) - c_2\,\mathrm{Ent}(S_2)\right]$$

N is the number of instances in S.
S1 and S2 are the two subsets of S produced by T. E(T,S) is their weighted average entropy, which is the split entropy computed in the example.
c is the number of classes in S.
c1 is the number of classes in S1.
c2 is the number of classes in S2.
This is called the Minimum Description Length Principle (MDLP)
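The criterion can be combined with the recursive search to form a small discretizer. This is a minimal Python sketch that rests on three assumptions:

* thresholds are placed at midpoints between adjacent distinct values;
* the minimum-entropy split is tested at each step;
* recursion stops when the MDLP test fails.

`mdlp_accepts` and `discretize` are hypothetical names, not a library API.

```python
import math
from collections import Counter

TEMPS = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
PLAY = ["Yes", "No", "Yes", "Yes", "Yes", "No", "No",
        "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No"]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def mdlp_accepts(labels, left, right):
    """MDLP test: accept splitting S (labels) into S1 (left) and S2 (right)?"""
    n = len(labels)
    ent_s, ent_1, ent_2 = entropy(labels), entropy(left), entropy(right)
    e_ts = len(left) / n * ent_1 + len(right) / n * ent_2  # E(T, S)
    c, c1, c2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** c - 2) - (c * ent_s - c1 * ent_1 - c2 * ent_2)
    return ent_s - e_ts > (math.log2(n - 1) + delta) / n


def discretize(values, labels):
    """Split recursively at the minimum-entropy threshold while MDLP accepts.

    Returns the sorted list of accepted cut points.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        e = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or e < best[0]:
            best = (e, i, left, right)
    if best is None:
        return []
    _, i, left, right = best
    if not mdlp_accepts([lab for _, lab in pairs], left, right):
        return []  # stopping criterion: the split does not pay for itself
    vals = [v for v, _ in pairs]
    cut = (vals[i - 1] + vals[i]) / 2
    return discretize(vals[:i], left) + [cut] + discretize(vals[i:], right)


print(discretize(TEMPS, PLAY))
print(discretize(list(range(1, 9)), ["A"] * 4 + ["B"] * 4))
```

By this sketch's arithmetic, the first split at 84 on the 14 temperature values gains about 0.113 bits against a threshold of about 0.458. The split is therefore rejected and `discretize(TEMPS, PLAY)` returns an empty list. A cleanly separable toy input of eight values, split four and four, does get a single cut at 4.5.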