Full Bayesian Network Classifiers by Jiang Su and Harry Zhang - PowerPoint PPT Presentation
Full Bayesian Network Classifiers by Jiang Su and Harry Zhang
Flemming Jensen, November 2008

Purpose
To introduce the full Bayesian network classifier (FBC).

Introduction
Bayesian networks are often used for the classification problem, where a class variable must be predicted from a set of feature variables.
Structure Learning Algorithm

Algorithm FBC-Structure(S, X)
1. B = empty.
2. Partition the training data S into |C| subsets S_c by the class value c.
3. For each training data set S_c:
   - Compute the mutual information M(X_i; X_j) and the dependency threshold φ(X_i, X_j) between each pair of variables X_i and X_j.
   - Compute W(X_i) for each variable X_i.
   - For all variables X_i in X:
     - Add all the variables X_j with W(X_j) > W(X_i) to the parent set Π_{X_i} of X_i.
     - Add arcs from all the variables X_j in Π_{X_i} to X_i.
   - Add the resulting network B_c to B.
4. Return B.
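The steps above can be sketched in Python. This is a hedged sketch, not the authors' implementation: `mutual_information` and `threshold` stand in for the M(X_i; X_j) and φ(X_i, X_j) computations that the presentation defines later.

```python
from collections import defaultdict

def fbc_structure(S, variables, mutual_information, threshold):
    """Sketch of FBC-Structure: one full BN per class value.

    S is a list of (class_value, instance) pairs; mutual_information
    and threshold are assumed helpers implementing M(Xi; Xj) and
    phi(Xi, Xj) on a class-specific subset.
    Returns, per class, the parent set of each variable.
    """
    # Step 2: partition the training data by class value.
    subsets = defaultdict(list)
    for c, instance in S:
        subsets[c].append(instance)

    B = {}
    for c, S_c in subsets.items():
        # Total influence W(Xi): sum of M(Xi; Xj) over the pairs whose
        # mutual information exceeds the dependency threshold.
        W = {}
        for Xi in variables:
            total = 0.0
            for Xj in variables:
                if Xj == Xi:
                    continue
                m = mutual_information(S_c, Xi, Xj)
                if m > threshold(S_c, Xi, Xj):
                    total += m
            W[Xi] = total
        # Parent set of Xi: every Xj with larger total influence;
        # arcs then run from each parent to Xi.
        B[c] = {Xi: [Xj for Xj in variables if W[Xj] > W[Xi]]
                for Xi in variables}
    return B
```

The per-class loop is what makes this a multinet: each class value gets its own full network over the feature variables.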
Example - Structure Learning Algorithm

Example using 1000 labeled instances, where C is the class variable and A, B, and D are feature variables.

C  A  B  D    #      C  A  B  D    #
c1 a1 b1 d1   11     c2 a1 b1 d1   36
c1 a1 b1 d2    5     c2 a1 b1 d2   36
c1 a1 b2 d1    7     c2 a1 b2 d1  259
c1 a1 b2 d2   17     c2 a1 b2 d2   29
c1 a2 b1 d1  227     c2 a2 b1 d1   96
c1 a2 b1 d2   97     c2 a2 b1 d2   96
c1 a2 b2 d1   11     c2 a2 b2 d1   43
c1 a2 b2 d2   25     c2 a2 b2 d2    5
Example - Structure Learning Algorithm

The 400 data instances where C = c1:

  #   C  A  B  D
 11   c1 a1 b1 d1
  5   c1 a1 b1 d2
  7   c1 a1 b2 d1
 17   c1 a1 b2 d2
227   c1 a2 b1 d1
 97   c1 a2 b1 d2
 11   c1 a2 b2 d1
 25   c1 a2 b2 d2

P(A, B):
        b1     b2
a1    0.04   0.06
a2    0.81   0.09
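The P(A, B) table follows directly from the c1 counts by marginalising out D. A few lines reproduce it (the count dictionary below just restates the table above):

```python
# Counts for the 400 instances with C = c1, keyed by (A, B, D).
counts = {
    ('a1', 'b1', 'd1'):  11, ('a1', 'b1', 'd2'):  5,
    ('a1', 'b2', 'd1'):   7, ('a1', 'b2', 'd2'): 17,
    ('a2', 'b1', 'd1'): 227, ('a2', 'b1', 'd2'): 97,
    ('a2', 'b2', 'd1'):  11, ('a2', 'b2', 'd2'): 25,
}
N = sum(counts.values())  # 400

# Marginalise out D: sum counts per (A, B) cell, then normalise.
ab_counts = {}
for (a, b, d), n in counts.items():
    ab_counts[(a, b)] = ab_counts.get((a, b), 0) + n
p_ab = {ab: n / N for ab, n in ab_counts.items()}
# Matches the P(A, B) table: 0.04, 0.06, 0.81, 0.09.
```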
Example - Structure Learning Algorithm

P(A, B):                 P(A)P(B):
        b1     b2                b1      b2
a1    0.04   0.06        a1    0.085   0.015
a2    0.81   0.09        a2    0.765   0.135

M(X; Y) = Σ_{x ∈ X, y ∈ Y} P(x, y) · log( P(x, y) / (P(x)P(y)) )

M(A; B) = 0.04 · log(0.04/0.085) + 0.81 · log(0.81/0.765)
        + 0.06 · log(0.06/0.015) + 0.09 · log(0.09/0.135) = 0.027
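The mutual information computation can be checked numerically. Note that the slide's arithmetic matches base-10 logarithms, so the sketch below uses `math.log10`:

```python
import math

# Joint distribution P(A, B) for the C = c1 subset (from the slide).
p_ab = {('a1', 'b1'): 0.04, ('a1', 'b2'): 0.06,
        ('a2', 'b1'): 0.81, ('a2', 'b2'): 0.09}

# Marginals P(A) and P(B), obtained by summing rows and columns.
p_a, p_b = {}, {}
for (a, b), p in p_ab.items():
    p_a[a] = p_a.get(a, 0) + p
    p_b[b] = p_b.get(b, 0) + p

# M(A; B) = sum over cells of P(a,b) * log( P(a,b) / (P(a) P(b)) ).
m_ab = sum(p * math.log10(p / (p_a[a] * p_b[b]))
           for (a, b), p in p_ab.items())
print(round(m_ab, 3))  # 0.027
```

Any logarithm base only rescales M; what matters for the algorithm is that the same base is used for the dependency threshold φ.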
Example - Structure Learning Algorithm

Mutual information:
M(A; B) = 0.027
M(A; D) = 0.004
M(B; D) = 0.018

Dependency threshold:
φ(X_i, X_j) = (log N / 2N) · T_ij
φ(A, B) = φ(A, D) = φ(B, D) = (4 · log 400) / 800 = 0.013

Total influence:
W(X_i) = Σ_{j ≠ i, M(X_i; X_j) > φ(X_i, X_j)} M(X_i; X_j)
W(A) = M(A; B) = 0.027
W(B) = M(A; B) + M(B; D) = 0.045
W(D) = M(B; D) = 0.018
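The threshold and total-influence values can be reproduced as follows. One assumption in this sketch: T_ij is taken to be 4, the number of joint states of two binary variables, which matches the slide's 4 · log 400 / 800.

```python
import math

N = 400  # instances in the C = c1 subset

def phi(T_ij, N=N):
    # Dependency threshold phi(Xi, Xj) = (log N / 2N) * T_ij.
    # T_ij = 4 here (assumed: joint states of two binary variables).
    return math.log10(N) / (2 * N) * T_ij

M = {('A', 'B'): 0.027, ('A', 'D'): 0.004, ('B', 'D'): 0.018}

def W(Xi, variables=('A', 'B', 'D')):
    # Total influence: sum M(Xi; Xj) over pairs above the threshold.
    total = 0.0
    for Xj in variables:
        if Xj == Xi:
            continue
        m = M.get((Xi, Xj), M.get((Xj, Xi)))
        if m > phi(4):
            total += m
    return total

print(round(phi(4), 3))  # 0.013
print(round(W('A'), 3), round(W('B'), 3), round(W('D'), 3))  # 0.027 0.045 0.018
```

Note that M(A; D) = 0.004 falls below φ = 0.013, so it is excluded from both W(A) and W(D).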
Example - Structure Learning Algorithm

We now construct a full Bayesian network with variable order according to the total influence values:

W(A) = 0.027, W(B) = 0.045, W(D) = 0.018, so W(B) > W(A) > W(D).

[Network diagram: nodes B, A, D with arcs B → A, B → D, A → D]

We now have the full Bayesian network B_c1, which is the part of the multinet that corresponds to C = c1. We should now repeat the process to construct B_c2 and thereby complete the FBC structure learning.
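The ordering step can be checked with a few lines: with W(B) > W(A) > W(D), B gets no parents, A gets B, and D gets both, yielding the arcs B → A, B → D, and A → D.

```python
W = {'A': 0.027, 'B': 0.045, 'D': 0.018}

# Parent set of each variable: all variables with larger total influence.
parents = {Xi: sorted(Xj for Xj in W if W[Xj] > W[Xi]) for Xi in W}
print(parents)  # {'A': ['B'], 'B': [], 'D': ['A', 'B']}
```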
CPT-tree Learning

We now need to learn a CPT-tree for each variable in the full BN.

A traditional decision tree learning algorithm, such as C4.5, could be used to learn CPT-trees. However, since its time complexity is typically O(n² · N), the resulting FBC learning algorithm would have a complexity of O(n³ · N).

Instead, a fast decision tree learning algorithm is proposed. The algorithm uses the mutual information to determine a fixed ordering of variables from root to leaves. This predefined variable ordering makes the algorithm faster than traditional decision tree learning algorithms.
CPT-tree Learning Algorithm

Algorithm Fast-CPT-Tree(Π_{X_i}, S)
1. Create an empty tree T.
2. If (S is pure or empty) or (Π_{X_i} is empty), return T.
3. qualified = False.
4. While (qualified == False) and (Π_{X_i} is not empty):
   - Choose the variable X_j with the highest M(X_j; X_i).
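The transcript shows only steps 1-4 of Fast-CPT-Tree, so the following is a hedged sketch: the `is_qualified` test behind the slide's `qualified` flag, and the split-and-recurse continuation after a qualified variable is found, are assumptions rather than content from the slide.

```python
def fast_cpt_tree(parents, S, mutual_information, is_qualified):
    """Hedged sketch of Fast-CPT-Tree.

    parents: candidate parent variables of X_i (the slide's pi_{X_i}).
    S: list of (instance_dict, xi_value) pairs reaching this node.
    mutual_information(S, Xj): stands in for M(Xj; Xi) on S.
    is_qualified(S, Xj): assumed test behind the `qualified` flag.
    """
    # Steps 1-2: empty tree; stop if S is pure/empty or no candidates.
    if not S or len({xi for _, xi in S}) == 1 or not parents:
        return {}
    # Steps 3-4: scan candidates in decreasing mutual-information
    # order until one qualifies (matches the while loop on the slide).
    ordered = sorted(parents, reverse=True,
                     key=lambda Xj: mutual_information(S, Xj))
    split_var = next((Xj for Xj in ordered if is_qualified(S, Xj)), None)
    if split_var is None:
        return {}
    # Assumed continuation: split S on split_var's values and recurse
    # with the remaining candidates, giving the fixed root-to-leaf
    # variable ordering described earlier.
    rest = [Xj for Xj in parents if Xj != split_var]
    children = {}
    for value in {inst[split_var] for inst, _ in S}:
        subset = [(inst, xi) for inst, xi in S if inst[split_var] == value]
        children[value] = fast_cpt_tree(rest, subset,
                                        mutual_information, is_qualified)
    return {split_var: children}
```

Because the candidate ordering is fixed up front, each level of the tree needs no fresh split-criterion search over all variables, which is the source of the speed-up over C4.5-style learners.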