Learning chordal Markov networks by dynamic programming
Kustaa Kangas, Teppo Niinimäki, Mikko Koivisto
NIPS 2014 (to appear)
November 27, 2014
Probabilistic graphical models
Graphical model:
◮ Graph structure G on the vertex set V = {1, ..., n}
◮ Represents conditional independencies in a joint distribution p(X) = p(X1, ..., Xn)
Advantages:
◮ Easy to read
◮ Compact way to store a distribution
◮ Efficient inference
Probabilistic graphical models
◮ Directed models: Bayesian networks, ...
◮ Undirected models: Markov networks, ...
Structure learning problem: given samples from p(X1, ..., Xn), find a model that best fits the sampled data.
Probabilistic graphical models
Structure learning in chordal Markov networks: find a chordal Markov network that maximizes a given decomposable score.
Prior work:
◮ Constraint satisfaction (Corander et al.)
◮ Integer linear programming (Bartlett and Cussens)
Our result: dynamic programming in O(4^n) time and O(3^n) space for n variables.
◮ First non-trivial bound
◮ Competitive in practice
Markov networks
◮ Joint distribution p(X) = p(X1, ..., Xn)
◮ Undirected graph G on V = {1, ..., n} with the global Markov property: for A, B, S ⊆ V, it holds that X_A ⊥⊥ X_B | X_S whenever S separates A and B in G.
Markov networks
If p is strictly positive, it factorizes as

    p(X1, ..., Xn) = ∏_{C ∈ 𝒞} ψ_C(X_C),

where
◮ 𝒞 is the set of (maximal) cliques of G
◮ the ψ_C are mappings to the positive reals
◮ X_C = {X_v : v ∈ C}
(Hammersley–Clifford theorem)
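As a concrete illustration, the factorization can be checked numerically on a tiny example: the chain 1 - 2 - 3 over binary variables, with cliques {1,2} and {2,3} and hypothetical positive potentials (any positive values would do). Since vertex 2 separates 1 and 3, the global Markov property demands X1 ⊥⊥ X3 | X2, and the distribution defined by the clique product indeed satisfies it:

```python
import itertools

# hypothetical positive clique potentials for the chain 1 - 2 - 3
psi12 = [[1.0, 2.0], [3.0, 0.5]]
psi23 = [[2.0, 1.0], [0.5, 4.0]]

# normalizing constant of the clique-product distribution
Z = sum(psi12[a][b] * psi23[b][c]
        for a, b, c in itertools.product((0, 1), repeat=3))

def p(a, b, c):
    # joint distribution defined by the factorization p ∝ ψ12 ψ23
    return psi12[a][b] * psi23[b][c] / Z

# marginals needed to test the conditional independence X1 ⊥⊥ X3 | X2
def p2(b):     return sum(p(a, b, c) for a in (0, 1) for c in (0, 1))
def p12(a, b): return sum(p(a, b, c) for c in (0, 1))
def p23(b, c): return sum(p(a, b, c) for a in (0, 1))

for a, b, c in itertools.product((0, 1), repeat=3):
    lhs = p(a, b, c) / p2(b)                          # p(x1, x3 | x2)
    rhs = (p12(a, b) / p2(b)) * (p23(b, c) / p2(b))   # p(x1 | x2) p(x3 | x2)
    assert abs(lhs - rhs) < 1e-12
```

The loop passes for any choice of positive potentials, which is the content of the global Markov property for this graph.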
Bayesian networks
◮ Directed acyclic graph
◮ Conditional independencies given by d-separation
◮ Factorizes as

    p(X1, ..., Xn) = ∏_{i=1}^{n} p(X_i | parents(X_i))
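A minimal numeric sketch of this factorization, for a hypothetical chain-structured network X1 → X2 → X3 over binary variables (the CPT values are made up); multiplying the conditional tables yields a proper joint distribution:

```python
import itertools

# hypothetical conditional probability tables for X1 → X2 → X3
p1 = [0.6, 0.4]                           # p(X1)
p2_given_1 = [[0.7, 0.3], [0.2, 0.8]]     # p(X2 | X1)
p3_given_2 = [[0.9, 0.1], [0.5, 0.5]]     # p(X3 | X2)

def p(a, b, c):
    # p(X1, X2, X3) = p(X1) p(X2 | X1) p(X3 | X2)
    return p1[a] * p2_given_1[a][b] * p3_given_2[b][c]

# the product of CPTs sums to 1 over all assignments
total = sum(p(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
```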
Bayesian and Markov networks
◮ Bayesian and Markov networks are not equivalent
◮ Chordal Markov networks are exactly the intersection of the two
Chordal graphs
◮ A chord is an edge between two non-consecutive vertices of a cycle.
◮ A graph is chordal (or triangulated) if every cycle of at least 4 vertices has a chord.
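Chordality can be tested in linear time via maximum cardinality search (Tarjan and Yannakakis): a graph is chordal iff the MCS order is a perfect elimination ordering when reversed. A small sketch, with the graph given as a dict of neighbour sets (the verification pass below is written for clarity, not for the optimal linear running time):

```python
def is_chordal(adj):
    """Maximum cardinality search: repeatedly pick the unvisited vertex with
    the most visited neighbours, then verify that the resulting order is a
    perfect elimination ordering. The graph is chordal iff the check passes."""
    weight = {v: 0 for v in adj}
    seen = set()
    order = []
    for _ in range(len(adj)):
        v = max((u for u in adj if u not in seen), key=lambda u: weight[u])
        order.append(v)
        seen.add(v)
        for u in adj[v]:
            if u not in seen:
                weight[u] += 1
    pos = {v: i for i, v in enumerate(order)}
    # elimination check: for each v, its earlier neighbours other than the
    # latest one must all be adjacent to that latest earlier neighbour
    for v in order:
        pred = [u for u in adj[v] if pos[u] < pos[v]]
        if not pred:
            continue
        w = max(pred, key=lambda u: pos[u])
        if not all(u in adj[w] for u in pred if u != w):
            return False
    return True

# the 4-cycle 1-2-3-4-1 has no chord; adding the chord 1-3 makes it chordal
cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
chorded = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
```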
Clique tree decomposition
[Figure: an example chordal graph on vertices 1–9]
Clique tree decomposition
[Figure: a clique tree of the example graph, with cliques as nodes]
Running intersection property: for all C1, C2 ∈ 𝒞, every clique on the path between C1 and C2 contains C1 ∩ C2.
Clique tree decomposition
[Figure: the clique tree with its separators highlighted]
Separator: the intersection of two adjacent cliques in a clique tree. Every clique tree of a graph has the same multiset of separators.
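The running intersection property is straightforward to verify directly for a candidate clique tree. A sketch, with cliques as a list of sets and tree edges on their indices (both example trees below are hypothetical small cases, not the figure's graph):

```python
from collections import deque

def has_running_intersection(cliques, tree_edges):
    """Check the running intersection property: for every pair of cliques,
    each clique on the tree path between them contains their intersection."""
    adj = {i: [] for i in range(len(cliques))}
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)

    def path(src, dst):
        # BFS parent pointers, then walk back from dst to src
        parent = {src: None}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in parent:
                    parent[u] = v
                    queue.append(u)
        nodes, node = [], dst
        while node is not None:
            nodes.append(node)
            node = parent[node]
        return nodes

    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            need = cliques[i] & cliques[j]
            if not all(need <= cliques[k] for k in path(i, j)):
                return False
    return True

# a valid chain of cliques, and the same cliques attached in a bad order
chain_cliques = [{1, 2}, {2, 3}, {3, 4}]
bad_order = [{1, 2}, {3, 4}, {2, 3}]
```

In `bad_order` the path from {1,2} to {2,3} passes through {3,4}, which does not contain their intersection {2}, so the property fails.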
Clique tree decomposition
[Figure: the example chordal graph alongside one of its clique trees]
Theorem: a graph is chordal if and only if it has a clique tree.
Chordal Markov networks
[Figure: the example chordal graph on vertices 1–9]
◮ Choose ψ_{C_i}(X_{C_i}) = p(X_{C_i}) / p(X_{S_i})
◮ The factorization becomes

    p(X1, ..., Xn) = ∏_{C ∈ 𝒞} ψ_C(X_C) = ∏_{C ∈ 𝒞} p(X_C) / ∏_{S ∈ 𝒮} p(X_S),

where 𝒞 and 𝒮 are the sets of cliques and separators.
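The clique/separator form can be checked numerically on the 3-variable chain 1 - 2 - 3 (hypothetical potentials): its cliques are {1,2} and {2,3}, its only separator is {2}, and the joint equals p(X1, X2) p(X2, X3) / p(X2):

```python
import itertools

# chain 1 - 2 - 3: cliques {1,2} and {2,3}, separator {2}
psi12 = [[1.0, 2.0], [3.0, 0.5]]   # hypothetical positive potentials
psi23 = [[2.0, 1.0], [0.5, 4.0]]
Z = sum(psi12[a][b] * psi23[b][c]
        for a, b, c in itertools.product((0, 1), repeat=3))

def p(a, b, c):
    return psi12[a][b] * psi23[b][c] / Z

def p12(a, b): return sum(p(a, b, c) for c in (0, 1))   # clique marginal
def p23(b, c): return sum(p(a, b, c) for a in (0, 1))   # clique marginal
def p2(b):     return sum(p12(a, b) for a in (0, 1))    # separator marginal

# p(X) = ∏_{C} p(X_C) / ∏_{S} p(X_S) holds pointwise
for a, b, c in itertools.product((0, 1), repeat=3):
    assert abs(p(a, b, c) - p12(a, b) * p23(b, c) / p2(b)) < 1e-12
```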
Structure learning
Given data D sampled from p(X1, ..., Xn), how well does a graph structure G fit the data?
Common scoring criteria decompose as

    score(G) = ∏_{C ∈ 𝒞} score(C) / ∏_{S ∈ 𝒮} score(S)

Each score(C) is the probability of the data projected onto C, possibly extended with a prior or penalization term (e.g. maximum likelihood, Bayesian Dirichlet, ...).
Structure learning
Structure learning problem in chordal Markov networks: given score(C) for each C ⊆ V, find a chordal graph G that maximizes

    score(G) = ∏_{C ∈ 𝒞} score(C) / ∏_{S ∈ 𝒮} score(S).

We assume each score(C) can be computed efficiently and focus on the combinatorial problem.
Structure learning
Brute-force solution:
◮ Enumerate all undirected graphs
◮ Determine which are chordal
◮ For each chordal G, find a clique tree to evaluate score(G)
◮ Time O*(2^(n choose 2))
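For very small n the enumeration step is easy to sketch. Below, all 2^6 = 64 graphs on 4 labeled vertices are generated and tested for chordality by brute-force search for an induced chordless cycle (exponential, but fine at this size); exactly the three labeled 4-cycles are non-chordal, leaving 61 chordal graphs:

```python
import itertools

def is_chordal(n, edges):
    """Brute-force chordality test for tiny graphs: chordal iff there is
    no induced (chordless) cycle on 4 or more vertices."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    for k in range(4, n + 1):
        for sub in itertools.combinations(range(n), k):
            s = set(sub)
            if any(len(adj[v] & s) != 2 for v in sub):
                continue
            # every induced degree is 2, so the induced subgraph is a
            # disjoint union of cycles; connected means one chordless cycle
            seen, stack = {sub[0]}, [sub[0]]
            while stack:
                v = stack.pop()
                for u in adj[v] & s:
                    if u not in seen:
                        seen.add(u)
                        stack.append(u)
            if len(seen) == k:
                return False
    return True

pairs = list(itertools.combinations(range(4), 2))   # the 6 possible edges
chordal_count = sum(
    is_chordal(4, [e for e, keep in zip(pairs, mask) if keep])
    for mask in itertools.product((0, 1), repeat=len(pairs)))
```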
Structure learning
We write score(T) = score(G) when T is a clique tree of G.
◮ Every clique tree T uniquely specifies a chordal graph G.
◮ Thus we can search the space of clique trees instead.
Recursive characterization
[Figure: the clique tree of the example graph, rooted at a clique C]
Let T be rooted at C, with subtrees T1, ..., Tk rooted at C1, ..., Ck. Then

    score(T) = score(C) ∏_{i=1}^{k} score(T_i) / score(C ∩ C_i)
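This characterization transcribes directly into a recursive score computation. A minimal sketch (the child-map representation of a rooted clique tree and the two-clique chain example are hypothetical):

```python
def tree_score(tree, root, score):
    """score(T) for a rooted clique tree, following the recursive
    characterization: score(T) = score(C) * prod_i score(T_i) / score(C ∩ C_i).
    `tree` maps each clique (a frozenset) to the list of its child cliques."""
    total = score(root)
    for child in tree.get(root, []):
        total *= tree_score(tree, child, score) / score(root & child)
    return total

# hypothetical scores for the chain with cliques {1,2}, {2,3}, separator {2}
scores = {frozenset({1, 2}): 6.0, frozenset({2, 3}): 6.0, frozenset({2}): 1.0}
chain_tree = {frozenset({1, 2}): [frozenset({2, 3})]}
```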
Recurrence
For S ⊂ V and ∅ ⊂ R ⊆ V \ S, let f(S, R) be the maximum of score(G) over chordal graphs G on S ∪ R such that S is a proper subset of a clique. The solution is then given by f(∅, V), which satisfies the recurrence

    f(S, R) = max score(C) ∏_{i=1}^{k} f(S_i, R_i) / score(S_i),

where the maximum is over cliques C with S ⊂ C ⊆ S ∪ R, partitions {R1, ..., Rk} of R \ C, and separators S1, ..., Sk ⊂ C.
Recurrence
[Figure: the root clique C covers S and splits the rest of R into parts R1, R2, R3, each attached to C through a separator S_i ⊂ C]

    score(T) = score(C) ∏_{i=1}^{k} score(T_i) / score(C ∩ C_i)

    f(S, R) = max_{S ⊂ C ⊆ S ∪ R} score(C) ∏_{i=1}^{k} f(S_i, R_i) / score(S_i),

with the maximum also taken over partitions {R1, ..., Rk} of R \ C and separators S1, ..., Sk ⊂ C.
Recurrence

    f(S, R) = max_{S ⊂ C ⊆ S ∪ R} score(C) ∏_{i=1}^{k} f(S_i, R_i) / score(S_i)

Since each separator S_i affects only its own factor, the maximization over S1, ..., Sk ⊂ C moves inside the product:

    f(S, R) = max_{S ⊂ C ⊆ S ∪ R} score(C) ∏_{i=1}^{k} max_{S_i ⊂ C} f(S_i, R_i) / score(S_i)

(in both forms, the maximum is also over partitions {R1, ..., Rk} of R \ C).
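A direct transcription of this recurrence can be sketched for tiny instances. It enumerates root cliques, partitions, and separators explicitly, so it is far slower than the paper's O(4^n) algorithm (which shares work across partitions), but it illustrates the structure; the score function and the test values below are hypothetical:

```python
from functools import lru_cache
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def partitions(elems):
    # all partitions of a list into nonempty blocks
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in partitions(rest):
        yield [[first]] + part                      # first in its own block
        for i in range(len(part)):                  # or joined to an existing block
            yield part[:i] + [part[i] + [first]] + part[i + 1:]

def best_score(V, score):
    """Maximum decomposable score over chordal graphs on V, by brute-force
    evaluation of the recurrence for f(S, R)."""
    @lru_cache(maxsize=None)
    def f(S, R):
        best = float('-inf')
        for extra in subsets(R):
            C = S | frozenset(extra)
            if not S < C:                           # S must be a proper subset of C
                continue
            for blocks in partitions(sorted(R - C)):
                total = score(C)
                for Ri in blocks:                   # one subtree per part of R \ C
                    total *= max(f(Si, frozenset(Ri)) / score(Si)
                                 for Si in map(frozenset, subsets(C)) if Si < C)
                best = max(best, total)
        return best
    return f(frozenset(), frozenset(V))
```

On two variables with score(∅) = 1, score({1}) = 2, score({2}) = 3, score({1,2}) = 10, the single edge wins (score 10); lowering score({1,2}) to 4 makes the empty graph optimal (score 2 · 3 = 6).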