Belief Propagation - Probabilistic Graphical Models, Sharif University - PowerPoint PPT Presentation



SLIDE 1

Sum-Product: Message Passing (Belief Propagation)

Probabilistic Graphical Models, Sharif University of Technology, Spring 2017, Soleymani

SLIDE 2

All single-node marginals

 If we need the full set of marginals, repeating the elimination algorithm for each individual variable is wasteful:
 it does not share intermediate terms.
 Message-passing algorithms on graphs (the messages are the shared intermediate terms):
 sum-product and junction tree.
 Upon convergence of these algorithms, we obtain marginal probabilities for all cliques of the original graph.

SLIDE 3

Tree

 Sum-product works only on trees (and we will see it also works on tree-like graphs).
 Directed tree: all nodes have exactly one parent, except the root.
 Undirected tree: there is a unique path between any pair of nodes.

SLIDE 4

Parameterization

 Consider a tree $\mathcal{U}(\mathcal{W}, \mathcal{E})$
 Potential functions: $\varrho(y_j)$, $\varrho(y_j, y_k)$

$$Q(\mathbf{y}) = \frac{1}{a} \prod_{j \in \mathcal{W}} \varrho(y_j) \prod_{(j,k) \in \mathcal{E}} \varrho(y_j, y_k)$$

 In directed graphs (with root $s$):
 $\varrho(y_s) = Q(y_s)$, and $\varrho(y_j) = 1$ for all $j \neq s$
 $\varrho(y_j, y_k) = Q(y_k \mid y_j)$ ($y_j$ is the parent of $y_k$)
 $a = 1$

 When we have evidence $y_j = \bar{y}_j$ on variable $y_j$, we replace $y_j$ by $\bar{y}_j$ in all factors in which it appears.

$$Q(\mathbf{y}) = Q(y_s) \prod_{(j,k) \in \mathcal{E}} Q(y_k \mid y_j)$$
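The directed-tree parameterization can be checked numerically: with $\varrho(y_s) = Q(y_s)$, $\varrho(y_j, y_k) = Q(y_k \mid y_j)$, and $a = 1$, the product of potentials is already a normalized joint. A minimal sketch for an assumed binary chain $Y_1 \to Y_2 \to Y_3$ with illustrative CPT numbers:

```python
import numpy as np
from itertools import product

# Directed chain Y1 -> Y2 -> Y3 (binary). Per the slide: varrho(y_s) = Q(y_s),
# varrho(y_j, y_k) = Q(y_k | y_j), a = 1. CPT numbers below are illustrative.
p1 = np.array([0.7, 0.3])                  # Q(y_1)
p2g1 = np.array([[0.9, 0.1], [0.4, 0.6]])  # Q(y_2 | y_1), rows indexed by y_1
p3g2 = np.array([[0.5, 0.5], [0.2, 0.8]])  # Q(y_3 | y_2), rows indexed by y_2

# Sum of the product of potentials over all configurations.
total = sum(p1[a] * p2g1[a, b] * p3g2[b, c]
            for a, b, c in product(range(2), repeat=3))
assert abs(total - 1.0) < 1e-12            # a = 1: already normalized
```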

SLIDE 5

Sum-product: elimination view

 Query node $s$
 Elimination order: inverse of a topological order
 Starts from the leaves and generates elimination cliques of size at most two
 Elimination of each node can be considered as message passing (or Belief Propagation):
 Elimination on trees is equivalent to message passing along tree branches
 Instead of eliminating a node, we preserve the node and compute a message from it to its parent
 This message is equivalent to the factor resulting from the elimination of that node and all of the nodes in its subtree

SLIDE 6

Messages

Message that node $k$ sends to node $j$, directed toward the root

SLIDE 7

Messages on a tree

 Messages can be reused to find probabilities of different query variables.
 Messages on the tree provide a data structure for caching computations.

[Figure: a five-node tree over $Y_1, \ldots, Y_5$.] We need $n_{32}(y_2)$ to find both $Q(Y_1)$ and $Q(Y_2)$.

SLIDE 8

Messages and marginal distribution

Message that node $k$ sends to node $j$:

$$n_{kj}(y_j) = \sum_{y_k} \varrho(y_k)\, \varrho(y_j, y_k) \prod_{l \in \mathcal{O}(k) \setminus \{j\}} n_{lk}(y_k)$$

This message is a function of only $y_j$.

Marginal at the query node $s$:

$$q(y_s) \propto \varrho(y_s) \prod_{l \in \mathcal{O}(s)} n_{ls}(y_s)$$

SLIDE 9

Messages and marginal: Example

$$q(y_2) \propto \varrho(y_2)\, n_{12}(y_2)\, n_{32}(y_2)\, n_{42}(y_2)$$

$$n_{12}(y_2) = \sum_{y_1} \varrho(y_1)\, \varrho(y_1, y_2)$$
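As a quick numerical check of the leaf message $n_{12}(y_2)$ above, assuming illustrative binary potentials for $Y_1$ and the edge $(Y_1, Y_2)$:

```python
import numpy as np

# Leaf-to-parent message on the edge Y1 - Y2 (binary variables).
# Potential values are illustrative placeholders.
phi1 = np.array([0.6, 0.4])         # varrho(y_1)
psi12 = np.array([[0.9, 0.1],
                  [0.2, 0.8]])      # varrho(y_1, y_2), rows indexed by y_1

# n_{12}(y_2) = sum_{y_1} varrho(y_1) varrho(y_1, y_2)
n12 = psi12.T @ phi1                # n12 == [0.62, 0.38]
```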

SLIDE 10

Computing all node marginals

 We can cover all possible elimination orders (those generating only elimination cliques of size 2) by computing all possible messages, $2|\mathcal{E}|$ in total
 To allow every node to be the root, we only need to compute $2|\mathcal{E}|$ messages
 Messages can be reused
 Instead of running the elimination algorithm once per query node
 Dynamic programming approach
 2-pass algorithm that saves and reuses messages
 A pair of messages (one for each direction) is computed for each edge

SLIDE 11

Messages required to compute all node marginals

SLIDE 12

A two-pass message-passing schedule

 Arbitrarily pick a node as the root
 First pass: starts at the leaves and proceeds inward
 Each node passes a message to its parent.
 Continues until the root has obtained messages from all of its adjoining nodes.
 Second pass: starts at the root and passes the messages back out
 Messages are passed in the reverse direction.
 Continues until all leaves have received their messages.
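The two passes can be sketched on a small undirected tree. The 4-node shape and the potentials below are illustrative assumptions; the resulting marginals are verified against brute-force enumeration of the joint:

```python
import numpy as np
from itertools import product

# Toy 4-node tree: edges 0-1, 1-2, 1-3 (all variables binary).
# Potentials are arbitrary illustrative values.
edges = [(0, 1), (1, 2), (1, 3)]
nbrs = {i: [] for i in range(4)}
for a, b in edges:
    nbrs[a].append(b); nbrs[b].append(a)

rng = np.random.default_rng(0)
phi = {i: rng.random(2) + 0.1 for i in range(4)}       # varrho(y_i)
psi = {e: rng.random((2, 2)) + 0.1 for e in edges}     # varrho(y_j, y_k)

def edge_pot(j, k):
    """varrho(y_j, y_k) oriented so rows index y_j."""
    return psi[(j, k)] if (j, k) in psi else psi[(k, j)].T

def message(k, j, msgs):
    """n_{kj}(y_j) = sum_{y_k} varrho(y_k) varrho(y_j,y_k) prod_{l != j} n_{lk}(y_k)."""
    prod = phi[k].copy()
    for l in nbrs[k]:
        if l != j:
            prod = prod * msgs[(l, k)]
    return edge_pot(j, k) @ prod

# Two-pass schedule with root 0: inward (leaves -> root), then outward.
msgs = {}
def inward(k, parent):
    for l in nbrs[k]:
        if l != parent:
            inward(l, k)
    if parent is not None:
        msgs[(k, parent)] = message(k, parent, msgs)

def outward(k, parent):
    for l in nbrs[k]:
        if l != parent:
            msgs[(k, l)] = message(k, l, msgs)
            outward(l, k)

inward(0, None); outward(0, None)

def marginal(s):
    q = phi[s].copy()
    for l in nbrs[s]:
        q = q * msgs[(l, s)]
    return q / q.sum()

# Brute-force check against direct enumeration of the joint.
def brute(s):
    q = np.zeros(2)
    for y in product(range(2), repeat=4):
        p = np.prod([phi[i][y[i]] for i in range(4)])
        p *= np.prod([psi[(a, b)][y[a], y[b]] for a, b in edges])
        q[y[s]] += p
    return q / q.sum()

for s in range(4):
    assert np.allclose(marginal(s), brute(s))
```

Note that after the two passes every directed edge carries a message, so all four marginals come out of the same $2|\mathcal{E}| = 6$ messages.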

SLIDE 13

Asynchronous two-pass message-passing

First pass: upward. Second pass: downward.

SLIDE 14

Sum-product algorithm: example

[Figure: message $n_{21}(y_1)$ in the example tree.]

SLIDE 15

Sum-product algorithm: example

[Figure: message $n_{21}(y_1)$.]

SLIDE 16

Parallel message-passing

 Message-passing protocol: a node can send a message to a neighboring node when, and only when, it has received messages from all of its other neighbors
 Correctness of parallel message-passing on trees:
 The synchronous implementation is "non-blocking"
 Theorem: the message-passing protocol guarantees obtaining all marginals in the tree
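The protocol above can be sketched as a synchronous loop in which a directed message is sent as soon as its prerequisite messages are available; on a tree the loop never blocks and terminates after a few sweeps. The tree shape and potentials below are illustrative assumptions:

```python
import numpy as np

# Tree: edges 0-1, 1-2, 1-3 (binary variables); potentials are placeholders.
edges = [(0, 1), (1, 2), (1, 3)]
nbrs = {i: [] for i in range(4)}
for a, b in edges:
    nbrs[a].append(b); nbrs[b].append(a)
rng = np.random.default_rng(0)
phi = {i: rng.random(2) + 0.1 for i in range(4)}
psi = {e: rng.random((2, 2)) + 0.1 for e in edges}
pot = lambda j, k: psi[(j, k)] if (j, k) in psi else psi[(k, j)].T

# Protocol: edge message n_{k->j} may be sent once n_{l->k} has arrived
# from every other neighbor l of k.
msgs, pending = {}, [(k, j) for j in range(4) for k in nbrs[j]]
rounds = 0
while pending:                         # never blocks on a tree
    ready = [(k, j) for (k, j) in pending
             if all((l, k) in msgs for l in nbrs[k] if l != j)]
    for k, j in ready:
        v = phi[k].copy()
        for l in nbrs[k]:
            if l != j:
                v = v * msgs[(l, k)]
        msgs[(k, j)] = pot(j, k) @ v
    pending = [e for e in pending if e not in ready]
    rounds += 1

assert len(msgs) == 2 * len(edges)     # all node marginals now available
```

In the first sweep only the leaves are ready; in the second sweep the interior node sends in every direction, so this tree finishes in two rounds.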

SLIDE 17

Parallel message passing: Example

SLIDE 18

Tree-like graphs

 The sum-product message-passing idea can also be extended to work on tree-like graphs (e.g., polytrees).
 Although the undirected (moralized) graphs resulting from polytrees are not trees, the corresponding factor graph is a tree.

[Figure: a polytree (nodes can have multiple parents), its moralized graph, and its factor graph.]

SLIDE 19

Recall: Factor graph

Two factor-graph parameterizations of the same potential:

$$\varrho(y_1, y_2, y_3) = g_b(y_1, y_2)\, g_c(y_1, y_3)\, g_d(y_2, y_3)$$

$$\varrho(y_1, y_2, y_3) = g(y_1, y_2, y_3)$$

SLIDE 20

Sum-product on factor trees

 Factor tree: a factor graph with no loops
 Two types of messages:
 Message that flows from variable node $j$ to factor node $t$:

$$w_{jt}(y_j) = \prod_{u \in \mathcal{O}(j) \setminus \{t\}} \nu_{uj}(y_j)$$

 Message that flows from factor node $t$ to variable node $j$:

$$\nu_{tj}(y_j) = \sum_{\mathbf{y}_{\mathcal{O}(t) \setminus \{j\}}} g_t(\mathbf{y}_{\mathcal{O}(t)}) \prod_{k \in \mathcal{O}(t) \setminus \{j\}} w_{kt}(y_k)$$
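A minimal sketch of the two message types on an assumed three-variable factor tree $Y_1 - g_a - Y_2 - g_b - Y_3$ (factor tables are arbitrary placeholders); the marginal $Q(y_2)$ obtained from the $\nu$ messages is checked against enumeration:

```python
import numpy as np
from itertools import product

# Factor tree: Y1 - g_a - Y2 - g_b - Y3 (binary variables).
# Factor tables are illustrative placeholders.
ga = np.random.default_rng(1).random((2, 2)) + 0.1   # g_a(y1, y2)
gb = np.random.default_rng(2).random((2, 2)) + 0.1   # g_b(y2, y3)

# Leaf variable nodes send the all-ones message w (empty product).
w1a = np.ones(2)        # w_{1,a}(y_1)
w3b = np.ones(2)        # w_{3,b}(y_3)

# Factor-to-variable messages: sum out the other variables of the factor.
nu_a2 = ga.T @ w1a      # nu_{a,2}(y_2) = sum_{y1} g_a(y1,y2) w_{1,a}(y1)
nu_b2 = gb @ w3b        # nu_{b,2}(y_2) = sum_{y3} g_b(y2,y3) w_{3,b}(y3)

# Q(y_2) is proportional to the product of incoming nu messages.
q2 = nu_a2 * nu_b2
q2 /= q2.sum()

# Brute-force check against enumeration of the joint.
brute = np.zeros(2)
for y1, y2, y3 in product(range(2), repeat=3):
    brute[y2] += ga[y1, y2] * gb[y2, y3]
brute /= brute.sum()
assert np.allclose(q2, brute)
```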

SLIDE 21

Sum-product on factor trees

 The message-passing schedule introduced for trees can also be used on factor trees
 When the messages from all the neighbors of a node are received, the marginal probability is:

$$Q(y_j) \propto \prod_{t \in \mathcal{O}(j)} \nu_{tj}(y_j)$$

Equivalently, for any single neighboring factor $t \in \mathcal{O}(j)$:

$$Q(y_j) \propto w_{jt}(y_j)\, \nu_{tj}(y_j)$$

Here $t$ denotes a factor node that is a neighbor of $Y_j$.

SLIDE 22

The relation between sum-product on factor trees and sum-product on undirected trees

 Relation between the $n$ messages of the sum-product algorithm for undirected trees and the $\nu$ messages of the sum-product algorithm for factor trees, for a pairwise factor $t$ connecting variables $j$ and $k$:

$$\nu_{tj}(y_j) = \sum_{\mathbf{y}_{\mathcal{O}(t) \setminus \{j\}}} g_t(\mathbf{y}_{\mathcal{O}(t)}) \prod_{k \in \mathcal{O}(t) \setminus \{j\}} w_{kt}(y_k) = \sum_{y_k} \varrho(y_j, y_k)\, w_{kt}(y_k)$$

$$= \sum_{y_k} \varrho(y_j, y_k) \prod_{u \in \mathcal{O}(k) \setminus \{t\}} \nu_{uk}(y_k) = \sum_{y_k} \varrho(y_k)\, \varrho(y_j, y_k) \prod_{u \in \mathcal{O}'(k) \setminus \{t\}} \nu_{uk}(y_k)$$

where $\mathcal{O}'(k) = \mathcal{O}(k) \setminus \{\text{factor corresponding to } \varrho(y_k)\}$; absorbing the singleton factor's message this way recovers the $n_{kj}$ update of the undirected-tree algorithm.

SLIDE 23

Example

SLIDE 24

References

 D. Koller and N. Friedman, "Probabilistic Graphical Models: Principles and Techniques", MIT Press, 2009, Chapter 10.
 M. I. Jordan, "An Introduction to Probabilistic Graphical Models", Chapter 4.