

SLIDE 1

CS7015 (Deep Learning): Lecture 18
Markov Networks

Mitesh M. Khapra
Department of Computer Science and Engineering
Indian Institute of Technology Madras

SLIDE 2

Acknowledgments
Probabilistic Graphical Models: Principles and Techniques, Daphne Koller and Nir Friedman

SLIDE 3

Module 18.1: Markov Networks: Motivation

SLIDES 4–12

To motivate undirected graphical models, let us consider a new example.

[Figure: an undirected graph on four nodes A, B, C, D arranged in a square, with edges A–B, B–C, C–D, D–A]

A, B, C, D are four students.
A and B study together sometimes.
B and C study together sometimes.
C and D study together sometimes.
A and D study together sometimes.
A and C never study together.
B and D never study together.

SLIDES 13–16

Now suppose there was some misconception in the lecture due to some error made by the teacher.
Each one of A, B, C, D could have independently cleared this misconception by thinking about it after the lecture.
In subsequent study pairs, each student could then pass on this information to their partner.

SLIDES 17–20

We are now interested in knowing whether a student still has the misconception or not.
Or, we are interested in P(A, B, C, D), where A, B, C, D can take values 0 (no misconception) or 1 (misconception).
How do we model this using a Bayesian Network?

SLIDES 21–25

First let us examine the conditional independencies in this problem:

A ⊥ C | {B, D} (because A and C never interact)
B ⊥ D | {A, C} (because B and D never interact)

There are no other conditional independencies in the problem.
Now let us try to represent this using a Bayesian Network.

SLIDES 26–28

[Figure: a candidate Bayesian network with directed edges A → B, A → D, B → C, D → C]

How about this one?
Indeed, it captures the following independence relation: A ⊥ C | {B, D}.
But it also implies that B ⊥ D | A, and that B and D are not independent given {A, C} (because of the v-structure at C).

SLIDES 29–36

Let us try a different network.

[Figure: a second candidate Bayesian network over A, B, C, D]

Again A ⊥ C | {B, D}.
But it also implies B ⊥ D (unconditionally), which does not hold in our problem.
You can try other networks.
It turns out there is no Bayesian Network which can exactly capture the independence relations that we are interested in.

Perfect Map: A graph G is a Perfect Map for a distribution P if the independence relations implied by the graph are exactly the same as those implied by the distribution.

There is no Perfect Map for this distribution.

SLIDES 37–41

The problem is that a directed graphical model is not suitable for this example.
A directed edge between two nodes implies some kind of direction in the interaction.
For example, A → B could indicate that A influences B but not the other way round.
But in our example A and B are equal partners (they both contribute to the study discussion).
We want to capture the strength of this interaction (and there is no direction here).

SLIDES 42–45

[Figure: the undirected graph over A, B, C, D with edges A–B, B–C, C–D, D–A]

We move on from Directed Graphical Models to Undirected Graphical Models, also known as Markov Networks.
The Markov Network shown above exactly captures the interactions inherent in the problem.
But how do we parameterize this graph?

SLIDE 46

Module 18.2: Factors in a Markov Network

SLIDES 47–50

[Figure: the student Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter]

P(G, S, I, L, D) = P(I) P(D) P(G | I, D) P(S | I) P(L | G)

Recall that in the directed case the factors were Conditional Probability Distributions (CPDs).
Each such factor captured the interaction (dependence) between the connected nodes.
Can we use CPDs in the undirected case also?
CPDs don't make sense in the undirected case because there is no direction and hence no natural conditioning (is it A|B or B|A?).
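As a concrete reminder of how directed factors compose, here is a minimal sketch of this factorization with binary variables; the CPT numbers are hypothetical placeholders for illustration, not values from the lecture.

```python
from itertools import product

# Hypothetical CPTs for a binary toy version of the student network;
# the numbers are illustrative placeholders, not values from the lecture.
P_I = {0: 0.7, 1: 0.3}                                      # P(I)
P_D = {0: 0.6, 1: 0.4}                                      # P(D)
P_G = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.9, (1, 1): 0.5}  # P(G=1 | I, D)
P_S = {0: 0.05, 1: 0.8}                                     # P(S=1 | I)
P_L = {0: 0.1, 1: 0.95}                                     # P(L=1 | G)

def bern(p, x):
    """Probability of a binary outcome x under success probability p."""
    return p if x == 1 else 1 - p

def joint(g, s, i, l, d):
    """P(G,S,I,L,D) = P(I) P(D) P(G|I,D) P(S|I) P(L|G)."""
    return (P_I[i] * P_D[d] * bern(P_G[(i, d)], g)
            * bern(P_S[i], s) * bern(P_L[g], l))

# Each factor is a conditional distribution, so the product is already
# a normalized joint: it sums to 1 with no extra normalization needed.
assert abs(sum(joint(*v) for v in product([0, 1], repeat=5)) - 1.0) < 1e-9
```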

SLIDES 51–54

So what should the factors or parameters be in this case?
Question: What do we want these factors to capture?
Answer: The affinity between connected random variables.
Just as in the directed case the factors captured the conditional dependence between a set of random variables, here we want them to capture the affinity between them.

SLIDES 55–57

However, we can borrow the intuition from the directed case.
Even in the undirected case, we want each such factor to capture the interactions (affinity) between connected nodes.
We could have factors φ1(A, B), φ2(B, C), φ3(C, D), φ4(D, A) which capture the affinity between the corresponding nodes.

SLIDES 58–65

[Figure: the undirected graph over A, B, C, D]

φ1(A, B)         φ2(B, C)         φ3(C, D)         φ4(D, A)
a0 b0   30       b0 c0  100       c0 d0    1       d0 a0  100
a0 b1    5       b0 c1    1       c0 d1  100       d0 a1    1
a1 b0    1       b1 c0    1       c1 d0  100       d1 a0    1
a1 b1   10       b1 c1  100       c1 d1    1       d1 a1  100

Intuitively, it makes sense to have these factors associated with each pair of connected random variables.
We could now assign some values to these factors.
But who will give us these values?
Well, now you need to learn them from data (same as in the directed case).
If you have access to a lot of past interactions between A and B then you could learn these values (more on this later).
Roughly speaking, φ1(A, B) asserts that it is more likely for A and B to agree [∵ the weights for a0 b0 and a1 b1 are larger than those for a0 b1 and a1 b0].
φ1(A, B) also assigns more weight to the case when both do not have a misconception as compared to the case when both have the misconception (a0 b0 > a1 b1).
We could have similar assignments for the other factors.
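A minimal sketch of these factor tables in Python (assuming the table values above): each factor is just a lookup table of non-negative weights, not a probability distribution.

```python
# The four factor tables from the slide, as plain lookup tables.
# Keys are (value_of_first_var, value_of_second_var); values are
# non-negative affinities (weights), not probabilities.
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}    # phi1(A, B)
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi2(B, C)
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}  # phi3(C, D)
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # phi4(D, A)

# Relative likelihood, e.g. phi1 says A and B agreeing on "no
# misconception" (a0, b0) outweighs agreeing on "misconception" (a1, b1).
assert phi1[(0, 0)] > phi1[(1, 1)]
```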

SLIDES 66–69

Notice a few things.
These tables do not represent probability distributions.
They are just weights which can be interpreted as the relative likelihood of an event.
For example, a = 0, b = 0 is more likely than a = 1, b = 1.

SLIDES 70–73

But eventually we are interested in probability distributions.
In the directed case, going from factors to a joint probability distribution was easy, as the factors were themselves conditional probability distributions.
We could just write the joint probability distribution as the product of the factors (without violating the axioms of probability).
What do we do in this case, when the factors are not probability distributions?

SLIDES 74–78

Well, we could still write it as a product of these factors and normalize it appropriately:

P(a, b, c, d) = (1/Z) φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a)

where

Z = Σ_{a,b,c,d} φ1(a, b) φ2(b, c) φ3(c, d) φ4(d, a)

Based on the values that we had assigned to the factors, we can now compute the full joint probability distribution:

Assignment     Unnormalized   Normalized
a0 b0 c0 d0       300,000     4.17E-02
a0 b0 c0 d1       300,000     4.17E-02
a0 b0 c1 d0       300,000     4.17E-02
a0 b0 c1 d1            30     4.17E-06
a0 b1 c0 d0           500     6.94E-05
a0 b1 c0 d1           500     6.94E-05
a0 b1 c1 d0     5,000,000     6.94E-01
a0 b1 c1 d1           500     6.94E-05
a1 b0 c0 d0           100     1.39E-05
a1 b0 c0 d1     1,000,000     1.39E-01
a1 b0 c1 d0           100     1.39E-05
a1 b0 c1 d1           100     1.39E-05
a1 b1 c0 d0            10     1.39E-06
a1 b1 c0 d1       100,000     1.39E-02
a1 b1 c1 d0       100,000     1.39E-02
a1 b1 c1 d1       100,000     1.39E-02

Z is called the partition function.
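A brute-force sketch of this normalization, reusing the phi1 … phi4 lookup tables defined earlier; enumerating all 2^4 assignments reproduces the partition function and the normalized column above.

```python
from itertools import product

def unnormalized(a, b, c, d):
    # Product of the four pairwise factors (tables defined earlier).
    return phi1[(a, b)] * phi2[(b, c)] * phi3[(c, d)] * phi4[(d, a)]

# Partition function: sum of the unnormalized measure over all 2^4
# assignments of (A, B, C, D).
Z = sum(unnormalized(*v) for v in product([0, 1], repeat=4))

def P(a, b, c, d):
    return unnormalized(a, b, c, d) / Z

# Reproduces the table above, e.g. the most likely assignment:
print(P(0, 1, 1, 0))  # 5,000,000 / Z ≈ 6.94E-01
```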

SLIDES 79–83

Let us build on the original example by adding some more students.

[Figure: the graph over A, B, C, D extended with nodes E and F; E is connected to A and D, and F is connected to A and B]

Once again there is an edge between two students if they study together.
One way of interpreting these new connections is that {A, D, E} form a study group or a clique.
Similarly {A, F, B} form a study group, {C, D} form a study group, and {B, C} form a study group.

SLIDES 84–89

Now, what should the factors be?
We could still have factors which capture pairwise interactions:

φ1(A, E) φ2(A, F) φ3(B, F) φ4(A, B) φ5(A, D) φ6(D, E) φ7(B, C) φ8(C, D)

But could we do something smarter (and more efficient)?
Instead of having a factor for each pair of nodes, why not have one for each maximal clique?

φ1(A, E, D) φ2(A, F, B) φ3(B, C) φ4(C, D)
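One way to sanity-check the maximal cliques is with networkx (using its standard nx.find_cliques routine); the edge list below encodes the graph as described above.

```python
import networkx as nx

# Undirected study graph: the original square plus students E and F.
g = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"),
              ("A", "E"), ("D", "E"),   # {A, D, E} study group
              ("A", "F"), ("B", "F")])  # {A, F, B} study group

# nx.find_cliques yields the maximal cliques; one factor per clique.
print(sorted(sorted(c) for c in nx.find_cliques(g)))
# [['A', 'B', 'F'], ['A', 'D', 'E'], ['B', 'C'], ['C', 'D']]
```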

SLIDES 90–94

What if we add one more student?

[Figure: the graph extended with a node G, connected to A, D, and E]

What will be the factors in this case?
Remember, we are interested in maximal cliques.
So instead of having factors φ(E, A, G), φ(G, A, D), φ(E, G, D), we will have a single factor φ(A, E, G, D) corresponding to the maximal clique.
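Continuing the networkx sketch, connecting the new student G to A, D, and E (as in the figure) merges the three triangles into a single maximal clique:

```python
# Connect G to A, D and E: the triangles {E, A, G}, {G, A, D}
# and {E, G, D} are no longer maximal.
g.add_edges_from([("G", "A"), ("G", "D"), ("G", "E")])
print(sorted(sorted(c) for c in nx.find_cliques(g)))
# [['A', 'B', 'F'], ['A', 'D', 'E', 'G'], ['B', 'C'], ['C', 'D']]
```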

SLIDES 95–99

[Figure: the student Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter]

A distribution P factorizes over a Bayesian Network G if P can be expressed as

P(X1, . . . , Xn) = ∏_{i=1}^{n} P(Xi | Pa_{Xi})

[Figure: the Markov network over A, B, C, D, E, F from the study-group example]

A distribution P factorizes over a Markov Network H if P can be expressed as

P(X1, . . . , Xn) = (1/Z) ∏_{i=1}^{m} φi(Di)

where each Di is a complete sub-graph (maximal clique) in H.

A distribution is a Gibbs distribution parametrized by a set of factors Φ = {φ1(D1), . . . , φm(Dm)} if it is defined as

P(X1, . . . , Xn) = (1/Z) ∏_{i=1}^{m} φi(Di)
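To make the Gibbs-distribution definition concrete, here is a small generic sketch (an illustrative helper, not code from the lecture) that builds the normalized joint from an arbitrary set of factors over binary variables:

```python
from itertools import product

def gibbs(variables, factors):
    """Gibbs distribution: factors is a list of (scope, table) pairs,
    where scope is a tuple of variable names and table maps value
    tuples to non-negative weights. Returns the normalized joint."""
    def weight(assignment):
        w = 1.0
        for scope, table in factors:
            w *= table[tuple(assignment[v] for v in scope)]
        return w
    joint = {}
    for values in product([0, 1], repeat=len(variables)):
        joint[values] = weight(dict(zip(variables, values)))
    Z = sum(joint.values())  # partition function
    return {values: w / Z for values, w in joint.items()}

# The misconception example, reusing the factor tables from earlier:
P = gibbs(("A", "B", "C", "D"),
          [(("A", "B"), phi1), (("B", "C"), phi2),
           (("C", "D"), phi3), (("D", "A"), phi4)])
print(P[(0, 1, 1, 0)])  # ≈ 0.694, matching the table computed earlier
```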

SLIDE 100

Module 18.3: Local Independencies in a Markov Network

SLIDES 101–104

Let U be the set of all random variables in our joint distribution.
Let X, Y, Z be some distinct subsets of U.
A distribution P over these random variables would imply X ⊥ Y | Z if and only if we can write

P(X, Y, Z) ∝ φ1(X, Z) φ2(Y, Z)

Let us see this in the context of our original example.

SLIDES 105–107

In this example,

P(A, B, C, D) = (1/Z) [φ1(A, B) φ2(B, C) φ3(C, D) φ4(D, A)]

We can rewrite this as

P(A, B, C, D) = (1/Z) [φ1(A, B) φ2(B, C)] [φ3(C, D) φ4(D, A)]
              = (1/Z) φ5(B, {A, C}) φ6(D, {A, C})

where φ5(B, {A, C}) = φ1(A, B) φ2(B, C) and φ6(D, {A, C}) = φ3(C, D) φ4(D, A).

We can say that B ⊥ D | {A, C}, which is indeed true.

SLIDES 108–110

Alternatively, we can rewrite this as

P(A, B, C, D) = (1/Z) [φ1(A, B) φ4(D, A)] [φ2(B, C) φ3(C, D)]
              = (1/Z) φ7(A, {B, D}) φ8(C, {B, D})

where φ7(A, {B, D}) = φ1(A, B) φ4(D, A) and φ8(C, {B, D}) = φ2(B, C) φ3(C, D).

We can say that A ⊥ C | {B, D}, which is indeed true.
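Both statements can also be verified numerically from the joint P built in the Gibbs sketch above; a minimal check of X ⊥ Y | Z via P(x, y | z) = P(x | z) P(y | z):

```python
from itertools import product

def cond_indep(P, i, j, given):
    """Check X_i ⊥ X_j | X_given for a joint P over binary variables,
    where P is keyed by full assignment tuples."""
    for ctx in product([0, 1], repeat=len(given)):
        match = [v for v in P if all(v[k] == c for k, c in zip(given, ctx))]
        pz = sum(P[v] for v in match)  # P(context)
        for xi, xj in product([0, 1], repeat=2):
            pij = sum(P[v] for v in match if v[i] == xi and v[j] == xj)
            pi = sum(P[v] for v in match if v[i] == xi)
            pj = sum(P[v] for v in match if v[j] == xj)
            # Cross-multiplied form of P(i, j | z) == P(i | z) P(j | z)
            if abs(pij * pz - pi * pj) > 1e-12:
                return False
    return True

# Tuple positions: A=0, B=1, C=2, D=3
print(cond_indep(P, 1, 3, given=(0, 2)))  # B ⊥ D | {A, C} -> True
print(cond_indep(P, 0, 2, given=(1, 3)))  # A ⊥ C | {B, D} -> True
```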

SLIDES 111–112

For a given Markov network H, we define the Markov Blanket of a random variable X to be the set of neighbors of X in H.
Analogous to the case of Bayesian Networks, we can define the local independencies associated with H to be

X ⊥ (U − {X} − MB_H(X)) | MB_H(X)
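Since the Markov blanket in an undirected graph is just the neighbor set, the networkx sketch from earlier gives it directly:

```python
# Markov blanket of a node in a Markov network = its graph neighbors.
def markov_blanket(H, X):
    return set(H.neighbors(X))

print(markov_blanket(g, "A"))
# {'B', 'D', 'E', 'F', 'G'}, so A ⊥ C | {B, D, E, F, G}
```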

SLIDES 113–115

Bayesian network

[Figure: the student Bayesian network over Difficulty, Intelligence, Grade, SAT, Letter]

Local Independencies: Xi ⊥ NonDescendants(Xi) | Parents_G(Xi)

Markov network

[Figure: the Markov network over A, B, C, D, E, F]

Local Independencies: Xi ⊥ NonNeighbors(Xi) | Neighbors_H(Xi)

Mitesh M. Khapra, CS7015 (Deep Learning): Lecture 18