SLIDE 1

Graphical Models Exponential Families Variational Methods Mean Field Approximation

CSci 8980: Advanced Topics in Graphical Models Variational Inference

Instructor: Arindam Banerjee

October 17, 2007

SLIDE 5

Directed Graphical Models

Graph $G = (V, E)$

Each vertex is a random variable

Let $\pi(s)$ denote the set of all parents of $s \in V$

The joint distribution factorizes as

$$p(x) = \prod_{s \in V} p(x_s \mid x_{\pi(s)})$$
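As a small illustration of the factorization above, the sketch below evaluates the joint of a three-node binary chain a → b → c from conditional probability tables. The graph, the table values, and the names `parents`, `cpt`, and `joint` are assumptions made for this example; they do not come from the lecture.

```python
from itertools import product

# Hypothetical binary chain a -> b -> c; pi(s) is given by `parents`.
parents = {"a": (), "b": ("a",), "c": ("b",)}

# cpt[s][parent_values] = P(x_s = 1 | x_pi(s) = parent_values); values are made up.
cpt = {
    "a": {(): 0.6},
    "b": {(0,): 0.2, (1,): 0.7},
    "c": {(0,): 0.1, (1,): 0.9},
}

def joint(x):
    """p(x) = prod_s p(x_s | x_pi(s)) for an assignment x: dict node -> 0/1."""
    p = 1.0
    for s, pa in parents.items():
        p1 = cpt[s][tuple(x[t] for t in pa)]
        p *= p1 if x[s] == 1 else 1.0 - p1
    return p

# The factorized joint sums to one without any explicit normalization constant.
total = sum(joint(dict(zip("abc", vals))) for vals in product((0, 1), repeat=3))
```

Because every factor is itself a normalized conditional, no partition function is needed here, in contrast to the undirected case.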

SLIDE 10

Undirected Graphical Models

The distribution factorizes over the cliques of the graph

Let $\psi_C : \mathcal{X}^n \to \mathbb{R}_+$ be a function over clique $C$

The joint distribution is

$$p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C)$$

$Z$ ensures the distribution is normalized

Known as a Markov random field
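To make the role of $Z$ concrete, here is a minimal sketch that normalizes a tiny Markov random field by brute-force enumeration. The three-node chain, the agreement-favouring potential, and the names `cliques`, `psi`, and `unnormalized` are hypothetical choices for illustration only.

```python
from itertools import product
import math

# Hypothetical 3-node chain MRF with two edge cliques; potentials are made up.
cliques = [(0, 1), (1, 2)]

def psi(xc):
    """Nonnegative clique potential favouring agreement of the two endpoints."""
    return math.exp(1.0 if xc[0] == xc[1] else -1.0)

def unnormalized(x):
    """prod_C psi_C(x_C) for a full assignment x."""
    p = 1.0
    for c in cliques:
        p *= psi(tuple(x[i] for i in c))
    return p

states = list(product((0, 1), repeat=3))
Z = sum(unnormalized(x) for x in states)       # partition function
p = {x: unnormalized(x) / Z for x in states}   # normalized joint
```

The sum defining `Z` has $2^n$ terms, which is why exact normalization is only feasible for very small models.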

SLIDE 14

Basics (Review)

For any $h : \mathcal{X}^n \to \mathbb{R}_+$, define the measure $\nu$ by $d\nu = h(x)\,dx$

Let $t = \{\phi_\alpha \mid \alpha \in \mathcal{I}\}$ be a set of sufficient statistics

Let $\theta = \{\theta_\alpha \mid \alpha \in \mathcal{I}\}$ be the natural parameters

The family of density functions w.r.t. $d\nu$ is

$$p(x; \theta) = \exp\big( \langle \theta, t(x) \rangle - \psi(\theta) \big), \quad \text{where} \quad \psi(\theta) = \log \int_x \exp\big( \langle \theta, t(x) \rangle \big)\, \nu(dx)$$
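As a worked instance of this definition (an added example, not taken from the slides), consider a single Bernoulli variable $x \in \{0, 1\}$ with sufficient statistic $t(x) = x$ and $\nu$ the counting measure, so the integral becomes a two-term sum:

```latex
\begin{align*}
p(x; \theta) &= \exp\big( \theta x - \psi(\theta) \big), \\
\psi(\theta) &= \log \sum_{x \in \{0,1\}} \exp(\theta x) = \log\big( 1 + e^{\theta} \big), \\
p(x = 1; \theta) &= \frac{e^{\theta}}{1 + e^{\theta}}.
\end{align*}
```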

SLIDE 22

Graphical Models as Exponential Families

Graphical models are described as products of functions

Products are additive in the exponent

Ising Model:

  Each vertex is a Bernoulli random variable

  Components $x_s$, $x_t$ interact only if there is an edge

  The joint distribution is

  $$p(x; \theta) = \exp\Big( \sum_{s \in V} \theta_s x_s + \sum_{(s,t) \in E} \theta_{st} x_s x_t - \psi(\theta) \Big)$$

  Dimensionality of the model is $d = n + |E|$

  It is a regular exponential family, with $\Theta = \mathbb{R}^d$
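The Ising log-partition function can be computed exactly only by summing over all $2^n$ configurations. The sketch below does this for a small model; the function name `ising_log_partition` and the parameter values are assumptions for illustration.

```python
from itertools import product
import math

def ising_log_partition(theta_s, theta_st):
    """psi(theta) = log sum_x exp(sum_s theta_s x_s + sum_(s,t) theta_st x_s x_t),
    by enumerating all x in {0,1}^n. Only feasible for small n."""
    n = len(theta_s)
    total = 0.0
    for x in product((0, 1), repeat=n):
        energy = sum(theta_s[s] * x[s] for s in range(n))
        energy += sum(w * x[s] * x[t] for (s, t), w in theta_st.items())
        total += math.exp(energy)
    return math.log(total)

# Made-up parameters on a 3-cycle: d = n + |E| = 3 + 3 = 6 natural parameters.
theta_s = [0.1, -0.3, 0.2]
theta_st = {(0, 1): 0.5, (1, 2): -0.4, (0, 2): 0.3}
psi_val = ising_log_partition(theta_s, theta_st)
```

With no edges the sum factorizes, so `ising_log_partition(theta_s, {})` reduces to a sum of $\log(1 + e^{\theta_s})$ terms, which gives a quick sanity check.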

SLIDE 24

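Graphical Models as Exponential Families (Contd.)

Latent Dirichlet Allocation: for a single document

$$p(\theta, z, w \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)$$

$$\propto \exp\Big( \sum_{i=1}^{k} (\alpha_i - 1) \log \theta_i + \sum_{n=1}^{N} \sum_{i=1}^{k} \mathbb{I}_i[z_n] \log \theta_i + \sum_{n=1}^{N} \sum_{i=1}^{k} \sum_{j=1}^{V} \mathbb{I}_i[z_n]\, \mathbb{I}_j[w_n] \cdots \Big)$$

[the remainder of the last term is cut off in the source slide]

The sufficient statistics consist of:

  $\{\log \theta_i,\ [i]_1^k\}$

  $\{\mathbb{I}_i[z_n] \log \theta_i,\ [i]_1^k,\ [n]_1^N\}$

  $\{\mathbb{I}_i[z_n]\, \mathbb{I}_j[w_n],\ [i]_1^k,\ [n]_1^N,\ [j]_1^V\}$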

SLIDE 28

Properties of the Cumulant ψ

$\psi$ is the cumulant or log-partition function

$\psi(\theta)$ is $C^\infty$ on $\Theta$

Its derivatives give the moments of $t(x)$ under $p(x; \theta)$

$$\frac{\partial \psi(\theta)}{\partial \theta_\alpha} = E_\theta[t_\alpha(x)]$$

$$\frac{\partial^2 \psi(\theta)}{\partial \theta_\alpha\, \partial \theta_\beta} = E_\theta[t_\alpha(x)\, t_\beta(x)] - E_\theta[t_\alpha(x)]\, E_\theta[t_\beta(x)]$$

$\psi$ is a convex function, strictly convex if $t(x)$ is minimal
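The first-derivative identity can be checked numerically. The following sketch compares a finite-difference gradient of $\psi$ with the exact mean of the sufficient statistics for a two-node Ising model; the model, the parameter values, and all function names are assumptions chosen for the example.

```python
from itertools import product
import math

def energy(theta, x):
    """<theta, t(x)> with t(x) = (x_1, x_2, x_1 x_2)."""
    return theta[0] * x[0] + theta[1] * x[1] + theta[2] * x[0] * x[1]

def log_partition(theta):
    """psi(theta) for a 2-node Ising model, by enumeration."""
    return math.log(sum(math.exp(energy(theta, x)) for x in product((0, 1), repeat=2)))

def mean_parameters(theta):
    """E_theta[t(x)] by enumeration."""
    psi = log_partition(theta)
    mu = [0.0, 0.0, 0.0]
    for x in product((0, 1), repeat=2):
        p = math.exp(energy(theta, x) - psi)
        for a, ta in enumerate((x[0], x[1], x[0] * x[1])):
            mu[a] += p * ta
    return mu

def numerical_gradient(theta, eps=1e-6):
    """Central finite differences of psi, one coordinate at a time."""
    grad = []
    for a in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[a] += eps
        dn[a] -= eps
        grad.append((log_partition(up) - log_partition(dn)) / (2 * eps))
    return grad

theta = [0.3, -0.2, 0.8]   # made-up parameters
mu = mean_parameters(theta)
grad = numerical_gradient(theta)
```

Up to finite-difference error, `grad` and `mu` agree coordinate by coordinate.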

SLIDE 32

Properties of the Cumulant ψ (Contd.)

The set of mean parameters

$$\mathcal{M} = \Big\{ \mu \in \mathbb{R}^d \;\Big|\; \exists\, p(\cdot) \text{ s.t. } \int t(x)\, p(x)\, \nu(dx) = \mu \Big\}$$

Consider the mapping $\Lambda : \Theta \to \mathcal{M}$ given by

$$\Lambda(\theta) = E_\theta[t(x)] = \int_x t(x)\, p(x; \theta)\, \nu(dx)$$

If $t$ is minimal, $\Lambda$ is one-to-one

Further, $\Lambda$ is onto the (relative) interior of $\mathcal{M}$

SLIDE 36

Fenchel-Legendre Conjugacy

The conjugate dual function

$$\psi^*(\mu) = \sup_{\theta \in \Theta} \{ \langle \mu, \theta \rangle - \psi(\theta) \}$$

The (Boltzmann-Shannon) entropy of $p(x; \theta)$ w.r.t. $\nu$ is

$$H(p(x; \theta)) = -\int_x p(x; \theta) \log p(x; \theta)\, \nu(dx) = -E_\theta[\log p(x; \theta)]$$

If $\mu \in \operatorname{ri} \mathcal{M}$, then $\psi^*(\mu) = -H(p(x; \theta(\mu)))$

In terms of the dual, $\psi$ has a variational representation

$$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \psi^*(\mu) \}$$
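A worked example may help here (added for illustration, not from the slides). For a single Bernoulli variable with $t(x) = x$, the cumulant is $\psi(\theta) = \log(1 + e^{\theta})$ and the conjugate can be computed in closed form for $\mu \in (0, 1)$:

```latex
\begin{align*}
\psi^*(\mu) &= \sup_{\theta \in \mathbb{R}} \big\{ \mu \theta - \log(1 + e^{\theta}) \big\}, \\
0 &= \mu - \frac{e^{\theta}}{1 + e^{\theta}}
  \quad\Longrightarrow\quad \theta(\mu) = \log \frac{\mu}{1 - \mu}, \\
\psi^*(\mu) &= \mu \log \frac{\mu}{1 - \mu} + \log(1 - \mu)
  = \mu \log \mu + (1 - \mu) \log(1 - \mu).
\end{align*}
```

The last expression is the negative entropy of a Bernoulli distribution with mean $\mu$, in agreement with $\psi^*(\mu) = -H(p(x; \theta(\mu)))$.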

SLIDE 44

Main Issues

Key problems:

  Computation of the cumulant function $\psi(\theta)$

  Computation of the mean parameter $\mu = E_\theta[t(x)]$

The key equation for both problems

$$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \psi^*(\mu) \}$$

For all $\theta \in \Theta$, the supremum is attained at the $\mu \in \operatorname{ri} \mathcal{M}$ given by

$$\mu = E_\theta[t(x)] = \int_x t(x)\, p(x; \theta)\, \nu(dx)$$

Two primary challenges:

  The set $\mathcal{M}$ is difficult to characterize

  The function $\psi^*$ lacks an explicit definition

SLIDE 52

Mean Parameters

$\mathcal{M}$ has the following properties

  $\mathcal{M}$ is full-dimensional if $t$ is minimal

  $\mathcal{M}$ is bounded iff $\Theta = \mathbb{R}^d$ and $\psi$ is Lipschitz

Example: Multinomial random vector $x \in \mathcal{X}^n$

  The set $\mathcal{M}$ is a polytope

  $$\mathcal{M} = \{ \mu \in \mathbb{R}^d \mid \langle a_j, \mu \rangle \le b_j,\ \forall j \in \mathcal{J} \}$$

  The index set $\mathcal{J}$ is finite, but can be large

  The number of facets of the polytope can grow very fast with $n$

  A complete graph with $n = 7$ has more than $2 \times 10^8$ facets

SLIDE 53

Mean Parameters (Contd.)

SLIDE 58

Dual Function

$\psi^*$ is the negative entropy

Typically, it does not have an explicit closed form

In general, it can be specified as a composition of two functions

  Compute an inverse image $\theta(\mu)$ using $\Lambda^{-1}(\mu)$

  Compute the negative entropy of $p(x; \theta(\mu))$

SLIDE 63

Tractable Families

Based on the key equation

$$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \mu, \theta \rangle - \psi^*(\mu) \}$$

Mean field focuses on tractable distributions

Let $H \subseteq G$ be a subgraph on which exact calculations are feasible

Let $\mathcal{I}(H)$ be the indices of cliques in $H$

Natural parameters for distributions corresponding to $H$

$$\mathcal{E}(H) = \{ \theta \in \Theta \mid \theta_\alpha = 0,\ \forall \alpha \in \mathcal{I} \setminus \mathcal{I}(H) \}$$

SLIDE 70

Tractable Families (Contd.)

A simple tractable subgraph is $H = (V, \emptyset)$

  Natural parameters belong to the subspace

  $$\mathcal{E}(H) = \{ \theta \in \Theta \mid \theta_{st} = 0,\ \forall (s,t) \in E \}$$

  Corresponding distribution

  $$p(x; \theta) = \prod_{s \in V} p(x_s; \theta_s)$$

Structured approximation using a spanning tree $T = (V, E(T))$

  Natural parameters belong to the subspace in which all edges outside the tree are switched off

  $$\mathcal{E}(T) = \{ \theta \in \Theta \mid \theta_{st} = 0,\ \forall (s,t) \notin E(T) \}$$

For a subgraph $H$, the set of realizable mean parameters

$$\mathcal{M}_{\text{tract}}(G; H) = \{ \mu \in \mathbb{R}^d \mid \mu = E_\theta[t(x)],\ \theta \in \mathcal{E}(H) \}$$

The inclusion $\mathcal{M}_{\text{tract}}(G; H) \subseteq \mathcal{M}(G)$ always holds

SLIDE 74

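Lower Bounds

For any $\mu \in \operatorname{ri} \mathcal{M}$,

$$\psi(\theta) \ge \langle \theta, \mu \rangle - \psi^*(\mu)$$

Alternative proof using Jensen’s inequality

$$\psi(\theta) = \log \int_x p(x; \theta(\mu))\, \frac{\exp(\langle \theta, t(x) \rangle)}{p(x; \theta(\mu))}\, \nu(dx)$$

$$\ge \int_x p(x; \theta(\mu)) \big[ \langle \theta, t(x) \rangle - \log p(x; \theta(\mu)) \big]\, \nu(dx) = \langle \theta, \mu \rangle - \psi^*(\mu)$$

In general, $\psi^*$ does not have a closed form

Since $\psi^*_H$ has an explicit form, solve the approximation

$$\sup_{\mu \in \mathcal{M}_{\text{tract}}} \{ \langle \mu, \theta \rangle - \psi^*_H(\mu) \}$$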
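The lower bound can be observed numerically on a model small enough to enumerate. The sketch below restricts $\mu$ to product distributions on a two-node Ising model (one possible tractable family) and compares the best bound on a grid with the exact $\psi(\theta)$. The parameter values, the grid, and the function names are assumptions for this example.

```python
from itertools import product
import math

def log_partition(theta_1, theta_2, theta_12):
    """Exact psi(theta) of a 2-node Ising model by enumeration."""
    return math.log(sum(
        math.exp(theta_1 * x1 + theta_2 * x2 + theta_12 * x1 * x2)
        for x1, x2 in product((0, 1), repeat=2)))

def bernoulli_entropy(m):
    """Entropy of a Bernoulli(m) variable, with 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in (m, 1.0 - m) if q > 0.0)

def mean_field_bound(theta_1, theta_2, theta_12, m1, m2):
    """<theta, mu> - psi*_H(mu) for the product distribution with marginals m1, m2."""
    return (theta_1 * m1 + theta_2 * m2 + theta_12 * m1 * m2
            + bernoulli_entropy(m1) + bernoulli_entropy(m2))

theta = (0.5, -0.3, 1.2)   # made-up parameters
exact = log_partition(*theta)
grid = [i / 20 for i in range(21)]
best = max(mean_field_bound(*theta, m1, m2) for m1 in grid for m2 in grid)
```

Every grid point gives a value no larger than `exact`; when the coupling is zero the model is itself a product distribution and the bound becomes tight.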

SLIDE 79

Naive Mean Field

Chooses a fully factorized distribution to approximate the original distribution

We will study the Ising model as an example

Approximate $G$ by the fully disconnected graph $H_0$ with no edges

Then, the mean parameter set is

$$\mathcal{M}_{\text{tract}} = \{ (\mu_s, \mu_{st}) \mid 0 \le \mu_s \le 1,\ \mu_{st} = \mu_s \mu_t \}$$

The negative entropy of the product distribution is

$$\psi^*_{H_0}(\mu) = \sum_{s \in V} \big[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \big]$$

SLIDE 83

Naive Mean Field (Contd.)

The naive mean field problem takes the form

$$\max_{\mu \in \mathcal{M}_{\text{tract}}} \{ \langle \mu, \theta \rangle - \psi^*_{H_0}(\mu) \}$$

Using $\mu_{st} = \mu_s \mu_t$, we get the reduced problem

$$\max_{\{\mu_s\} \in [0,1]^n} \Big\{ \sum_{s \in V} \theta_s \mu_s + \sum_{(s,t) \in E} \theta_{st} \mu_s \mu_t - \sum_{s \in V} \big[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \big] \Big\}$$

It is concave in $\mu_s$ with the other coordinates held fixed

Taking the derivative w.r.t. $\mu_s$ and setting it to zero yields the update

$$\mu_s \leftarrow \frac{1}{1 + \exp\big( -(\theta_s + \sum_{t \in N(s)} \theta_{st} \mu_t) \big)}$$
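The update above suggests a simple coordinate ascent loop. What follows is a minimal sketch of that loop, assuming node parameters in a list and edge parameters in a dict keyed by node pairs; the function name `naive_mean_field`, the initialization at 0.5, and the stopping rule are choices made for this example rather than anything prescribed by the lecture.

```python
import math

def naive_mean_field(theta_s, theta_st, iters=200, tol=1e-10):
    """Coordinate ascent for the naive mean field problem on an Ising model.

    theta_s:  list of node parameters
    theta_st: dict {(s, t): weight}, one entry per undirected edge
    Returns the list of approximate marginals mu_s.
    """
    n = len(theta_s)
    # Build neighbour lists N(s) together with the matching edge weights.
    nbrs = [[] for _ in range(n)]
    for (s, t), w in theta_st.items():
        nbrs[s].append((t, w))
        nbrs[t].append((s, w))

    mu = [0.5] * n  # arbitrary starting point; the result can depend on it
    for _ in range(iters):
        delta = 0.0
        for s in range(n):
            field = theta_s[s] + sum(w * mu[t] for t, w in nbrs[s])
            new = 1.0 / (1.0 + math.exp(-field))   # sigmoid update for mu_s
            delta = max(delta, abs(new - mu[s]))
            mu[s] = new
        if delta < tol:
            break
    return mu

# Made-up parameters on a 3-node chain.
mu = naive_mean_field([0.2, -0.1, 0.4], {(0, 1): 0.5, (1, 2): -0.3})
```

Each sweep cannot decrease the objective because every coordinate update is an exact maximization of a concave one-dimensional problem, but the fixed point reached is only guaranteed to be a local optimum.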

SLIDE 92

Structured Mean Field

Considers tractable distributions with additional structure

For a subgraph $H$, let $\mathcal{I}(H)$ be the index set associated with $H$

With $\mu(H) = \{ \mu_\alpha \mid \alpha \in \mathcal{I}(H) \}$, we have

  The subvector $\mu(H)$ can be an arbitrary member of $\mathcal{M}(H)$

  The dual $\psi^*_H$ depends only on $\mu(H)$, not on $\mu_\beta$, $\beta \in \mathcal{I}(G) \setminus \mathcal{I}(H)$

But such $\mu_\beta$ do appear in the $\langle \mu, \theta \rangle$ term

Each $\mu_\beta = g_\beta(\mu(H))$, i.e., it depends on $\mu(H)$ non-linearly

The approximate optimization problem can be written as

$$\sup_{\mu(H) \in \mathcal{M}(H)} \Big\{ \sum_{\alpha \in \mathcal{I}(H)} \theta_\alpha \mu_\alpha + \sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha\, g_\alpha(\mu(H)) - \psi^*_H(\mu(H)) \Big\}$$

For the Ising model, with $H_0 = (V, \emptyset)$, $g_{st}(\mu(H_0)) = \mu_s \mu_t$

SLIDE 99

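Structured Mean Field (Contd.)

Let $F(\mu(H))$ denote the cost function

Taking the derivative w.r.t. $\mu_\beta$, $\beta \in \mathcal{I}(H)$, yields

$$\frac{\partial F(\mu(H))}{\partial \mu_\beta} = \theta_\beta + \sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha \frac{\partial g_\alpha(\mu(H))}{\partial \mu_\beta} - \frac{\partial \psi^*_H(\mu(H))}{\partial \mu_\beta}$$

$\gamma_\beta(H) = \frac{\partial \psi^*_H(\mu(H))}{\partial \mu_\beta}$ is the inverse moment mapping

Setting the gradient to zero yields the update

$$\gamma_\beta(H) \leftarrow \theta_\beta + \sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha \frac{\partial g_\alpha(\mu(H))}{\partial \mu_\beta}$$

For the Ising model, $\frac{\partial g_{st}}{\partial \mu_s} = \mu_t$, and so on

  We recover exactly the naive mean field updates

In general, $H$ can be more involved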
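To spell out the Ising case (an added derivation under the assumption $H = H_0 = (V, \emptyset)$, so that $\psi^*_{H_0}$ is the sum of Bernoulli negative entropies and $g_{st}(\mu) = \mu_s \mu_t$):

```latex
\begin{align*}
\gamma_s(H_0) &= \frac{\partial \psi^*_{H_0}(\mu)}{\partial \mu_s}
  = \frac{\partial}{\partial \mu_s} \big[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \big]
  = \log \frac{\mu_s}{1 - \mu_s}, \\
\log \frac{\mu_s}{1 - \mu_s} &\leftarrow \theta_s + \sum_{t \in N(s)} \theta_{st}\, \mu_t, \\
\mu_s &\leftarrow \frac{1}{1 + \exp\big( -(\theta_s + \sum_{t \in N(s)} \theta_{st}\, \mu_t) \big)}.
\end{align*}
```

The last line is the naive mean field update, obtained by inverting the logit in the second line.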


Non-convexity of Mean Field

The original problem is concave

  The constraint set $\mathcal{M}(H)$ is convex

  The objective contains entropy and linear terms in $\mu_\alpha$

The (structured) mean field objective contains non-linear terms

  $\sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha\, g_\alpha(\mu)$ involves the non-linear function $g_\alpha$

  For the Ising model, $g_\alpha$ is of the form $\mu_s \mu_t$
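A two-node example is enough to see the loss of concavity caused by the product term $\mu_s \mu_t$. The sketch below evaluates the naive mean field objective at two points and at their midpoint; a concave function would give a midpoint value at least as large as the average of the endpoint values. The parameter values ($\theta_1 = \theta_2 = -4$, $\theta_{12} = 8$) and the evaluation points are assumptions picked to make the effect visible.

```python
import math

def entropy(m):
    """Bernoulli entropy with 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in (m, 1.0 - m) if q > 0.0)

def objective(m1, m2, theta_1=-4.0, theta_2=-4.0, theta_12=8.0):
    """Naive mean field objective for a 2-node Ising model (made-up parameters)."""
    return theta_1 * m1 + theta_2 * m2 + theta_12 * m1 * m2 + entropy(m1) + entropy(m2)

a = objective(0.02, 0.02)
b = objective(0.98, 0.98)
mid = objective(0.5, 0.5)   # midpoint of the segment between the two points above

# Concavity would require mid >= (a + b) / 2; here the inequality fails.
violates_concavity = mid < 0.5 * (a + b)
```

With a strong positive coupling the objective has two separated high regions, one near "both off" and one near "both on", so coordinate ascent can end up in either depending on the starting point.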

SLIDE 106

Non-convexity of Mean Field

The original problem is concave

  The constraint set $\mathcal{M}(H)$ is convex

  The objective contains entropy and linear terms in $\mu_\alpha$

The (structured) mean field objective contains non-linear terms

  $\sum_{\alpha \in \mathcal{I}^c(H)} \theta_\alpha\, g_\alpha(\mu)$ involves the non-linear function $g_\alpha$