CSci 8980: Advanced Topics in Graphical Models Variational Inference - - PowerPoint PPT Presentation

SLIDE 1

Graphical Models Exponential Families Variational Methods Mean Field Approximation

CSci 8980: Advanced Topics in Graphical Models Variational Inference

Instructor: Arindam Banerjee October 17, 2007

SLIDE 5

Directed Graphical Models

Graph $G = (V, E)$
Each vertex is a random variable
Let $\pi(s)$ denote the set of all parents of $s \in V$
The joint distribution is
$$p(x) = \prod_{s \in V} p(x_s \mid x_{\pi(s)})$$
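
As a minimal sketch of the factorization above, the following Python snippet evaluates the joint distribution of a small directed model. The three-node chain, the variable names, and the probability tables are hypothetical values chosen for illustration; they do not come from the slides.

```python
from itertools import product

# Hypothetical three-node chain a -> b -> c over binary variables.
parents = {"a": (), "b": ("a",), "c": ("b",)}

# Conditional probability tables: cpt[node][parent_values][value].
cpt = {
    "a": {(): [0.6, 0.4]},
    "b": {(0,): [0.7, 0.3], (1,): [0.2, 0.8]},
    "c": {(0,): [0.9, 0.1], (1,): [0.5, 0.5]},
}

def joint(x):
    """p(x) = product over s of p(x_s | x_parents(s)) for an assignment dict x."""
    p = 1.0
    for s, pa in parents.items():
        pa_vals = tuple(x[t] for t in pa)
        p *= cpt[s][pa_vals][x[s]]
    return p

# Every factor is a normalized conditional, so the joint sums to one
# without any explicit normalization constant.
total = sum(joint(dict(zip("abc", vals))) for vals in product([0, 1], repeat=3))
```

Unlike the undirected case, no partition function is needed here, because each factor is already a normalized conditional distribution.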

SLIDE 10

Undirected Graphical Models

Distribution factorizes over cliques of the graph
Let $\psi_C : \mathcal{X}^n \to \mathbb{R}_+$ be a function over clique $C$
The joint distribution is
$$p(x) = \frac{1}{Z} \prod_{C} \psi_C(x_C)$$
$Z$ ensures the distribution is normalized
Known as a Markov random field
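
To make the role of $Z$ concrete, here is a small brute-force sketch in Python. The 3-cycle graph and the agreement-favoring potential are assumptions made for this example only, and the enumeration is feasible only for very small models.

```python
from itertools import product

# Hypothetical pairwise cliques on a 3-cycle of binary variables.
cliques = [(0, 1), (1, 2), (0, 2)]

def potential(xa, xb):
    """Nonnegative clique potential that favors agreement (illustrative choice)."""
    return 2.0 if xa == xb else 1.0

def unnormalized(x):
    """Product of clique potentials, before dividing by Z."""
    p = 1.0
    for a, b in cliques:
        p *= potential(x[a], x[b])
    return p

states = list(product([0, 1], repeat=3))
Z = sum(unnormalized(x) for x in states)  # partition function

def prob(x):
    """Normalized joint distribution p(x) = (1/Z) * product of potentials."""
    return unnormalized(x) / Z
```

The sum defining `Z` ranges over all $2^n$ configurations, which is the source of the computational difficulty addressed in the rest of the lecture.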

SLIDE 14

Basics (Review)

For any $h : \mathcal{X}^n \to \mathbb{R}_+$, define the measure $\nu$ by $d\nu = h(x)\,dx$
Let $t = \{\phi_\alpha \mid \alpha \in I\}$ be a set of sufficient statistics
Let $\theta = \{\theta_\alpha \mid \alpha \in I\}$ be the natural parameters
The family of density functions w.r.t. $\nu$ is
$$p(x; \theta) = \exp(\langle \theta, t(x) \rangle - \psi(\theta)), \quad \text{where} \quad \psi(\theta) = \log \int_x \exp(\langle \theta, t(x) \rangle)\, \nu(dx)$$

SLIDE 22

Graphical Models as Exponential Families

Graphical models are described as products of functions
Products are additive in the exponent
Ising Model:
  Each vertex is a Bernoulli random variable
  Components $x_s$, $x_t$ interact only if there is an edge
  The joint distribution is
  $$p(x; \theta) = \exp\Big( \sum_{s \in V} \theta_s x_s + \sum_{(s,t) \in E} \theta_{st} x_s x_t - \psi(\theta) \Big)$$
  Dimensionality of the model is $d = n + |E|$
  It is a regular exponential family, with $\Theta = \mathbb{R}^d$
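
The Ising density above can be evaluated exactly on tiny graphs by enumerating all configurations. The sketch below assumes $x \in \{0,1\}^n$; the function names and the parameter values for a 3-node chain are hypothetical.

```python
import math
from itertools import product

def score(x, theta_node, theta_edge):
    """<theta, t(x)> with node statistics x_s and edge statistics x_s * x_t."""
    node_part = sum(th * xs for th, xs in zip(theta_node, x))
    edge_part = sum(w * x[s] * x[t] for (s, t), w in theta_edge.items())
    return node_part + edge_part

def log_partition(theta_node, theta_edge):
    """psi(theta): log of the sum of exp(score) over all x in {0,1}^n."""
    n = len(theta_node)
    total = 0.0
    for x in product([0, 1], repeat=n):
        total += math.exp(score(x, theta_node, theta_edge))
    return math.log(total)

def density(x, theta_node, theta_edge):
    """p(x; theta) = exp(<theta, t(x)> - psi(theta))."""
    return math.exp(score(x, theta_node, theta_edge)
                    - log_partition(theta_node, theta_edge))

# Hypothetical 3-node chain: d = n + |E| = 3 + 2 = 5 parameters.
theta_node = [0.5, -0.2, 0.1]
theta_edge = {(0, 1): 1.0, (1, 2): -0.5}
total_mass = sum(density(x, theta_node, theta_edge)
                 for x in product([0, 1], repeat=3))
```

The cost of `log_partition` grows as $2^n$, which motivates the variational approximations developed later.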

SLIDE 24

Graphical Models as Exponential Families (Contd.)

Latent Dirichlet Allocation: for a single document,
$$p(\theta, z, w \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)$$
$$\propto \exp\Big( \sum_{i=1}^{k} (\alpha_i - 1) \log \theta_i + \sum_{n=1}^{N} \sum_{i=1}^{k} \mathbb{I}_i[z_n] \log \theta_i + \sum_{n=1}^{N} \sum_{i=1}^{k} \sum_{j=1}^{V} \mathbb{I}_i[z_n]\, \mathbb{I}_j[w_n] \log \beta_{ij} \Big)$$
The sufficient statistics consist of:
  $\{\log \theta_i,\ [i]_1^k\}$
  $\{\mathbb{I}_i[z_n] \log \theta_i,\ [i]_1^k,\ [n]_1^N\}$
  $\{\mathbb{I}_i[z_n]\, \mathbb{I}_j[w_n],\ [i]_1^k,\ [n]_1^N,\ [j]_1^V\}$

SLIDE 28

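Properties of the Cumulant ψ

$\psi$ is the cumulant or log-partition function
$\psi(\theta)$ is $C^\infty$ on $\Theta$
Its derivatives give the moments of the sufficient statistics $t(x)$ under $p(x; \theta)$:
$$\frac{\partial \psi(\theta)}{\partial \theta_\alpha} = E_\theta[t_\alpha(x)]$$
$$\frac{\partial^2 \psi(\theta)}{\partial \theta_\alpha \partial \theta_\beta} = E_\theta[t_\alpha(x)\, t_\beta(x)] - E_\theta[t_\alpha(x)]\, E_\theta[t_\beta(x)]$$
$\psi$ is a convex function, strictly convex if $t(x)$ is minimal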
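
The derivative identities can be checked numerically on the simplest case. The sketch below uses a single Bernoulli variable with $t(x) = x$, for which $\psi(\theta) = \log(1 + e^\theta)$; the value of `theta` and the step size `h` are arbitrary choices for illustration.

```python
import math

def psi(theta):
    """Log-partition function of a Bernoulli variable with t(x) = x."""
    return math.log1p(math.exp(theta))

def mean(theta):
    """E_theta[x], which should match the first derivative of psi."""
    return 1.0 / (1.0 + math.exp(-theta))

theta, h = 0.7, 1e-4

# Central finite differences for the first and second derivatives of psi.
d1 = (psi(theta + h) - psi(theta - h)) / (2 * h)
d2 = (psi(theta + h) - 2 * psi(theta) + psi(theta - h)) / h ** 2

mu = mean(theta)
variance = mu * (1 - mu)  # E[x^2] - E[x]^2 for a Bernoulli variable
```

Here `d1` approximates the mean and `d2` approximates the variance, which is nonnegative, consistent with the convexity of $\psi$.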

SLIDE 32

Properties of the Cumulant ψ (Contd.)

The set of mean parameters
$$\mathcal{M} = \Big\{ \mu \in \mathbb{R}^d \;\Big|\; \exists\, p(\cdot) \text{ s.t. } \int t(x)\, p(x)\, \nu(dx) = \mu \Big\}$$
Consider the mapping $\Lambda : \Theta \to \mathcal{M}$ defined as
$$\Lambda(\theta) = E_\theta[t(x)] = \int_x t(x)\, p(x; \theta)\, \nu(dx)$$
If $t$ is minimal, $\Lambda$ is one-to-one
Further, $\Lambda$ is onto the (relative) interior of $\mathcal{M}$

SLIDE 36

Fenchel-Legendre Conjugacy

The conjugate dual function
$$\psi^*(\mu) = \sup_{\theta \in \Theta} \{ \langle \mu, \theta \rangle - \psi(\theta) \}$$
The (Boltzmann-Shannon) entropy of $p(x; \theta)$ w.r.t. $\nu$ is
$$H(p(x; \theta)) = -\int_x p(x; \theta) \log p(x; \theta)\, \nu(dx) = -E_\theta[\log p(x; \theta)]$$
If $\mu \in \operatorname{ri} \mathcal{M}$, then $\psi^*(\mu) = -H(p(x; \theta(\mu)))$
In terms of the dual, $\psi$ has a variational representation
$$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \psi^*(\mu) \}$$
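
Both directions of the conjugacy can be checked numerically for a single Bernoulli variable, where $\psi(\theta) = \log(1 + e^\theta)$ and the negative entropy is $\mu \log \mu + (1 - \mu) \log(1 - \mu)$. The grid search below is a rough sketch, not an efficient method; grid ranges and resolutions are arbitrary assumptions.

```python
import math

def psi(theta):
    """Log-partition function of a Bernoulli variable: log(1 + e^theta)."""
    return math.log1p(math.exp(theta))

def neg_entropy(mu):
    """Negative entropy of a Bernoulli(mu) distribution."""
    return mu * math.log(mu) + (1 - mu) * math.log(1 - mu)

def conjugate_on_grid(mu, lo=-20.0, hi=20.0, steps=40001):
    """Approximate sup over theta of {mu * theta - psi(theta)} on a uniform grid."""
    best = -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        best = max(best, mu * theta - psi(theta))
    return best

# Dual direction: psi*(mu) should equal the negative entropy.
mu = 0.3
dual_value = conjugate_on_grid(mu)

# Primal direction: psi(theta) should equal sup over mu of {theta * mu - psi*(mu)}.
theta = 0.4
primal_from_dual = max(theta * (k / 1000) - neg_entropy(k / 1000)
                       for k in range(1, 1000))
```

Up to grid error, `dual_value` matches `neg_entropy(0.3)` and `primal_from_dual` matches `psi(0.4)`.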

SLIDE 44

Main Issues

Key problems:
  Computation of the cumulant function $\psi(\theta)$
  Computation of the mean parameter $\mu = E_\theta[t(x)]$
The key equation for both problems
$$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \theta, \mu \rangle - \psi^*(\mu) \}$$
For all $\theta \in \Theta$, the supremum is attained at the point $\mu \in \operatorname{ri} \mathcal{M}$ given by
$$\mu = E_\theta[t(x)] = \int_x t(x)\, p(x; \theta)\, \nu(dx)$$
Two primary challenges:
  Set $\mathcal{M}$ is difficult to characterize
  Function $\psi^*$ lacks an explicit definition

SLIDE 52

Mean Parameters

$\mathcal{M}$ has the following properties:
  $\mathcal{M}$ is full-dimensional if $t$ is minimal
  $\mathcal{M}$ is bounded iff $\Theta = \mathbb{R}^d$ and $\psi$ is Lipschitz
Example: Multinomial random vector $x \in \mathcal{X}^n$
  The set $\mathcal{M}$ is a polytope
  $$\mathcal{M} = \{ \mu \in \mathbb{R}^d \mid \langle a_j, \mu \rangle \le b_j,\ \forall j \in J \}$$
  Index set $J$ is finite, but can be large
The number of facets of the polytope can grow very fast with $n$
  A complete graph with $n = 7$ has more than $2 \times 10^8$ facets

SLIDE 58

Dual Function

$\psi^*$ is the negative entropy
Typically, it does not have an explicit closed form
In general, it can be specified as a composition of two functions:
  Compute an inverse image $\theta(\mu)$ using $\Lambda^{-1}(\mu)$
  Compute the negative entropy of $p(x; \theta(\mu))$
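
The two-step composition can be sketched for a single Bernoulli variable, where the inverse mapping happens to have a closed form but is computed here by bisection to mimic the general case. The bracket `[-50, 50]` and the iteration count are assumptions of this example.

```python
import math

def mean_map(theta):
    """Lambda(theta) = E_theta[x] for a Bernoulli variable."""
    return 1.0 / (1.0 + math.exp(-theta))

def inverse_mean_map(mu, lo=-50.0, hi=50.0, iters=200):
    """Step 1: find theta(mu) with Lambda(theta) = mu by bisection.

    Bisection works because Lambda is strictly increasing here."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_map(mid) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dual(mu):
    """Step 2: negative entropy of p(x; theta(mu))."""
    theta = inverse_mean_map(mu)
    p1 = mean_map(theta)
    return p1 * math.log(p1) + (1 - p1) * math.log(1 - p1)

value = dual(0.3)
closed_form = 0.3 * math.log(0.3) + 0.7 * math.log(0.7)
```

For general graphical models neither step is cheap: inverting $\Lambda$ is itself an inference problem, which is why $\psi^*$ typically has no closed form.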

SLIDE 63

Tractable Families

Based on the key equation
$$\psi(\theta) = \sup_{\mu \in \mathcal{M}} \{ \langle \mu, \theta \rangle - \psi^*(\mu) \}$$
Mean field focuses on tractable distributions
Let $H \subseteq G$ be a subgraph on which exact calculations are feasible
Let $I(H)$ be the indices of cliques in $H$
Natural parameters for distributions corresponding to $H$:
$$\mathcal{E}(H) = \{ \theta \in \Theta \mid \theta_\alpha = 0,\ \forall \alpha \in I \setminus I(H) \}$$

SLIDE 70

Tractable Families (Contd.)

Simple tractable subgraph is $H = (V, \emptyset)$
  Natural parameters belong to the subspace
  $$\mathcal{E}(H) = \{ \theta \in \Theta \mid \theta_{st} = 0,\ \forall (s,t) \in E \}$$
  Corresponding distribution: $p(x; \theta) = \prod_{s \in V} p(x_s; \theta_s)$
Structured approximation using spanning tree $T = (V, E(T))$
  Natural parameters belong to the subspace
  $$\mathcal{E}(T) = \{ \theta \in \Theta \mid \theta_{st} = 0,\ \forall (s,t) \notin E(T) \}$$
For a subgraph $H$, the set of realizable mean parameters
$$\mathcal{M}_{tract}(G; H) = \{ \mu \in \mathbb{R}^d \mid \mu = E_\theta[t(x)],\ \theta \in \mathcal{E}(H) \}$$
The inclusion $\mathcal{M}_{tract}(G; H) \subseteq \mathcal{M}(G)$ always holds
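
The inclusion can be illustrated on the smallest Ising example: two binary nodes joined by one edge, with mean parameters $(\mu_s, \mu_t, \mu_{st})$. A triple is realizable exactly when the four joint probabilities it implies are nonnegative. The function name and the test grid below are hypothetical.

```python
def in_marginal_polytope(mu_s, mu_t, mu_st, tol=1e-12):
    """Membership test for M on a single edge: the joint probabilities
    p11, p10, p01, p00 implied by (mu_s, mu_t, mu_st) must be nonnegative."""
    p11 = mu_st
    p10 = mu_s - mu_st
    p01 = mu_t - mu_st
    p00 = 1.0 - mu_s - mu_t + mu_st
    return min(p11, p10, p01, p00) >= -tol

# Points of M_tract for the edgeless subgraph satisfy mu_st = mu_s * mu_t.
grid = [i / 10 for i in range(11)]
product_points_ok = all(in_marginal_polytope(a, b, a * b)
                        for a in grid for b in grid)

# A point of M that no product distribution reaches: perfectly correlated bits.
correlated = (0.5, 0.5, 0.5)
is_in_M = in_marginal_polytope(*correlated)
is_product = abs(correlated[2] - correlated[0] * correlated[1]) < 1e-12
```

Every product point lies in $\mathcal{M}$, but the correlated point shows the inclusion is strict in this example.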

SLIDE 74

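Lower Bounds

For any $\mu \in \operatorname{ri} \mathcal{M}$, $\psi(\theta) \ge \langle \theta, \mu \rangle - \psi^*(\mu)$
Alternative proof using Jensen's inequality, with $p(x; \theta(\mu))$ the member of the family whose mean parameter is $\mu$:
$$\psi(\theta) = \log \int_x p(x; \theta(\mu))\, \frac{\exp(\langle \theta, t(x) \rangle)}{p(x; \theta(\mu))}\, \nu(dx)$$
$$\ge \int_x p(x; \theta(\mu)) \left[ \langle \theta, t(x) \rangle - \log p(x; \theta(\mu)) \right] \nu(dx) = \langle \theta, \mu \rangle - \psi^*(\mu)$$
In general, $\psi^*$ does not have a closed form
Since $\psi^*_H$ has an explicit form, solve the approximation
$$\sup_{\mu \in \mathcal{M}_{tract}} \{ \langle \mu, \theta \rangle - \psi^*_H(\mu) \}$$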
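
The bound can be checked numerically on a small Ising model, using fully factorized distributions as the tractable family. The parameter values and the candidate mean vectors below are hypothetical; the exact $\psi(\theta)$ is computed by enumeration, which only works for tiny graphs.

```python
import math
from itertools import product

def log_partition(theta_node, theta_edge):
    """Exact psi(theta) for a binary Ising model, by enumeration."""
    n = len(theta_node)
    total = 0.0
    for x in product([0, 1], repeat=n):
        s_val = sum(th * xs for th, xs in zip(theta_node, x))
        s_val += sum(w * x[s] * x[t] for (s, t), w in theta_edge.items())
        total += math.exp(s_val)
    return math.log(total)

def mean_field_bound(mu, theta_node, theta_edge):
    """<theta, mu> + H(mu) for a product of Bernoulli(mu_s) distributions,
    where the edge mean parameters are mu_s * mu_t."""
    linear = sum(th * m for th, m in zip(theta_node, mu))
    pair = sum(w * mu[s] * mu[t] for (s, t), w in theta_edge.items())
    entropy = -sum(m * math.log(m) + (1 - m) * math.log(1 - m) for m in mu)
    return linear + pair + entropy

theta_node = [0.5, -0.2, 0.1]
theta_edge = {(0, 1): 1.0, (1, 2): -0.5}
psi = log_partition(theta_node, theta_edge)

candidates = [[0.5, 0.5, 0.5], [0.9, 0.1, 0.3], [0.7, 0.6, 0.4]]
bounds = [mean_field_bound(mu, theta_node, theta_edge) for mu in candidates]
```

Each entry of `bounds` is at most `psi`; mean field methods search over $\mu$ for the tightest such bound.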

SLIDE 79

Naive Mean Field

Chooses a fully factorized distribution to approximate the original distribution
We will study the Ising model as an example
Approximate $G$ by the fully disconnected graph $H_0$ with no edges
Then, the mean parameter set is
$$\mathcal{M}_{tract} = \{ (\mu_s, \mu_{st}) \mid 0 \le \mu_s \le 1,\ \mu_{st} = \mu_s \mu_t \}$$
The negative entropy of the product distribution is
$$\psi^*_{H_0}(\mu) = \sum_{s \in V} \left[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \right]$$

SLIDE 83

Naive Mean Field (Contd.)

The naive mean field problem takes the form
$$\max_{\mu \in \mathcal{M}_{tract}} \{ \langle \mu, \theta \rangle - \psi^*_{H_0}(\mu) \}$$
Using $\mu_{st} = \mu_s \mu_t$, we get the reduced problem
$$\max_{\{\mu_s\} \in [0,1]^n} \Big\{ \sum_{s \in V} \theta_s \mu_s + \sum_{(s,t) \in E} \theta_{st} \mu_s \mu_t - \sum_{s \in V} \left[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \right] \Big\}$$
It is concave in $\mu_s$ with the other coordinates held fixed
Taking the derivative with respect to $\mu_s$ and setting it to zero yields
$$\mu_s \leftarrow \frac{1}{1 + \exp\big(-(\theta_s + \sum_{t \in N(s)} \theta_{st} \mu_t)\big)}$$
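
The coordinate update above translates directly into a short loop. The following is a minimal sketch assuming binary variables in $\{0,1\}$, an undirected edge dictionary, a uniform initialization, and a fixed number of sweeps; the function name and these choices are assumptions of the example, not part of the slides.

```python
import math

def naive_mean_field(theta_node, theta_edge, sweeps=100):
    """Coordinate ascent on the naive mean field objective for an Ising model.

    theta_node: list of node parameters theta_s.
    theta_edge: dict mapping undirected edges (s, t) to theta_st.
    Returns the list of approximate means mu_s."""
    n = len(theta_node)
    # Build neighbor lists N(s) with the associated edge weights.
    nbrs = {s: [] for s in range(n)}
    for (s, t), w in theta_edge.items():
        nbrs[s].append((t, w))
        nbrs[t].append((s, w))

    mu = [0.5] * n  # arbitrary starting point in the interior of [0, 1]^n
    for _ in range(sweeps):
        for s in range(n):
            field = theta_node[s] + sum(w * mu[t] for t, w in nbrs[s])
            mu[s] = 1.0 / (1.0 + math.exp(-field))  # logistic update
    return mu

theta_node = [0.2, -0.1, 0.3]
theta_edge = {(0, 1): 0.5, (1, 2): -0.4}
mu = naive_mean_field(theta_node, theta_edge)
```

Each update maximizes the objective in one coordinate, so the objective never decreases across sweeps. The result may depend on the starting point, since the overall problem need not be concave.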

SLIDE 92

Structured Mean Field

Considers tractable distributions with additional structure
For a subgraph $H$, let $I(H)$ be the index set associated with $H$
With $\mu(H) = \{\mu_\alpha \mid \alpha \in I(H)\}$, we have:
  The subvector $\mu(H)$ can be an arbitrary member of $\mathcal{M}(H)$
  The dual $\psi^*_H$ depends only on $\mu(H)$, not on $\mu_\beta$, $\beta \in I(G) \setminus I(H)$
  But such $\mu_\beta$ do appear in the $\langle \mu, \theta \rangle$ term
  Each $\mu_\beta = g_\beta(\mu(H))$, i.e., depends on $\mu(H)$ non-linearly
The approximate optimization problem can be written as
$$\sup_{\mu(H) \in \mathcal{M}(H)} \Big\{ \sum_{\alpha \in I(H)} \theta_\alpha \mu_\alpha + \sum_{\alpha \in I^c(H)} \theta_\alpha\, g_\alpha(\mu(H)) - \psi^*_H(\mu(H)) \Big\}$$
For the Ising model, with $H_0 = (V, \emptyset)$, $g_{st}(\mu(H_0)) = \mu_s \mu_t$

SLIDE 99

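Structured Mean Field (Contd.)

Let $F(\mu(H))$ denote the cost function
Taking the derivative w.r.t. $\mu_\beta$, $\beta \in I(H)$, yields
$$\frac{\partial F(\mu(H))}{\partial \mu_\beta} = \theta_\beta + \sum_{\alpha \in I^c(H)} \theta_\alpha \frac{\partial g_\alpha(\mu(H))}{\partial \mu_\beta} - \frac{\partial \psi^*_H(\mu(H))}{\partial \mu_\beta}$$
$\gamma_\beta(H) = \frac{\partial \psi^*_H(\mu(H))}{\partial \mu_\beta}$ is the inverse moment mapping
Setting the gradient to zero yields the update
$$\gamma_\beta(H) \leftarrow \theta_\beta + \sum_{\alpha \in I^c(H)} \theta_\alpha \frac{\partial g_\alpha(\mu(H))}{\partial \mu_\beta}$$
For the Ising model, $\frac{\partial g_{st}}{\partial \mu_s} = \mu_t$, and so on
We recover exactly the naive mean field updates
In general, $H$ can be more involved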
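
As a worked check of the claim that the general update reduces to the naive one, the steps for the Ising model with $H_0 = (V, \emptyset)$ can be written out. This derivation is a sketch that combines the negative entropy of the product distribution with $g_{st} = \mu_s \mu_t$; $N(s)$ denotes the neighbors of $s$ in $G$.

```latex
\begin{align*}
\psi^*_{H_0}(\mu) &= \sum_{s \in V} \left[ \mu_s \log \mu_s + (1 - \mu_s) \log(1 - \mu_s) \right] \\
\gamma_s(H_0) &= \frac{\partial \psi^*_{H_0}(\mu)}{\partial \mu_s} = \log \frac{\mu_s}{1 - \mu_s} \\
\log \frac{\mu_s}{1 - \mu_s} &\leftarrow \theta_s + \sum_{t \in N(s)} \theta_{st}\, \mu_t
\quad \Longrightarrow \quad
\mu_s \leftarrow \frac{1}{1 + \exp\!\big(-(\theta_s + \sum_{t \in N(s)} \theta_{st}\, \mu_t)\big)}
\end{align*}
```

The inverse moment mapping is the logit function here, so inverting it gives the logistic form of the naive mean field update.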

SLIDE 106

Non-convexity of Mean Field

The original problem is concave
  The constraint set $\mathcal{M}(H)$ is convex
  The objective contains entropy and linear terms in $\mu_\alpha$
The (structured) mean field objective contains non-linear terms
  $\sum_{\alpha \in I^c(H)} \theta_\alpha\, g_\alpha(\mu)$ involves the non-linear functions $g_\alpha$
  For the Ising model, $g_\alpha$ is of the form $\mu_s \mu_t$
  A quadratic form need not be concave
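
A two-node Ising model is enough to see the loss of concavity. The sketch below evaluates the naive mean field objective at two points and at their midpoint; the parameter values (a strong positive coupling with negative node terms) are hypothetical and chosen to make the effect visible.

```python
import math

def entropy(m):
    """Entropy of a Bernoulli(m) distribution, for m strictly inside (0, 1)."""
    return -(m * math.log(m) + (1 - m) * math.log(1 - m))

def objective(mu1, mu2, theta1=-4.0, theta2=-4.0, theta12=8.0):
    """Naive mean field objective for a two-node Ising model:
    linear terms, the bilinear coupling term, and the product entropy."""
    return (theta1 * mu1 + theta2 * mu2 + theta12 * mu1 * mu2
            + entropy(mu1) + entropy(mu2))

a, b = (0.02, 0.02), (0.98, 0.98)
mid = (0.5, 0.5)
f_a, f_b, f_mid = objective(*a), objective(*b), objective(*mid)

# A concave function would satisfy f(mid) >= average of f(a) and f(b).
violates_concavity = f_mid < 0.5 * (f_a + f_b)
```

The midpoint value falls below the average of the endpoint values, so the objective is not concave for these parameters, and coordinate ascent can end at different local optima depending on initialization.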