EM and GMM
Jia-Bin Huang Virginia Tech
Spring 2019
ECE-5424G / CS-5824
Administrative
HW 3 due March 27
Final project discussion: Link
Final exam date/time: Exam Section 14M
Department of Statistics and Electrical Engineering and Computer Sciences
Slide credit: Andrew Ng
Repeat {
  for i = 1 to m:
    c(i) ← index (from 1 to K) of cluster centroid closest to x(i)    [Cluster assignment step]
  for k = 1 to K:
    μ_k ← average (mean) of points assigned to cluster k    [Centroid update step]
}
Slide credit: Andrew Ng
c(i) = index of cluster (1 to K) to which example x(i) is currently assigned
μ_k = cluster centroid k (μ_k ∈ ℝ^n)
μ_c(i) = cluster centroid of the cluster to which example x(i) has been assigned

Optimization objective:
J(c(1), …, c(m), μ_1, …, μ_K) = (1/m) Σ_{i=1}^{m} ‖x(i) − μ_c(i)‖²

min_{c(1), …, c(m), μ_1, …, μ_K} J(c(1), …, c(m), μ_1, …, μ_K)

Example: if x(i) is assigned to cluster 5, then c(i) = 5 and μ_c(i) = μ_5.
Slide credit: Andrew Ng
Randomly initialize K cluster centroids μ_1, μ_2, …, μ_K ∈ ℝ^n
Repeat {
  for i = 1 to m:
    c(i) ← index (from 1 to K) of cluster centroid closest to x(i)
  for k = 1 to K:
    μ_k ← average (mean) of points assigned to cluster k
}

Cluster assignment step: minimizes J(c(1), …, c(m), μ_1, …, μ_K) = (1/m) Σ_{i=1}^{m} ‖x(i) − μ_c(i)‖² over the assignments c(1), …, c(m), holding the centroids fixed.

Centroid update step: minimizes the same objective J over the centroids μ_1, …, μ_K, holding the assignments fixed.
Slide credit: Andrew Ng
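The two alternating steps above can be sketched in a few lines of Python/NumPy. This is a minimal illustration, not the slides' code; the function name and the deterministic initialization (the slides initialize centroids at random) are choices made here for clarity.

```python
import numpy as np

def kmeans(X, mu_init, n_iters=20):
    """Plain k-means. mu_init holds the initial centroids; the slides pick
    them at random, but passing them in keeps this example deterministic."""
    mu = mu_init.astype(float).copy()
    K = len(mu)
    for _ in range(n_iters):
        # Cluster assignment step: c[i] <- index of centroid closest to x(i)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        c = d2.argmin(axis=1)
        # Centroid update step: mu_k <- mean of points assigned to cluster k
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
    return c, mu

# Two tight groups of points; seed one centroid in each group.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
c, mu = kmeans(X, mu_init=X[[0, 2]])
```

Each iteration performs exactly the two steps of the slide's loop, so the distortion J never increases.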
A hierarchical clustering produces a tree of clusters; cutting the tree at different levels yields different flat clusterings (prunings).
Slide credit: Maria-Florina Balcan
Single linkage:   dist(A, B) = min_{x∈A, x′∈B} d(x, x′)
Complete linkage: dist(A, B) = max_{x∈A, x′∈B} d(x, x′)
Average linkage:  dist(A, B) = avg_{x∈A, x′∈B} d(x, x′)
Ward's method:    dist(A, B) = (|A||B| / (|A| + |B|)) ‖mean(A) − mean(B)‖²
Slide credit: Maria-Florina Balcan
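The four cluster-distance (linkage) functions above translate directly into code. A minimal sketch in Python/NumPy; the function names are illustrative:

```python
import numpy as np

def pairwise(A, B):
    """All Euclidean distances d(x, x') for x in cluster A, x' in cluster B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_linkage(A, B):      # min over all cross-cluster pairs
    return pairwise(A, B).min()

def complete_linkage(A, B):    # max over all cross-cluster pairs
    return pairwise(A, B).max()

def average_linkage(A, B):     # average over all cross-cluster pairs
    return pairwise(A, B).mean()

def wards_distance(A, B):      # |A||B|/(|A|+|B|) * ||mean(A) - mean(B)||^2
    nA, nB = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    return nA * nB / (nA + nB) * (diff @ diff)

A = np.array([[0.0, 0.0], [2.0, 0.0]])
B = np.array([[5.0, 0.0], [9.0, 0.0]])
```

On these two clusters the cross-pair distances are {5, 9, 3, 7}, so the three pairwise linkages give 3, 9, and 6 respectively.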
Agglomerative clustering repeatedly merges the pair of clusters whose distance, under the chosen linkage (min, max, average, or Ward's), is as small as possible.
Slide credit: Maria-Florina Balcan
You want to train an algorithm to rate photographs. Some annotators try to give accurate ratings, but others answer randomly. Challenge: determine which annotators to trust, and the average rating given by the accurate annotators.
Photo: Jam343 (Flickr)
Annotator ratings: 10, 8, 9, 2, 8
You have a collection of images and have extracted regions from them. Challenge: discover frequently occurring object categories, without pre-trained appearance models.
http://www.robots.ox.ac.uk/~vgg/publications/papers/russell06.pdf
You are given an image and want to assign foreground/background labels to each pixel. Challenge: segment the image into figure and ground without knowing what the foreground looks like in advance.
Foreground Background
Three steps: 1. If we had labels, how could we model the appearance of foreground and background?
2. Once we have modeled the fg/bg appearance, how do we compute the likelihood that a pixel is foreground?
3. How can we get both labels and appearance models at once?
1. If we had labels, how could we model the appearance of foreground and background?
Maximum likelihood estimation:
θ̂ = argmax_θ p(x_1, …, x_N | θ) = argmax_θ Π_n p(x_n | θ)
(x: data, θ: parameters)
Gaussian Distribution
p(x_n | μ, σ²) = (1 / √(2πσ²)) exp(−(x_n − μ)² / (2σ²))
μ̂ = argmax_μ p(x | μ) = argmax_μ log p(x | μ) = argmax_μ Σ_n log p(x_n | μ) = argmax_μ ℓ(μ)

Log-likelihood:
ℓ(μ) = −(N/2) log 2π − (N/2) log σ² − (1/(2σ²)) Σ_n (x_n − μ)²

∂ℓ(μ)/∂μ = (1/σ²) Σ_n (x_n − μ) = 0  ⇒  μ̂ = (1/N) Σ_n x_n
∂ℓ(μ)/∂σ = −N/σ + (1/σ³) Σ_n (x_n − μ)² = 0  ⇒  σ̂² = (1/N) Σ_n (x_n − μ̂)²
Maximum likelihood estimates:
μ̂ = (1/N) Σ_n x_n
σ̂² = (1/N) Σ_n (x_n − μ̂)²
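The closed-form estimates above are just the sample mean and the 1/N sample variance. A small check in Python/NumPy (the function name is illustrative):

```python
import numpy as np

def gaussian_mle(x):
    """ML estimates for a 1-D Gaussian:
    mu_hat = (1/N) sum x_n,  sigma2_hat = (1/N) sum (x_n - mu_hat)^2.
    Note the 1/N from the derivation, not the unbiased 1/(N-1) variance."""
    mu_hat = x.mean()
    sigma2_hat = ((x - mu_hat) ** 2).mean()
    return mu_hat, sigma2_hat

x = np.array([0.0, 1.0, 2.0, 3.0])
mu_hat, sigma2_hat = gaussian_mle(x)
```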
>> mu_fg = mean(im(labels))
mu_fg = 0.6012
>> sigma_fg = sqrt(mean((im(labels)-mu_fg).^2))
sigma_fg = 0.1007
>> mu_bg = mean(im(~labels))
mu_bg = 0.4007
>> sigma_bg = sqrt(mean((im(~labels)-mu_bg).^2))
sigma_bg = 0.1007
>> pfg = mean(labels(:));
[Figure: binary mask "labels" and image "im"; parameters used to generate: fg mu=0.6, sigma=0.1; bg mu=0.4, sigma=0.1]
2. Once we have modeled the fg/bg appearance, how do we compute the likelihood that a pixel is foreground?
Compute the likelihood that a particular model generated a sample:

p(z_n = m | x_n, θ)        (z_n = m: component or label)
  = p(z_n = m, x_n | θ) / p(x_n | θ)
      [conditional probability: P(A | B) = P(A, B) / P(B)]
  = p(z_n = m, x_n | θ) / Σ_k p(z_n = k, x_n | θ)
      [marginalization: P(A) = Σ_k P(A, B = k)]
  = p(x_n | z_n = m, θ_m) p(z_n = m | θ) / Σ_k p(x_n | z_n = k, θ_k) p(z_n = k | θ)
      [joint distribution: P(A, B) = P(B) P(A | B)]
>> pfg = 0.5;
>> px_fg = normpdf(im, mu_fg, sigma_fg);
>> px_bg = normpdf(im, mu_bg, sigma_bg);
>> pfg_x = px_fg*pfg ./ (px_fg*pfg + px_bg*(1-pfg));
[Figure: input "im"; learned parameters fg: mu=0.6, sigma=0.1; bg: mu=0.4, sigma=0.1; and the resulting p(fg | im)]
3. How can we get both labels and appearance models at once?
Mixture of Gaussians:

p(x_n | μ, σ², π) = Σ_m p(x_n, z_n = m | μ_m, σ_m², π_m)        (z_n = m: mixture component)

p(x_n, z_n = m | μ_m, σ_m², π_m) = p(x_n | z_n = m, μ_m, σ_m²) p(z_n = m | π_m)
  = (1 / √(2πσ_m²)) exp(−(x_n − μ_m)² / (2σ_m²)) · π_m

(μ_m, σ_m²: component model parameters; π_m: component prior)
With enough components, can represent any probability density function
Pixels come from one of several Gaussian components
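The mixture density above is a π-weighted sum of Gaussian components, and evaluating it is one line per component. A minimal sketch in Python/NumPy; names are illustrative, and the parameters reuse the fg/bg values from the segmentation example:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """1-D Gaussian density N(x; mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def mixture_pdf(x, pis, mus, sigma2s):
    """p(x) = sum_m pi_m * N(x; mu_m, sigma_m^2)"""
    return sum(pi * gaussian_pdf(x, mu, s2)
               for pi, mu, s2 in zip(pis, mus, sigma2s))

# Two components with the fg/bg parameters used earlier (mu=0.6/0.4, sigma=0.1):
p = mixture_pdf(0.5, pis=[0.5, 0.5], mus=[0.6, 0.4], sigma2s=[0.01, 0.01])
```

At x = 0.5, both components contribute equally (each is 0.1 away from its mean), so the density is just N(0.5; 0.6, 0.01) itself.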
Problem: estimate the parameters of the Gaussian Mixture Model without knowing which component generated each point. What would you do?
(θ = {μ, σ², π}: parameters; z_n: hidden variables)

E-step:
α_nm = p(z_n = m | x_n, μ^(t), σ²^(t), π^(t))

M-step:
μ̂_m^(t+1) = Σ_n α_nm x_n / Σ_n α_nm
σ̂_m²^(t+1) = Σ_n α_nm (x_n − μ̂_m^(t+1))² / Σ_n α_nm
π̂_m^(t+1) = (1/N) Σ_n α_nm
Expectation Maximization (EM) Algorithm
Goal: θ̂ = argmax_θ log Σ_z p(x, z | θ)

Log of sums is intractable.

Jensen's inequality: f(E[X]) ≥ E[f(X)] for concave functions f (so we maximize the lower bound!)

See here for proof: www.stanford.edu/class/cs229/notes/cs229-notes8.ps
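For concreteness, this is the standard EM lower-bound derivation in the notation above, where q is any distribution over the hidden variables z (EM uses q(z) = p(z | x, θ^(t))):

```latex
\log p(\mathbf{x}\mid\theta)
  = \log \sum_{\mathbf{z}} p(\mathbf{x},\mathbf{z}\mid\theta)
  = \log \sum_{\mathbf{z}} q(\mathbf{z})\,
        \frac{p(\mathbf{x},\mathbf{z}\mid\theta)}{q(\mathbf{z})}
  \;\ge\; \sum_{\mathbf{z}} q(\mathbf{z})
        \log \frac{p(\mathbf{x},\mathbf{z}\mid\theta)}{q(\mathbf{z})}
```

The inequality is Jensen's applied to the concave log. With q(z) = p(z | x, θ^(t)) the bound is tight at θ = θ^(t), so maximizing it over θ cannot decrease the log-likelihood.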
Expectation Maximization (EM) Algorithm

Goal: θ̂ = argmax_θ log Σ_z p(x, z | θ)

1. E-step: compute the expected complete-data log-likelihood under the current posterior over z:
E_{z | x, θ^(t)} [ log p(x, z | θ) ] = Σ_z p(z | x, θ^(t)) log p(x, z | θ)

2. M-step: maximize it:
θ^(t+1) = argmax_θ Σ_z p(z | x, θ^(t)) log p(x, z | θ)

By Jensen's inequality, log of expectation of p(x | z) ≥ expectation of log of p(x | z), so each iteration maximizes a lower bound on the log-likelihood.
EM for Mixture of Gaussians - derivation
p(x_n | μ, σ², π) = Σ_m p(x_n, z_n = m | μ_m, σ_m², π_m) = Σ_m π_m (1 / √(2πσ_m²)) exp(−(x_n − μ_m)² / (2σ_m²))
1. E-step: E_{z | x, θ^(t)} [ log p(x, z | θ) ] = Σ_z p(z | x, θ^(t)) log p(x, z | θ)
2. M-step: θ^(t+1) = argmax_θ Σ_z p(z | x, θ^(t)) log p(x, z | θ)
EM for Mixture of Gaussians
p(x_n | μ, σ², π) = Σ_m π_m (1 / √(2πσ_m²)) exp(−(x_n − μ_m)² / (2σ_m²))

1. E-step:
α_nm = p(z_n = m | x_n, μ^(t), σ²^(t), π^(t))

2. M-step:
μ̂_m^(t+1) = Σ_n α_nm x_n / Σ_n α_nm
σ̂_m²^(t+1) = Σ_n α_nm (x_n − μ̂_m^(t+1))² / Σ_n α_nm
π̂_m^(t+1) = (1/N) Σ_n α_nm
http://lasa.epfl.ch/teaching/lectures/ML_Phd/Notes/GP-GMM.pdf
To derive the M-step updates: take the derivative of the expected log-likelihood with respect to μ_m and σ_m, set it to zero, and solve. For π_m, do the same subject to the constraint Σ_m π_m − 1 = 0 (e.g., with a Lagrange multiplier), treating the α_nm as fixed, since they were computed from the hidden variables in the E-step.
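Putting the E- and M-step updates above together gives a runnable EM loop for a 1-D mixture of Gaussians. A minimal sketch in Python/NumPy; function names and the toy data are illustrative, not from the slides:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """1-D Gaussian density N(x; mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def em_mog(x, mus, sigma2s, pis, n_iters=50):
    """EM for a 1-D mixture of Gaussians, following the slide updates."""
    mus, sigma2s, pis = map(np.asarray, (mus, sigma2s, pis))
    N, M = len(x), len(mus)
    for _ in range(n_iters):
        # E-step: alpha[n, m] = p(z_n = m | x_n, theta^(t))
        alpha = pis * np.stack(
            [gaussian_pdf(x, mus[m], sigma2s[m]) for m in range(M)], axis=1)
        alpha /= alpha.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates
        Nm = alpha.sum(axis=0)                               # sum_n alpha_nm
        mus = (alpha * x[:, None]).sum(axis=0) / Nm          # weighted means
        sigma2s = (alpha * (x[:, None] - mus) ** 2).sum(axis=0) / Nm
        pis = Nm / N                                         # component priors
    return mus, sigma2s, pis

# Two clearly separated groups of points:
x = np.array([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])
mus, sigma2s, pis = em_mog(x, mus=[0.5, 4.0], sigma2s=[1.0, 1.0], pis=[0.5, 0.5])
```

Because the sigma update uses the freshly updated means, each iteration is exactly the M-step from the slides; on this toy data the component means converge to the two group centers.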