Machine Learning for Signal Processing Independent Component Analysis
Class 8. 23 Sep 2013 Instructor: Bhiksha Raj
23 Sep 2013 11755/18797 1
Correlation vs. Causation
The consumption of burgers has gone up steadily in the past decade
[Plot: burger consumption and penguin population vs. time]
Two variables X and Y are uncorrelated iff:
– The average value of their product equals the product of their individual averages: E[XY] = E[X] E[Y]
– Each observation pairs an instance of X with an instance of Y, i.e. one instance of (X, Y)
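A quick numerical check of both directions of this definition, sketched with synthetic data (the distributions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Independent draws: independent variables are necessarily uncorrelated.
x = rng.normal(size=n)
y = rng.normal(size=n)
lhs = np.mean(x * y)                 # average of the product
rhs = np.mean(x) * np.mean(y)        # product of the averages

# The converse fails: y2 = x^2 is fully dependent on x, yet uncorrelated
# with it, because E[x^3] = 0 for a symmetric distribution.
y2 = x ** 2
cov = np.mean(x * y2) - np.mean(x) * np.mean(y2)
```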
y = f(x), where x ~ p(x)
Example: X takes values T(all), M(ed), S(hort), …; Y takes values M, F, …
H(X) = − Σ_X P(X) log P(X)
H(X, Y) = − Σ_{X,Y} P(X, Y) log P(X, Y)
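For a discrete joint distribution these quantities are a few lines of code; a minimal sketch with a made-up probability table:

```python
import numpy as np

# A made-up joint probability table P(X, Y): rows index X, columns index Y.
Pxy = np.array([[0.2, 0.1, 0.1],
                [0.1, 0.3, 0.2]])

def entropy(p):
    """H = -sum p log2 p over the nonzero probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X = entropy(Pxy.sum(axis=1))    # marginal entropy H(X)
H_Y = entropy(Pxy.sum(axis=0))    # marginal entropy H(Y)
H_XY = entropy(Pxy.ravel())       # joint entropy H(X, Y)
```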
Example: X ∈ {T, M, S, …}, Y ∈ {M, F, …}
H(X | Y) = − Σ_Y P(Y) Σ_X P(X | Y) log P(X | Y) = − Σ_{X,Y} P(X, Y) log P(X | Y)
H(X | Y) = − Σ_Y P(Y) Σ_X P(X | Y) log P(X | Y) ≤ − Σ_X P(X) log P(X) = H(X)
Knowing Y can only reduce the uncertainty about X: H(X | Y) ≤ H(X)
H(X, Y) = − Σ_{X,Y} P(X, Y) log P(X, Y)
≤ − Σ_{X,Y} P(X, Y) log [P(X) P(Y)]
= − Σ_{X,Y} P(X, Y) log P(X) − Σ_{X,Y} P(X, Y) log P(Y)
= H(X) + H(Y)
Equality holds iff X and Y are independent, i.e. P(X, Y) = P(X) P(Y)
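The bound is easy to verify numerically; the gap H(X) + H(Y) − H(X, Y) is the mutual information (both joint tables below are invented):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Independent case: P(X, Y) = P(X) P(Y), so H(X, Y) = H(X) + H(Y) exactly.
px = np.array([0.4, 0.6])
py = np.array([0.3, 0.7])
gap_indep = entropy(px) + entropy(py) - entropy(np.outer(px, py).ravel())

# Dependent case (made-up table): H(X, Y) < H(X) + H(Y);
# the gap is the mutual information I(X; Y).
P_dep = np.array([[0.35, 0.05],
                  [0.05, 0.55]])
mutual_info = (entropy(P_dep.sum(axis=1)) + entropy(P_dep.sum(axis=0))
               - entropy(P_dep.ravel()))
```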
P = W (W^T W)^(-1) W^T;  Projected spectrogram = P M
M ≈ W H;  H = pinv(W) M
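Both operations can be sketched on a synthetic spectrogram built from known bases (the sizes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented sizes: a 100 x 200 "spectrogram" M built from 4 known bases W.
W = rng.random((100, 4))
H_true = rng.random((4, 200))
M = W @ H_true

# Least-squares weights: H = pinv(W) M.
H = np.linalg.pinv(W) @ M

# Projection onto the column space of W: P = W (W^T W)^(-1) W^T.
P = W @ np.linalg.inv(W.T @ W) @ W.T
projected = P @ M                  # the projected spectrogram, equals W H here
```

Because this M lies exactly in the column space of W, the projection returns M itself; for a real spectrogram it returns the closest approximation expressible with the bases.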
Given M and the bases W, find the weights H that minimize the error:
min_H ||M − W H||_F^2 = Σ_i Σ_j (M_ij − (W H)_ij)^2
M ≈ W H;  W = M pinv(H);  U = W H
Given M and the weights H, find the bases W that minimize the error:
min_W ||M − W H||_F^2 = Σ_i Σ_j (M_ij − (W H)_ij)^2
When both are unknown: W = ?  H = ?  approx(M) = W H = ?
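When neither factor is known, one option is to alternate the two closed-form solutions above; a minimal sketch on exactly low-rank synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented exactly rank-4 data matrix M (30 x 50).
M = rng.random((30, 4)) @ rng.random((4, 50))

# Alternate the two pinv solutions when both factors are unknown:
#   H = pinv(W) M   (W fixed),  then  W = M pinv(H)   (H fixed)
W = rng.random((30, 4))                     # random initialization
for _ in range(10):
    H = np.linalg.pinv(W) @ M
    W = M @ np.linalg.pinv(H)

err = np.linalg.norm(M - W @ H, 'fro') / np.linalg.norm(M, 'fro')
```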
min_{W,H} ||M − W H||_F^2
h_i^T h_j = 0 for all i != j
min_{W,H} ||M − W H||_F^2
min_H ||M − T H||_F^2
Constraint: Rank(H) = 4
min_H ||M − T H||_F^2
min_W ||M − W T||_F^2
– The notes are orthogonal and so are the scores
– Not good in our problem
min_{W,H} ||M − W H||_F^2
m1(t) = w11 h1(t) + w12 h2(t)
m2(t) = w21 h1(t) + w22 h2(t)
M = W H, where W = [w11 w12; w21 w22]
Rows of M: signal at mic 1, signal at mic 2
Rows of H: signal from speaker 1, signal from speaker 2
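The mixing model above is straightforward to simulate; a sketch with invented sources and mixing weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8000

# Two invented source signals: the rows of H.
t = np.arange(n)
h1 = np.sign(np.sin(2 * np.pi * 5 * t / n))      # speaker 1: square wave
h2 = rng.uniform(-1, 1, size=n)                  # speaker 2: uniform noise
H = np.vstack([h1, h2])

# m1(t) = w11 h1(t) + w12 h2(t);  m2(t) = w21 h1(t) + w22 h2(t)
W = np.array([[1.0, 0.5],
              [0.7, 1.0]])
M = W @ H                                        # rows of M: the two mic signals

# If W were known, inverting it would unmix the signals exactly.
H_rec = np.linalg.inv(W) @ M
```

With W known, inversion separates the signals exactly; ICA addresses the case where W is unknown.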
– A is the unmixing matrix: H = A M
Remember this form
– E[h_i h_j] = E[h_i] E[h_j]  (h_i and h_j are the i-th and j-th components of any vector in H)
– E[h_i h_j h_k h_l] = E[h_i] E[h_j] E[h_k] E[h_l]
– E[h_i^2 h_j h_k] = E[h_i^2] E[h_j] E[h_k]
– E[h_i^2 h_j^2] = E[h_i^2] E[h_j^2]
– Etc.
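One of these factorizations, checked empirically on independent synthetic sequences (the distributions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two independent sequences with arbitrarily chosen distributions.
hi = rng.uniform(-1, 1, size=n)
hj = rng.laplace(size=n)

# Independence factors the moments: E[hi^2 hj^2] = E[hi^2] E[hj^2].
lhs = np.mean(hi**2 * hj**2)
rhs = np.mean(hi**2) * np.mean(hj**2)
```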
– Otherwise, some of the math doesn’t work well
– E[H] = A E[M] = A·0 = 0
– First step of ICA: set the mean of M to 0 (the m_i are the columns of M)
m_i ← m_i − (1/N) Σ_j m_j   (subtract the column mean from every column of M)
Then transform M so that its autocorrelation becomes the identity
H = A M. Factor the unmixing matrix as A = B C, so H = B C M.
– X = C M = S^(-1/2) E^T M, where M M^T = E S E^T
– C M M^T C^T = S^(-1/2) E^T (E S E^T) E S^(-1/2) = I
– The process of decorrelating M is called whitening; C = S^(-1/2) E^T is the whitening matrix
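A sketch of the whitening step on synthetic correlated data (the generating covariance is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented correlated 2 x N data; the generating covariance is arbitrary.
L = np.linalg.cholesky(np.array([[4.0, 1.5],
                                 [1.5, 1.0]]))
M = L @ rng.normal(size=(2, 5000))
M = M - M.mean(axis=1, keepdims=True)       # center first

# Eigendecomposition M M^T = E S E^T, then whitening matrix C = S^(-1/2) E^T.
S, E = np.linalg.eigh(M @ M.T)
C = np.diag(S ** -0.5) @ E.T
X = C @ M                                   # whitened data

check = X @ X.T                             # C M M^T C^T, should be the identity
```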
E[x_i^2 x_j^2] = E[x_i^2] E[x_j^2]
– B B^T = B^T B = I
– Since the rows of H are uncorrelated
– X has already been decorrelated
– Find B such that the fourth-order moments E[h_i h_j h_k h_l] = E[h_i] E[h_j] E[h_k] E[h_l] are decoupled
– While ensuring that B is unitary
– Build a fourth-order moment matrix D that would be diagonal were the rows of H independent, and diagonalize it
– Good because it incorporates the energy in all rows of H
– d_ij = E[ Σ_k h_k^2 h_i h_j ], i.e. D = E[h^T h · h h^T]
D = [ d11 d12 d13 … ; d21 d22 d23 … ; … ]
d_ij = E[ Σ_k h_k^2 h_i h_j ] ≈ (1/#cols(H)) Σ_m ( Σ_k h_km^2 ) h_im h_jm
(for each column m of H: its sum of squares, times its i-th and j-th components)
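The sample estimate of D is compact in code; a sketch with independent rows, so the off-diagonal entries should be near zero:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Rows of h: independent, zero-mean, unit-variance sequences (made up).
h = rng.laplace(size=(3, N))
h = h - h.mean(axis=1, keepdims=True)
h = h / h.std(axis=1, keepdims=True)

# d_ij = E[ (sum_k h_k^2) h_i h_j ]: weight each column by its sum of
# squares, then average the outer products of the columns.
sq = np.sum(h ** 2, axis=0)            # sum_k h_km^2 for every column m
D = (h * sq) @ h.T / N
```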
– For i != j: the rows are independent and centered (E[h_j] = 0), so E[ Σ_k h_k^2 h_i h_j ] = 0
– For i = j: d_ii is nonzero in general
– Let us diagonalize D
d_ij = E[ Σ_k h_k^2 h_i h_j ]
     = E[h_i^3 h_j] + E[h_i h_j^3] + Σ_{k≠i,j} E[h_k^2 h_i h_j]
     = E[h_i^3] E[h_j] + E[h_i] E[h_j^3] + Σ_{k≠i,j} E[h_k^2] E[h_i] E[h_j] = 0   (for i != j)
d_ii = E[h_i^4] + Σ_{k≠i} E[h_k^2 h_i^2] = E[h_i^4] + Σ_{k≠i} E[h_k^2] E[h_i^2]
With h = U^T x for a unitary U:
E[h^T h · h h^T] = E[x^T U U^T x · U^T x x^T U] = E[x^T x · U^T x x^T U] = U^T E[x^T x · x x^T] U = U^T D′ U
Diagonalizing D′ = E[x^T x · x x^T] = U Λ U^T recovers U, and D = U^T U Λ U^T U = Λ is diagonal
– C is the (transpose of the) matrix of eigenvectors of M M^T
– B is the (transpose of the) matrix of eigenvectors of X diag(X^T X) X^T
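Putting the steps together gives a compact sketch of the whole procedure (this fourth-moment recipe is essentially Cardoso's FOBI algorithm; the sources and mixing matrix below are invented, and the method needs sources with distinct kurtoses, which Laplacian and uniform have):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Invented non-Gaussian sources (rows) and an invented mixing matrix.
H_true = np.vstack([rng.laplace(size=n), rng.uniform(-1, 1, size=n)])
M = np.array([[1.0, 0.6],
              [0.4, 1.0]]) @ H_true

# Step 1: center M.
M = M - M.mean(axis=1, keepdims=True)

# Step 2: whiten. M M^T / n = E S E^T; C = S^(-1/2) E^T; X = C M.
S, E = np.linalg.eigh(M @ M.T / n)
X = np.diag(S ** -0.5) @ E.T @ M

# Step 3: diagonalize the fourth-moment matrix D' = E[x^T x  x x^T].
sq = np.sum(X ** 2, axis=0)
Dp = (X * sq) @ X.T / n
_, U = np.linalg.eigh(Dp)

# B = U^T is unitary; the recovered sources are H = B X.
H_rec = U.T @ X

# Match recovered rows to true rows by absolute correlation
# (ICA recovers sources only up to permutation, sign, and scale).
corr = np.corrcoef(np.vstack([H_rec, H_true]))[:2, 2:]
recovery = np.max(np.abs(corr), axis=1)
```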
– The rows of H are now independent (to fourth order)
– R_MM = E S E^T,  C = S^(-1/2) E^T
– Note that the autocorrelation matrix of H will also be diagonal
Shortcomings:
– Only a subset of the fourth-order moments is considered
– There are many other ways of constructing fourth-order moment matrices that would ideally be diagonal
– Diagonalizing one of them is not guaranteed to diagonalize every other fourth-order moment matrix
J.F. Cardoso’s JADE algorithm:
– Jointly diagonalizes several fourth-order moment matrices
– More effective than the procedure shown, but computationally more expensive
– H = A M: find the A that maximizes a measure of independence F(A M)
– Normalize variance along all directions; eliminate second-order dependence
– Reduce dimensionality if fewer sources are expected (in a microphone-array setup, only K < M sources)
– For the centered, whitened signal: E[x_i x_j] = δ_ij, and for i != j this equals E[x_i] E[x_j] = 0
g(H) = [ g(h11) g(h12) … ; g(h21) g(h22) … ; … ];  f(H) is defined likewise, with g and f applied elementwise
P = [ P11 P12 … ; P21 P22 … ],  P_ij = Σ_k g(h_ik) f(h_jk)   (i.e. P = g(H) f(H)^T)
Q = [ Q11 Q12 … ; Q21 Q22 … ]
[Audio demo: input sources, their mixtures, and the separated outputs]
[Scatter plot of non-Gaussian data: PCA finds orthogonal directions of maximum variance, while ICA finds the directions along which the components are independent]
Example: take many small patches of sounds or images, concatenate them into a big matrix, and do component analysis
– On sounds, ICA returns localized sinusoids, which is a better way to analyze sounds
– On images, ICA returns localized edge filters
[Comparison: ICA-faces vs. Eigenfaces]
– Unlike PCA, there is no inherent ordering of the components
– The sources can come out in any order: permutation invariance
– Scaling a signal does not affect its independence, so sources are recovered only up to scale
– In the best case the outputs are permuted, scaled sources; in the worst case they are not the desired signals at all