SLIDE 1

Machine Learning for Signal Processing Independent Component Analysis

Class 8. 23 Sep 2013 Instructor: Bhiksha Raj

SLIDE 2

Correlation vs. Causation

  • The consumption of burgers has gone up steadily in the past decade
  • In the same period, the penguin population of Antarctica has gone down

SLIDE 3

The concept of correlation

  • Two variables are correlated if knowing the value of one gives you information about the expected value of the other

[Figure: burger consumption and penguin population plotted over time]

SLIDE 4

The statistical concept of correlatedness

  • Two variables X and Y are correlated if knowing X gives you an expected value of Y
  • X and Y are uncorrelated if knowing X tells you nothing about the expected value of Y
    – Although it could give you other information
    – How?

SLIDE 5

A brief review of basic probability

  • Uncorrelated: Two random variables X and Y are uncorrelated iff:
    – The average value of the product of the variables equals the product of their individual averages
  • Setup: Each draw produces one instance of X and one instance of Y
    – I.e. one instance of (X, Y)
  • E[XY] = E[X]E[Y]
  • The average value of X is the same regardless of the value of Y
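A quick numerical illustration of the definition above (my own sketch, not from the deck; the draws and numpy usage are assumptions): for two independently drawn variables, the average of the product matches the product of the averages.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, size=100_000)   # E[X] = 1
y = rng.normal(loc=2.0, size=100_000)   # E[Y] = 2, drawn independently of X
print(np.mean(x * y))                   # ~2.0 : E[XY]
print(np.mean(x) * np.mean(y))          # ~2.0 : E[X] E[Y]
```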

SLIDE 6

Uncorrelatedness

  • Which of the above represent uncorrelated RVs?

SLIDE 7

The statistical concept of Independence

  • Two variables X and Y are dependent if knowing X gives you any information about Y
  • X and Y are independent if knowing X tells you nothing at all about Y

SLIDE 8

A brief review of basic probability

  • Independence: Two random variables X and Y are independent iff:
    – Their joint probability equals the product of their individual probabilities
  • P(X,Y) = P(X)P(Y)
  • Independence implies uncorrelatedness
    – The average value of X is the same regardless of the value of Y
  • E[X|Y] = E[X]
    – But not the other way around

SLIDE 9

A brief review of basic probability

  • Independence: Two random variables X and Y are independent iff:
  • The average value of any function of X is the same regardless of the value of Y
    – Or any function of Y
  • E[f(X)g(Y)] = E[f(X)] E[g(Y)] for all f(), g()

SLIDE 10

Independence

  • Which of the above represent independent RVs?
  • Which represent uncorrelated RVs?

SLIDE 11

A brief review of basic probability

  • The expected value of an odd function of an RV is 0 if
    – The RV is 0 mean
    – The PDF of the RV is symmetric around 0
  • E[f(X)] = 0 if f(X) is odd symmetric

[Figure: an odd function y = f(x) and a symmetric PDF p(x)]

SLIDE 12

A brief review of basic info. theory

  • Entropy: The minimum average number of bits to transmit to convey a symbol
  • Joint entropy: The minimum average number of bits to convey sets (pairs here) of symbols

Example symbol streams: X = T(all), M(ed), S(hort), … ; Y = M, F, F, M, …

H(X) = −Σ_X P(X) log P(X)

H(X,Y) = −Σ_{X,Y} P(X,Y) log P(X,Y)
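As a small illustrative sketch (mine, not from the deck), both definitions can be computed directly from a joint probability table:

```python
import numpy as np

def entropy(p):
    # H = -sum p log2 p, ignoring zero-probability entries
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P_xy = np.array([[0.25, 0.25],   # joint distribution P(X, Y)
                 [0.25, 0.25]])
P_x = P_xy.sum(axis=1)           # marginal P(X)
print(entropy(P_x))              # H(X)   = 1 bit
print(entropy(P_xy))             # H(X,Y) = 2 bits
```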

SLIDE 13

A brief review of basic info. theory

  • Conditional Entropy: The minimum average number of bits to transmit to convey a symbol X, after symbol Y has already been conveyed
    – Averaged over all values of X and Y

H(X|Y) = Σ_Y P(Y) [−Σ_X P(X|Y) log P(X|Y)] = −Σ_{X,Y} P(X,Y) log P(X|Y)

SLIDE 14

A brief review of basic info. theory

  • Conditional entropy of X = H(X) if X is independent of Y
  • Joint entropy of X and Y is the sum of the entropies of X and Y if they are independent

H(X|Y) = Σ_Y P(Y) [−Σ_X P(X|Y) log P(X|Y)] = Σ_Y P(Y) [−Σ_X P(X) log P(X)] = H(X)

H(X,Y) = −Σ_{X,Y} P(X,Y) log P(X,Y) = −Σ_{X,Y} P(X,Y) log [P(X)P(Y)]
       = −Σ_{X,Y} P(X,Y) log P(X) − Σ_{X,Y} P(X,Y) log P(Y) = H(X) + H(Y)

SLIDE 15

Onward..

SLIDE 16

Projection: multiple notes

P = W (W^T W)^-1 W^T
Projected Spectrogram = P M

[Figure: spectrogram M and the note dictionary W]

SLIDE 17

We’re actually computing a score

M ≈ WH
H = pinv(W) M

[Figure: spectrogram M, note dictionary W, and the unknown score H]
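A minimal numpy sketch of this step (shapes and function names are my assumptions: M is frequencies x frames, W is frequencies x notes):

```python
import numpy as np

def estimate_score(M, W):
    # H = pinv(W) M : the least-squares score for a fixed set of notes W
    return np.linalg.pinv(W) @ M

def estimate_notes(M, H):
    # the symmetric problem of a later slide: W = M pinv(H) for a fixed score H
    return M @ np.linalg.pinv(H)
```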

SLIDE 18

So what are we doing here?

  • M ≈ WH is an approximation
  • Given W, estimate H to minimize error
  • Must ideally find the transcription of the given notes

Ĥ = argmin_H ||M − WH||_F^2 = argmin_H Σ_i Σ_j (M_ij − (WH)_ij)^2

[Figure: known notes W and unknown score H]

SLIDE 19

How about the other way?

M ≈ WH
W = M pinv(H)
U = WH

[Figure: spectrogram M, unknown notes W, known score H, and the reconstruction U]

SLIDE 20

Going the other way..

  • M ≈ WH is an approximation
  • Given H, estimate W to minimize error
  • Must ideally find the notes corresponding to the transcription

Ŵ = argmin_W ||M − WH||_F^2 = argmin_W Σ_i Σ_j (M_ij − (WH)_ij)^2

[Figure: unknown notes W and known score H]

SLIDE 21

When both parameters are unknown

  • Must estimate both H and W to best approximate M
  • Ideally, must learn both the notes and their transcription!

[Figure: unknown notes W, unknown score H, and the resulting approximation of M]

SLIDE 22

A least squares solution

  • Unconstrained:
    – For any W, H that minimizes the error, W' = WA, H' = A^-1 H also minimizes the error for any invertible A
  • Too many solutions

Ŵ, Ĥ = argmin_{W,H} ||M − WH||_F^2

SLIDE 23

A constrained least squares solution

  • For our problem, let's consider the "truth"..
  • When one note occurs, the other does not
    – h_i^T h_j = 0 for all i != j
  • The rows of H are uncorrelated

Ŵ, Ĥ = argmin_{W,H} ||M − WH||_F^2

SLIDE 24

A least squares solution

  • Assume: HH^T = I
    – Normalizing all rows of H to length 1
  • pinv(H) = H^T
  • Projecting M onto H
    – W = M pinv(H) = M H^T
    – WH = M H^T H

Ŵ, Ĥ = argmin_{W,H} ||M − WH||_F^2
Ĥ = argmin_H ||M − M H^T H||_F^2

Constraint: Rank(H) = 4

SLIDE 25

Finding the notes

  • Add the constraint: HH^T = I
  • The solution is obtained through Eigen decomposition
  • Note: we are considering the correlation of M^T

Ĥ = argmin_H ||M − M H^T H||_F^2

The rows of Ĥ are the leading eigenvectors of Correlation(M^T) = M^T M
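A sketch of this eigen-analysis in numpy (the helper name and the choice of four components, mirroring the rank constraint above, are my assumptions):

```python
import numpy as np

def orthogonal_score(M, num_notes=4):
    # Rows of H = leading eigenvectors of Correlation(M^T) = M^T M
    corr = M.T @ M
    vals, vecs = np.linalg.eigh(corr)       # eigenvalues in ascending order
    H = vecs[:, ::-1][:, :num_notes].T      # top eigenvectors, one per row
    W = M @ H.T                             # W = M pinv(H) = M H^T since HH^T = I
    return W, H
```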

SLIDE 26

So how does that work?

  • There are 12 notes in the segment, hence we try to estimate 12 notes..

SLIDE 27

So how does that work?

  • The scores of the first three "notes" and their contributions

SLIDE 28

Finding the notes

  • Can find W instead of H
  • Assume the columns of W are orthogonal
  • This results in the more conventional Eigen decomposition

Ŵ = argmin_W ||M − W W^T M||_F^2

The columns of Ŵ are the leading eigenvectors of Correlation(M) = M M^T

SLIDE 29

So how does that work?

  • There are 12 notes in the segment, hence we try to estimate 12 notes..
  • Results are not good again

SLIDE 30

Our notes are not orthogonal

  • Overlapping frequencies
  • Notes occur concurrently
    – The harmonica continues to resonate from the previous note
  • More generally, simple orthogonality will not give us the desired solution

SLIDE 31

Eigendecomposition and SVD

  • Matrix M can be decomposed as M = USV^T
  • When we assume the scores are orthogonal, we get H = V^T, W = US
  • When we assume the notes are orthogonal, we get W = U, H = SV^T
  • In either case the results are the same
    – The notes are orthogonal and so are the scores
    – Not good in our problem

M ≈ WH,   M = U S V^T
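The same decomposition written with numpy's SVD (a sketch; keeping only the top k components is my addition):

```python
import numpy as np

def svd_notes_and_scores(M, k):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    W = U[:, :k] * s[:k]     # "orthogonal scores" view: W = U S, H = V^T
    H = Vt[:k, :]
    return W, H              # (for "orthogonal notes", take W = U, H = S V^T instead)
```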

SLIDE 32

Orthogonality

  • In any least-squared error decomposition M = WH, if the columns of W are orthogonal, the rows of H will also be orthogonal
  • Sometimes mere orthogonality is not enough

M ≈ WH

SLIDE 33

What else can we look for?

  • Assume: The "transcription" of one note does not depend on what else is playing
    – Or, in a multi-instrument piece, instruments are playing independently of one another
  • Not strictly true, but still..

SLIDE 34

Formulating it with Independence

  • Impose statistical independence constraints on the decomposition

Ŵ, Ĥ = argmin_{W,H} ||M − WH||_F^2   (such that the rows of H are independent)

SLIDE 35

Changing problems for a bit

  • Two people speak simultaneously
  • Recorded by two microphones
  • Each recorded signal is a mixture of both signals

m_1(t) = w_11 h_1(t) + w_12 h_2(t)
m_2(t) = w_21 h_1(t) + w_22 h_2(t)

[Figure: two source signals h_1(t) and h_2(t) mixed into two microphone signals m_1(t) and m_2(t)]

SLIDE 36

A Separation Problem

  • M = WH
    – M = "mixed" signal
    – W = "notes"
    – H = "transcription"
  • Separation challenge: Given only M, estimate H
  • Identical to the problem of "finding notes"

[Figure: M = WH with W = [w_11 w_12; w_21 w_22]; the rows of M are the signals at mics 1 and 2, and the rows of H are the signals from speakers 1 and 2]

SLIDE 37

Imposing Statistical Constraints

  • M = WH
  • Given only M, estimate H
  • H = W^-1 M = AM
  • Only known constraint: The rows of H are independent
  • Estimate A such that the components of AM are statistically independent
    – A is the unmixing matrix

SLIDE 38

Statistical Independence

  • M = WH,  H = AM
  • Emulating independence
    – Compute W (or A) and H such that H has statistical characteristics that are observed in statistically independent variables
  • Enforcing independence
    – Compute W and H such that the components of H are independent

Remember this form

SLIDE 39

Emulating Independence

  • The rows of H are uncorrelated
    – E[h_i h_j] = E[h_i] E[h_j]
    – h_i and h_j are the i-th and j-th components of any vector in H
  • The fourth order moments are independent
    – E[h_i h_j h_k h_l] = E[h_i] E[h_j] E[h_k] E[h_l]
    – E[h_i^2 h_j h_k] = E[h_i^2] E[h_j] E[h_k]
    – E[h_i^2 h_j^2] = E[h_i^2] E[h_j^2]
    – Etc.

SLIDE 40

Zero Mean

  • Usual to assume zero mean processes
    – Otherwise, some of the math doesn't work well
  • M = WH,  H = AM
  • If mean(M) = 0 => mean(H) = 0
    – E[H] = A·E[M] = A·0 = 0
    – First step of ICA: Set the mean of M to 0
    – m_i are the columns of M

μ = (1 / cols(M)) Σ_i m_i
m_i ← m_i − μ
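In numpy this first step is one line (a sketch; M is assumed to hold one observation per column):

```python
import numpy as np

def center(M):
    mu = M.mean(axis=1, keepdims=True)   # mu = (1 / cols(M)) * sum_i m_i
    return M - mu, mu                    # m_i <- m_i - mu
```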

SLIDE 41

Emulating Independence..

  • Independence ⇒ Uncorrelatedness
  • Estimate a C such that CM is uncorrelated
  • X = CM
    – E[x_i x_j] = E[x_i] E[x_j] = δ_ij  [since M is now "centered"]
    – XX^T = I
  • In reality, we only want this to be a diagonal matrix, but we'll make it identity

H = AM = BCM,   A = BC

SLIDE 42

Decorrelating

  • X = CM
  • XX^T = I
  • Eigen decomposition: MM^T = E S E^T
  • Let C = S^-1/2 E^T
    – X = S^-1/2 E^T M
    – XX^T = C MM^T C^T = S^-1/2 E^T (E S E^T) E S^-1/2 = I

SLIDE 43

Decorrelating

  • X = CM
  • XX^T = I
  • Eigen decomposition: MM^T = E S E^T
  • Let C = S^-1/2 E^T
    – X = S^-1/2 E^T M
    – XX^T = C MM^T C^T = S^-1/2 E^T (E S E^T) E S^-1/2 = I
  • X is called the whitened version of M
    – The process of decorrelating M is called whitening
    – C is the whitening matrix
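A whitening sketch following these two slides (the epsilon guard against tiny eigenvalues is my addition; M is assumed centered):

```python
import numpy as np

def whiten(M, eps=1e-12):
    S, E = np.linalg.eigh(M @ M.T)                 # MM^T = E S E^T
    C = np.diag(1.0 / np.sqrt(S + eps)) @ E.T      # C = S^(-1/2) E^T
    X = C @ M                                      # XX^T = I
    return X, C
```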

SLIDE 44

Uncorrelated != Independent

  • Whitening merely ensures that the resulting signals are uncorrelated, i.e. E[x_i x_j] = 0 if i != j
  • This does not ensure higher order moments are also decoupled, e.g. it does not ensure that E[x_i^2 x_j^2] = E[x_i^2] E[x_j^2]
  • This is one of the signatures of independent RVs
  • Let's explicitly decouple the fourth order moments

SLIDE 45

Decorrelating

  • X = CM
  • XX^T = I
  • Will multiplying X by B re-correlate the components?
  • Not if B is unitary
    – BB^T = B^T B = I
  • HH^T = B XX^T B^T = BB^T = I
  • So we want to find a unitary matrix B
    – Since the rows of H are uncorrelated
      • Because they are independent

H = AM = BX,   A = BC

SLIDE 46

ICA: Freeing Fourth Moments

  • We have E[x_i x_j] = 0 if i != j
    – Already been decorrelated
  • A = BC, H = BCM, X = CM, so H = BX
  • The fourth moments of H have the form: E[h_i h_j h_k h_l]
  • If the rows of H were independent: E[h_i h_j h_k h_l] = E[h_i] E[h_j] E[h_k] E[h_l]
  • Solution: Compute B such that the fourth moments of H = BX are decoupled
    – While ensuring that B is unitary

SLIDE 47

ICA: Freeing Fourth Moments

  • Create a matrix of fourth moment terms that would be diagonal were the rows of H independent, and diagonalize it
  • A good candidate:
    – Good because it incorporates the energy in all rows of H
    – Where d_ij = E[ Σ_k h_k^2 · h_i h_j ]
    – i.e. D = E[ h^T h · h h^T ]
      • h are the columns of H
      • Assuming h is real, else replace transposition with the Hermitian

D = [ d_11 d_12 d_13 … ; d_21 d_22 d_23 … ; … ]

SLIDE 48

ICA: The D matrix

  • Average above term across all columns of H

d_ij = E[ Σ_k h_k^2 · h_i h_j ] = (1 / cols(H)) Σ_m ( Σ_k h_mk^2 ) h_mi h_mj

(h_mi is the i-th component of the m-th column of H; Σ_k h_mk^2 is the sum of squares of all components)

SLIDE 49

ICA: The D matrix

  • If the h_i terms were independent:
    – For i != j: centered, so E[h_j] = 0 ⇒ E[ Σ_k h_k^2 · h_i h_j ] = 0 for i != j
    – For i = j: the term need not be 0
  • Thus, if the h_i terms were independent, d_ij = 0 if i != j
  • i.e., if the h_i were independent, D would be a diagonal matrix
    – Let us diagonalize D

d_ij = E[ Σ_k h_k^2 · h_i h_j ]

For i != j, with independent, centered h:
d_ij = E[h_i^3] E[h_j] + E[h_i] E[h_j^3] + Σ_{k != i,j} E[h_k^2] E[h_i] E[h_j] = 0

For i = j:
d_ii = E[h_i^4] + Σ_{k != i} E[h_k^2] E[h_i^2]

SLIDE 50

Diagonalizing D

  • Compose a fourth order matrix from X
    – Recall: X = CM, H = BX = BCM
      • B is what we're trying to learn, to make H independent
    – Compose D' = E[ x^T x · x x^T ]
  • Diagonalize D' via Eigen decomposition: D' = U Λ U^T
  • B = U^T
    – That's it!!!!
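A sketch of this diagonalization step (averaging over the columns of X to approximate the expectation is my assumption):

```python
import numpy as np

def fourth_moment_unmixer(X):
    # D' = E[(x^T x) x x^T], estimated by averaging over the columns x of X
    sq = np.sum(X * X, axis=0)             # x^T x for every column
    D = (X * sq) @ X.T / X.shape[1]
    _, U = np.linalg.eigh(D)               # D' = U Lambda U^T
    return U.T                             # B = U^T (unitary)
```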

SLIDE 51

B frees the fourth moment

D’ = UUT ; B = UT

  • U is a unitary matrix, i.e. UTU = UUT = I (identity)
  • H = BX = UTX
  • h = UTx
  • The fourth moment matrix of H is

E[hT h h hT] = E[xTUUTx UT x xTU] = E[xTx UT x xTU] = UT E[xTx xxT]U = UT D’ U = UT U  U T U = 

  • The fourth moment matrix of H = UTX is Diagonal!!

SLIDE 52

Overall Solution

  • H = AM = BCM
    – C is the (transpose of the) matrix of Eigenvectors of MM^T
  • X = CM
  • A = BC = U^T C
    – B is the (transpose of the) matrix of Eigenvectors of X·diag(X^T X)·X^T

SLIDE 53

Independent Component Analysis

  • Goal: to derive a matrix A such that the rows of AM are independent
  • Procedure:
    1. "Center" M
    2. Compute the autocorrelation matrix R_MM of M
    3. Compute the whitening matrix C via Eigen decomposition: R_MM = E S E^T, C = S^-1/2 E^T
    4. Compute X = CM
    5. Compute the fourth moment matrix D' = E[ x^T x · x x^T ]
    6. Diagonalize D' via Eigen decomposition
    7. D' = U Λ U^T
    8. Compute A = U^T C
  • The fourth moment matrix of H = AM is diagonal
    – Note that the autocorrelation matrix of H will also be diagonal
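The eight steps above, written out as one compact numpy sketch (the epsilon guard and function name are mine; M is assumed to hold one observation per column):

```python
import numpy as np

def ica_by_fourth_moments(M, eps=1e-12):
    M = M - M.mean(axis=1, keepdims=True)        # 1. center M
    R = M @ M.T                                  # 2. autocorrelation R_MM
    S, E = np.linalg.eigh(R)                     # 3. R_MM = E S E^T
    C = np.diag(1.0 / np.sqrt(S + eps)) @ E.T    #    whitening matrix C = S^(-1/2) E^T
    X = C @ M                                    # 4. whiten
    sq = np.sum(X * X, axis=0)
    D = (X * sq) @ X.T / X.shape[1]              # 5. D' = E[x^T x * x x^T]
    _, U = np.linalg.eigh(D)                     # 6-7. D' = U Lambda U^T
    A = U.T @ C                                  # 8. A = U^T C
    return A @ M, A                              # H = AM
```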

SLIDE 54

ICA by diagonalizing moment matrices

  • The procedure just outlined, while fully functional, has shortcomings
    – Only a subset of fourth order moments are considered
    – There are many other ways of constructing fourth-order moment matrices that would ideally be diagonal
      • Diagonalizing the particular fourth-order moment matrix we have chosen is not guaranteed to diagonalize every other fourth-order moment matrix
  • JADE (Joint Approximate Diagonalization of Eigenmatrices), J.-F. Cardoso
    – Jointly diagonalizes several fourth-order moment matrices
    – More effective than the procedure shown, but computationally more expensive

SLIDE 55

Enforcing Independence

  • Specifically ensure that the components of H are independent
    – H = AM
  • Contrast function: A non-linear function that has a minimum value when the output components are independent
  • Define and minimize a contrast function
    » F(AM)
  • Contrast functions are often only approximations too..

SLIDE 56

A note on pre-whitening

  • The mixed signal is usually "prewhitened"
    – Normalize variance along all directions
    – Eliminate second-order dependence
  • Eigen decomposition: MM^T = E S E^T
  • C = S^-1/2 E^T
  • Can use only the first K columns of E if only K independent sources are expected
    – In a microphone array setup – only K < M sources
  • X = CM
    – E[x_i x_j] = E[x_i] E[x_j] = δ_ij for the centered signal

SLIDE 57

The contrast function

  • Contrast function: A non-linear function that has a minimum value when the output components are independent
  • An explicit contrast function (below)
  • With constraint: H = BX
    – X is "whitened" M

I(H) = Σ_i H(h_i) − H(H)

SLIDE 58

Linear Functions

  • h = Bx
    – Individual columns of the H and X matrices
    – x is the mixed signal, B is the unmixing matrix

P_h(h) = |B|^-1 P_x(B^-1 h)

H(h) = H(x) + log |B|,   where H(x) = −∫ P(x) log P(x) dx

SLIDE 59

The contrast function

  • Ignoring H(x) (constant)
  • Minimize the objective below to obtain B

I(H) = Σ_i H(h_i) − H(H) = Σ_i H(h_i) − H(x) − log |B|

J = Σ_i H(h_i) − log |B|

SLIDE 60

An alternate approach

  • Recall PCA
  • M = WH, where the columns of W must be uncorrelated
  • Leads to: min_W ||M − W W^T M||^2 + trace(W W^T)
    – Error minimization framework to estimate W
  • Can we arrive at an error minimization framework for ICA?
  • Define an "Error" objective that represents independence

SLIDE 61

An alternate approach

  • Definition of Independence – if x and y are independent:
    – E[f(x)g(y)] = E[f(x)]E[g(y)]
    – Must hold for every f() and g()!!

SLIDE 62

An alternate approach

  • Define g(H) = g(BX) (component-wise function)
  • Define f(H) = f(BX)

g(H) = [ g(h_11) g(h_12) … ; g(h_21) g(h_22) … ; … ],   f(H) = [ f(h_11) f(h_12) … ; f(h_21) f(h_22) … ; … ]

SLIDE 63

An alternate approach

  • P = g(H) f(H)^T = g(BX) f(BX)^T
    – This is a square matrix
  • Must ideally equal the matrix Q below
  • Error = ||P − Q||_F^2

P = [ P_11 P_12 … ; P_21 P_22 … ; … ],   P_ij = Σ_k g(h_ik) f(h_jk)

Q = [ Q_11 Q_12 … ; Q_21 Q_22 … ; … ],   Q_ij = Σ_k Σ_l g(h_ik) f(h_jl) for i != j,   Q_ii = Σ_k g(h_ik) f(h_ik)

SLIDE 64

An alternate approach

  • Ideal value for Q:
  • If g() and f() are odd symmetric functions, Σ_j g(h_ij) = 0 for all i
    – Since Σ_j h_ij = 0 (H is centered)
    – Q is a diagonal matrix!!!

Q_ij = Σ_k Σ_l g(h_ik) f(h_jl) for i != j,   Q_ii = Σ_k g(h_ik) f(h_ik)

SLIDE 65

An alternate approach

  • Minimize Error
  • Leads to a trivial Widrow-Hoff type iterative rule:

P = g(BX) f(BX)^T,   Q = Diag(P)

error = ||P − Q||_F^2

E = P − Diag(P),   B ← B + η E B

SLIDE 66

Update Rules

  • Multiple solutions under different assumptions for g() and f()
  • H = BX
  • B = B + η ΔB
  • Jutten-Hérault: Online update
    – ΔB_ij = f(h_i) g(h_j)  – actually assumed a recursive neural network
  • Bell-Sejnowski
    – ΔB = ([B^T]^-1 – g(H) X^T)

SLIDE 67

Update Rules

  • Multiple solutions under different assumptions for g() and f()
  • H = BX
  • B = B + η ΔB
  • Natural gradient – f() = identity function
    – ΔB = (I – g(H) H^T) W
  • Cichocki-Unbehauen
    – ΔB = (I – g(H) f(H)^T) W

SLIDE 68

What are g() and f()?

  • Must be odd symmetric functions
  • Multiple functions proposed
  • Audio signals in general:
    – ΔB = (I – HH^T – K·tanh(H) H^T) W
  • Or simply
    – ΔB = (I – K·tanh(H) H^T) W

g(x) = x + tanh(x)   if x is super-Gaussian
g(x) = x − tanh(x)   if x is sub-Gaussian
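A sketch of the natural-gradient iteration with the super-Gaussian choice g(h) = h + tanh(h) from above (the learning rate, iteration count, starting point, and per-frame averaging are my assumptions; X is the whitened mixture):

```python
import numpy as np

def natural_gradient_ica(X, iters=500, lr=0.01):
    dim, n = X.shape
    B = np.eye(dim)                              # start from the whitened coordinates
    for _ in range(iters):
        H = B @ X
        G = H + np.tanh(H)                       # g(H), applied element-wise
        dB = (np.eye(dim) - (G @ H.T) / n) @ B   # dB = (I - g(H) H^T) B
        B += lr * dB
    return B                                     # unmixed signals: H = B X
```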

SLIDE 69

So how does it work?

  • Example with an instantaneous mixture of two speakers
  • Natural gradient update
  • Works very well!

SLIDE 70

Another example!

[Figure: input signals, their mixtures, and the separated outputs]

SLIDE 71

Another Example

  • Three instruments..

SLIDE 72

The Notes

  • Three instruments..

SLIDE 73

ICA for data exploration

  • The "bases" in PCA represent the "building blocks"
    – Ideally notes
  • Very successfully used
  • So can ICA be used to do the same?

SLIDE 74

ICA vs PCA bases

[Figure: non-Gaussian data with the directions found by ICA and PCA]

  • Motivation for using ICA vs PCA
  • PCA will indicate orthogonal directions of maximal variance
  • May not align with the data!
  • ICA finds directions that are independent
  • More likely to "align" with the data

SLIDE 75

Finding useful transforms with ICA

  • Audio preprocessing example
  • Take a lot of audio snippets and concatenate them in a big matrix, then do component analysis
  • PCA results in the DCT bases
  • ICA returns time/frequency localized sinusoids, which is a better way to analyze sounds
  • Ditto for images
    – ICA returns localized edge filters

SLIDE 76

Example case: ICA-faces vs. Eigenfaces

[Figure: ICA-faces compared with Eigenfaces]

SLIDE 77

ICA for Signal Enhancement

  • Very commonly used to enhance EEG signals
  • EEG signals are frequently corrupted by heartbeats and biorhythm signals
  • ICA can be used to separate them out

SLIDE 78

So how does that work?

  • There are 12 notes in the segment, hence we try to estimate 12 notes..

SLIDE 79

PCA solution

  • There are 12 notes in the segment, hence we try to estimate 12 notes..

SLIDE 80

So how does this work: ICA solution

  • Better..
    – But not much
  • What are the issues here?

SLIDE 81

ICA Issues

  • No sense of order
    – Unlike PCA
  • Get K independent directions, but does not have a notion of the "best" direction
    – So the sources can come in any order
    – Permutation invariance
  • Does not have a sense of scaling
    – Scaling the signal does not affect independence
  • Outputs are scaled versions of the desired signals in permuted order
    – In the best case
    – In the worse case, the outputs are not the desired signals at all..

SLIDE 82

What else went wrong?

  • Notes are not independent
    – Only one note plays at a time
    – If one note plays, other notes are not playing
  • Will deal with these later in the course..
