

SLIDE 1

Rapid Computation of I-vector

Longting Xu 1,2, Kong Aik Lee 1, Haizhou Li 1 and Zhen Yang 2

1 Institute for Infocomm Research (I2R), Singapore
2 Nanjing University of Posts and Telecomm, China

SLIDE 2

Introduction

  • Compression process – an i-vector is a fixed-length, low-dimensional representation of a variable-length speech utterance [Dehak et al, 2011].
  • i-vector = speaker + recording device + transmission channel + acoustic environment
  • MAP estimate – the posterior mean of the latent variable x in a multi-Gaussian factor analysis model (i.e., the total variability model). The alignment of frames to Gaussians can be accomplished using a GMM [Kenny et al, 2008] or senone posteriors [Lei et al, 2014].

Odyssey 2016, Bilbao, Spain

SLIDE 3

Introduction (cont’d)

  • I-vector extraction is a posterior inference process.
  • We use pre-whitened statistics [Matejka et al, 2011] in this work.

Prior: $x \sim \mathcal{N}(0, I)$   Observations: $o_1, o_2, \ldots, o_T$   Posterior: $x \sim \mathcal{N}(\hat{x}, L^{-1})$

$$\hat{x} = L^{-1} T^\top \Sigma^{-1} F, \qquad L = I + T^\top \Sigma^{-1} N T$$

With pre-whitened statistics:

$$\hat{x} = L^{-1} T^\top F, \qquad L = I + T^\top N T$$
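As a concrete sketch of the formulas above (with synthetic statistics; the toy sizes C, F_dim, M and all variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
C, F_dim, M = 8, 5, 10                     # toy sizes: mixtures, feature dim, i-vector dim
T = rng.standard_normal((C * F_dim, M))    # total variability matrix
n = rng.uniform(1.0, 20.0, C)              # zero-order statistics, one count per mixture
F = rng.standard_normal(C * F_dim)         # centered, pre-whitened first-order statistics

# N is block-diagonal with n_c * I blocks; built densely here for clarity
N = np.diag(np.repeat(n, F_dim))

# Posterior precision and mean: L = I + T' N T,  x_hat = L^{-1} T' F
L = np.eye(M) + T.T @ N @ T
x_hat = np.linalg.solve(L, T.T @ F)
print(x_hat.shape)  # (10,)
```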

SLIDE 4

Objective

  • Objective: To reduce the computational complexity of i-vector extraction while keeping the memory requirement low, with only slight degradation in performance.
  • Why?

– Of particular interest for i-vector extraction on hand-held devices and large-scale cloud-based applications. – The number of senone posteriors is approaching 10k and beyond [Sadjadi et al, 2016]. – The T matrix is trained offline and one-off; its computational load is not generally seen as a bottleneck.

SLIDE 5

Problem statement

  • The main computational load of i-vector extraction lies in the computation of the posterior covariance, which is required for the posterior mean estimation.

  • Existing solutions:

– Simplifying the posterior covariance estimation

  • Eigen decomposition of posterior covariance [Glembek et al, 2011]
  • Fixed occupancy count [Aronowitz et al, 2012]
  • Factorized subspace [Cumani et al, 2014]
  • Sparse coding [Xu et al, 2015]

SLIDE 6

Problem statement (cont’d)

  • Proposed solution:

– Directly estimate the posterior mean, without the need to evaluate the posterior covariance

  • Using informative prior
  • Uniform occupancy assumption

SLIDE 7

I-vector extraction using INFORMATIVE PRIOR

SLIDE 8

Posterior inference with informative prior

  • Conventional i-vector extraction assumes a standard Gaussian prior on the latent variable x: $x \sim \mathcal{N}(0, I)$
  • Consider a more general case where the prior on x has mean $\mu_p$ and covariance $\Sigma_p$: $x \sim \mathcal{N}(\mu_p, \Sigma_p)$
  • I-vector extraction with informative prior:

$$\hat{x} = L^{-1}\left(T^\top F + \Sigma_p^{-1} \mu_p\right), \qquad L = \Sigma_p^{-1} + T^\top N T$$

SLIDE 9

Subspace orthonormalizing prior

  • In this work, we consider the following informative prior: $x \sim \mathcal{N}(0, \Sigma_p)$, with $\Sigma_p = (T^\top T)^{-1}$
  • I-vector extraction:

$$\hat{x} = L^{-1} T^\top F, \qquad L = T^\top T + T^\top N T$$

$$\hat{x} = \left(T^\top T + T^\top N T\right)^{-1} T^\top F = \left[T^\top T \left(I + (T^\top T)^{-1} T^\top N T\right)\right]^{-1} T^\top F = \left(I + (T^\top T)^{-1} T^\top N T\right)^{-1} (T^\top T)^{-1} T^\top F$$
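The equivalence of the forms above can be checked numerically; the sketch below uses synthetic T, N, and F (sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
CF, M = 40, 10                            # toy supervector and i-vector dimensions
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 20.0, CF))   # synthetic occupancy matrix
F = rng.standard_normal(CF)

TtT = T.T @ T
# direct form: (T'T + T'NT)^{-1} T'F
x1 = np.linalg.solve(TtT + T.T @ N @ T, T.T @ F)
# factored form: (I + (T'T)^{-1} T'NT)^{-1} (T'T)^{-1} T'F
x2 = np.linalg.solve(np.eye(M) + np.linalg.solve(TtT, T.T @ N @ T),
                     np.linalg.solve(TtT, T.T @ F))
print(np.allclose(x1, x2))  # True
```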

SLIDE 10

Subspace orthonormalizing prior (cont’d)

  • Using the matrix inversion identity $(I + AB)^{-1} A = A (I + BA)^{-1}$
  • I-vector extraction with subspace orthonormalizing prior:

$$\hat{x} = (T^\top T)^{-1} T^\top \left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} F$$

where $T (T^\top T)^{-1} T^\top = U_1 U_1^\top$ is a projection matrix with orthonormal columns.
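A quick numerical check of the push-through identity and the projection-matrix claim (synthetic matrices; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
CF, M = 40, 10
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 20.0, CF))
F = rng.standard_normal(CF)

P = T @ np.linalg.solve(T.T @ T, T.T)     # T (T'T)^{-1} T': projection onto col(T)
A = np.linalg.solve(T.T @ T, T.T)         # shorthand for (T'T)^{-1} T'

# before the identity: (I + (T'T)^{-1} T'NT)^{-1} (T'T)^{-1} T' F
x_old = np.linalg.solve(np.eye(M) + A @ N @ T, A @ F)
# after the identity:  (T'T)^{-1} T' (I + N T(T'T)^{-1}T')^{-1} F
x_new = A @ np.linalg.solve(np.eye(CF) + N @ P, F)

print(np.allclose(P @ P, P), np.allclose(x_old, x_new))  # True True
```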

SLIDE 11

I-vector extraction with RAPID COMPUTATION

SLIDE 12

Solving the matrix inversion

  • Singular value decomposition of T: $T = U S V^\top = [U_1, U_2]\, S V^\top$
  • It follows that $U_1$ spans the same subspace as T, and $U_1 \perp U_2$
  • Using the above, we solve the following matrix inversion:

$$\left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} = \left(I + N U_1 U_1^\top\right)^{-1} = \left(I + N (I - U_2 U_2^\top)\right)^{-1} = \left(I + N - N U_2 U_2^\top\right)^{-1}$$
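The chain of equalities above can be verified with a full SVD (synthetic T and N; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
CF, M = 40, 10
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 20.0, CF))

U, s, Vt = np.linalg.svd(T)               # full SVD: U is CF x CF
U1, U2 = U[:, :M], U[:, M:]               # U1 spans col(T); U1 is orthogonal to U2

P = T @ np.linalg.solve(T.T @ T, T.T)     # projection onto col(T)
lhs = np.linalg.inv(np.eye(CF) + N @ P)
rhs = np.linalg.inv(np.eye(CF) + N - N @ U2 @ U2.T)

print(np.allclose(P, U1 @ U1.T), np.allclose(lhs, rhs))  # True True
```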

SLIDE 13

Solving the matrix inversion (cont’d)

  • Let $A = (I + N)$, so that $\left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} = \left(I + N - N U_2 U_2^\top\right)^{-1} = \left(A - N U_2 U_2^\top\right)^{-1}$
  • Using the matrix inversion lemma:

$$\left(A - N U_2 U_2^\top\right)^{-1} = A^{-1} + A^{-1} N U_2 \left(I - U_2^\top A^{-1} N U_2\right)^{-1} U_2^\top A^{-1}$$

  • Using again the matrix inversion identity:

$$\left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} = A^{-1} + A^{-1} N U_2 \left(I - U_2^\top A^{-1} N U_2\right)^{-1} U_2^\top A^{-1}$$
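The Woodbury step can likewise be checked numerically (synthetic matrices; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
CF, M = 40, 10
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 20.0, CF))
A = np.eye(CF) + N                        # A = (I + N), diagonal and cheap to invert

U2 = np.linalg.svd(T)[0][:, M:]           # orthonormal complement of col(T)
Ainv = np.linalg.inv(A)

k = U2.shape[1]
woodbury = Ainv + Ainv @ N @ U2 @ np.linalg.solve(
    np.eye(k) - U2.T @ Ainv @ N @ U2, U2.T @ Ainv)
direct = np.linalg.inv(A - N @ U2 @ U2.T)

print(np.allclose(woodbury, direct))  # True
```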

SLIDE 14

Rapid computation of i-vector

  • Uniform occupancy assumption: $N_c \approx \bar{n} I$, for $c = 1, 2, \ldots, C$
  • Or equivalently: $A^{-1} N \approx \gamma I$, with $\gamma = \bar{n}/(1 + \bar{n})$, for $0 \le \gamma < 1$
  • The matrix inversion can be simplified as:

$$\left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} \approx A^{-1} + \gamma\, U_2 \left(I - U_2^\top A^{-1} N U_2\right)^{-1} U_2^\top A^{-1}$$

  • Since $T \perp U_2$, the second term diminishes when forming the posterior mean:

$$\hat{x} = (T^\top T)^{-1} T^\top \left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} F \approx (T^\top T)^{-1} T^\top \left(I + N\right)^{-1} F$$
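Under an exactly uniform occupancy (all counts equal), the fast formula matches the exact one. A sketch with synthetic data (the count 7.0 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
CF, M = 40, 10
T = rng.standard_normal((CF, M))
F = rng.standard_normal(CF)

n_bar = 7.0                               # exactly uniform occupancy: N = n_bar * I
N = n_bar * np.eye(CF)
P = T @ np.linalg.solve(T.T @ T, T.T)     # T (T'T)^{-1} T'

# exact: (T'T)^{-1} T' (I + N P)^{-1} F
x_exact = np.linalg.solve(T.T @ T, T.T @ np.linalg.solve(np.eye(CF) + N @ P, F))
# fast:  (T'T)^{-1} T' (I + N)^{-1} F  -- (I + N)^{-1} is diagonal, so just a scaling
x_fast = np.linalg.solve(T.T @ T, T.T @ (F / (1.0 + n_bar)))

print(np.allclose(x_exact, x_fast))  # True
```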

SLIDE 15

Computational complexity and memory cost

Method            Complexity           Memory cost     Time ratio
Baseline (slow)   O(CFM² + M³)         O(CFM)          106.44
Baseline (fast)   O(CFM + CM² + M³)    O(CFM + CM²)    11.99
Proposed (exact)  O(CFM + CM² + M³)    O(CFM + CM²)    12.65
Proposed (fast)   O(CFM)               O(CFM)          1

Baseline (slow): $\hat{x} = \left(I + \sum_c N_c T_c^\top T_c\right)^{-1} T^\top F$
Baseline (fast): $\hat{x} = \left(I + \sum_c N_c A_c\right)^{-1} T^\top F$, with $A_c = T_c^\top T_c$ pre-computed
Proposed (exact): $\hat{x} = \left(T^\top T + \sum_c N_c T_c^\top T_c\right)^{-1} T^\top F$
Proposed (fast): $\hat{x} = (T^\top T)^{-1} T^\top (I + N)^{-1} F$
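To make the comparison concrete, here is a sketch of the slow baseline next to the proposed fast extractor (synthetic data; function names illustrative; the two use different priors, so they agree only approximately):

```python
import numpy as np

rng = np.random.default_rng(6)
C, F_dim, M = 8, 5, 10
T = rng.standard_normal((C * F_dim, M))
n = np.repeat(rng.uniform(5.0, 6.0, C), F_dim)   # near-uniform zero-order stats
F = rng.standard_normal(C * F_dim)

def ivector_baseline_slow(T, n, F):
    # O(CFM^2 + M^3): rebuild L = I + T' N T for every utterance
    L = np.eye(T.shape[1]) + T.T @ (n[:, None] * T)
    return np.linalg.solve(L, T.T @ F)

def ivector_proposed_fast(T, n, F):
    # O(CFM): (I + N)^{-1} is diagonal; T'T is fixed and can be cached
    return np.linalg.solve(T.T @ T, T.T @ (F / (1.0 + n)))

x_slow = ivector_baseline_slow(T, n, F)
x_fast = ivector_proposed_fast(T, n, F)
print(x_slow.shape, x_fast.shape)  # (10,) (10,)
```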

SLIDE 16

Posterior covariance

  • Determined by the zero-order statistics and the T matrix.
  • Might be desired for uncertainty propagation.
  • Derived using the same procedure as for the posterior mean.
  • The computational complexity is O(CM²), assuming that we pre-computed the matrices $T_c^\top T_c$.

SLIDE 17

Using informative prior in EM update of T


SLIDE 18

EXPERIMENT: Rapid computation of i-vector

SLIDE 19

Experimental setup

  • NIST SRE’10 extended core task, CCs 1 to 9
  • UBM
    – Gender-dependent with C = 512 mixtures
    – 57-dim MFCC
    – SWB, SRE’04, 05, 06
  • T matrix
    – M = 400
    – Trained using the same dataset as the UBM
  • PLDA
    – LDA to 300-dim and length normalization
    – 200 speaker factors
    – Full residual covariance for channel modeling

SLIDE 20

SRE’10 core-extended (female)

  • Introducing an informative prior does not seem to degrade the performance.
  • For the tel-tel CC 5, the relative degradation is 10.04% in EER and 4.54% in min DCF.
  • Across all 9 CCs, the relative degradation ranges from 10.04% to 16.11% in EER and from 0.67% to 20.40% in min DCF.

(Figure: EER and MinDCF10 results)

SLIDE 21

SRE’10 core-extended (female)

  • We trained the T matrix assuming a subspace-orthonormalizing prior.
  • Compared with the results using a T matrix trained with the standard Gaussian prior, slightly better results could be observed.

(Figure: EER and MinDCF10 results)

SLIDE 22

Conclusion

  • We introduced the following for rapid computation of i-vectors:
    – Subspace orthonormalizing prior
    – Uniform occupancy assumption
  • The computational speed-up is attained by avoiding the need to compute the posterior covariance in order to obtain the posterior mean.
  • The proposed method speeds up i-vector extraction by a factor of 12 compared to the fast baseline (and a factor of 106 compared to the slow baseline), with marginal degradation in recognition accuracy.

SLIDE 23

Questions?

Thanks for Your Attention!