

SLIDE 1

Rapid Computation of I-vector

Longting Xu 1,2, Kong Aik Lee 1, Haizhou Li 1 and Zhen Yang 2

1 Institute for Infocomm Research (I2R), Singapore
2 Nanjing University of Posts and Telecomm, China

SLIDE 2

Introduction

  • Compression process – an i-vector is a fixed-length, low-dimensional representation of a variable-length speech utterance [Dehak et al, 2011].
  • i-vector = speaker + recording device + transmission channel + acoustic environment
  • MAP estimate – the posterior mean of the latent variable x in a multi-Gaussian factor analysis model (i.e., the total variability model). The alignment of frames to Gaussians can be accomplished using a GMM [Kenny et al, 2008] or senone posteriors [Lei et al, 2014].

Odyssey 2016, Bilbao, Spain

SLIDE 3

Introduction (cont’d)

  • I-vector extraction is a posterior inference process.
  • We use pre-whitened statistics [Matejka et al, 2011] in this work.

Prior: $x \sim \mathcal{N}(0, I)$   Observations: $o_1, o_2, \ldots, o_T$   Posterior: $x \sim \mathcal{N}(\hat{x}, L^{-1})$

$$\hat{x} = L^{-1} T^\top \Sigma^{-1} F, \qquad L = I + T^\top \Sigma^{-1} N T$$

With pre-whitened statistics:

$$\hat{x} = L^{-1} T^\top F, \qquad L = I + T^\top N T$$
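As a concrete sketch of the formulas above (with synthetic statistics; the toy sizes C, F_dim, M and all variable names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
C, F_dim, M = 8, 5, 10                     # toy sizes: mixtures, feature dim, i-vector dim
T = rng.standard_normal((C * F_dim, M))    # total variability matrix
n = rng.uniform(1.0, 20.0, C)              # zero-order statistics, one count per mixture
F = rng.standard_normal(C * F_dim)         # centered, pre-whitened first-order statistics

# N is block-diagonal with n_c * I blocks; built densely here for clarity
N = np.diag(np.repeat(n, F_dim))

# Posterior precision and mean: L = I + T' N T,  x_hat = L^{-1} T' F
L = np.eye(M) + T.T @ N @ T
x_hat = np.linalg.solve(L, T.T @ F)
print(x_hat.shape)  # (10,)
```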

SLIDE 4

Objective

  • Objective: To reduce the computational complexity of i-vector extraction while keeping the memory requirement low, with only slight degradation in performance.
  • Why?

– Of particular interest for i-vector extraction on hand-held devices and large-scale cloud-based applications. – The number of senone posteriors is approaching 10k and beyond [Sadjadi et al, 2016]. – The T matrix is trained offline and one-off; its computational load is not generally seen as a bottleneck.

SLIDE 5

Problem statement

  • The main computational load of i-vector extraction lies in the computation of the posterior covariance, which is required for the posterior mean estimation.

  • Existing solutions:

– Simplifying the posterior covariance estimation

  • Eigen decomposition of posterior covariance [Glembek et al, 2011]
  • Fixed occupancy count [Aronowitz et al, 2012]
  • Factorized subspace [Cumani et al, 2014]
  • Sparse coding [Xu et al, 2015]

SLIDE 6

Problem statement (cont’d)

  • Proposed solution:

– Directly estimate the posterior mean, without the need to evaluate the posterior covariance

  • Using informative prior
  • Uniform occupancy assumption

SLIDE 7

I-vector extraction using INFORMATIVE PRIOR

SLIDE 8

Posterior inference with informative prior

  • Conventional i-vector extraction assumes a standard Gaussian prior on the latent variable x: $x \sim \mathcal{N}(0, I)$
  • Consider a more general case where the prior on x has mean $\mu_p$ and covariance $\Sigma_p$: $x \sim \mathcal{N}(\mu_p, \Sigma_p)$
  • I-vector extraction with informative prior:

$$\hat{x} = L^{-1}\left(T^\top F + \Sigma_p^{-1} \mu_p\right), \qquad L = \Sigma_p^{-1} + T^\top N T$$

SLIDE 9

Subspace orthonormalizing prior

  • In this work, we consider the following informative prior: $x \sim \mathcal{N}(0, \Sigma_p)$, with $\Sigma_p = (T^\top T)^{-1}$
  • I-vector extraction:

$$\hat{x} = L^{-1} T^\top F, \qquad L = T^\top T + T^\top N T$$

$$\hat{x} = \left(T^\top T + T^\top N T\right)^{-1} T^\top F = \left[T^\top T \left(I + (T^\top T)^{-1} T^\top N T\right)\right]^{-1} T^\top F = \left(I + (T^\top T)^{-1} T^\top N T\right)^{-1} (T^\top T)^{-1} T^\top F$$
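The equivalence of the forms above can be checked numerically; the sketch below uses synthetic T, N, and F (sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
CF, M = 40, 10                            # toy supervector and i-vector dimensions
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 20.0, CF))   # synthetic occupancy matrix
F = rng.standard_normal(CF)

TtT = T.T @ T
# direct form: (T'T + T'NT)^{-1} T'F
x1 = np.linalg.solve(TtT + T.T @ N @ T, T.T @ F)
# factored form: (I + (T'T)^{-1} T'NT)^{-1} (T'T)^{-1} T'F
x2 = np.linalg.solve(np.eye(M) + np.linalg.solve(TtT, T.T @ N @ T),
                     np.linalg.solve(TtT, T.T @ F))
print(np.allclose(x1, x2))  # True
```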

SLIDE 10

Subspace orthonormalizing prior (cont’d)

  • Using the matrix inversion identity $(I + AB)^{-1} A = A (I + BA)^{-1}$
  • I-vector extraction with subspace orthonormalizing prior:

$$\hat{x} = (T^\top T)^{-1} T^\top \left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} F$$

where $T (T^\top T)^{-1} T^\top = U_1 U_1^\top$ is a projection matrix with orthonormal columns.
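A quick numerical check of the push-through identity and the projection-matrix claim (synthetic matrices; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
CF, M = 40, 10
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 20.0, CF))
F = rng.standard_normal(CF)

P = T @ np.linalg.solve(T.T @ T, T.T)     # T (T'T)^{-1} T': projection onto col(T)
A = np.linalg.solve(T.T @ T, T.T)         # shorthand for (T'T)^{-1} T'

# before the identity: (I + (T'T)^{-1} T'NT)^{-1} (T'T)^{-1} T' F
x_old = np.linalg.solve(np.eye(M) + A @ N @ T, A @ F)
# after the identity:  (T'T)^{-1} T' (I + N T(T'T)^{-1}T')^{-1} F
x_new = A @ np.linalg.solve(np.eye(CF) + N @ P, F)

print(np.allclose(P @ P, P), np.allclose(x_old, x_new))  # True True
```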

SLIDE 11

I-vector extraction with RAPID COMPUTATION

SLIDE 12

Solving the matrix inversion

  • Singular value decomposition of T: $T = U S V^\top = [U_1, U_2]\, S V^\top$
  • It follows that $U_1$ spans the same subspace as T, and $U_1 \perp U_2$
  • Using the above, we solve the following matrix inversion:

$$\left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} = \left(I + N U_1 U_1^\top\right)^{-1} = \left(I + N (I - U_2 U_2^\top)\right)^{-1} = \left(I + N - N U_2 U_2^\top\right)^{-1}$$
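The chain of equalities above can be verified with a full SVD (synthetic T and N; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
CF, M = 40, 10
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 20.0, CF))

U, s, Vt = np.linalg.svd(T)               # full SVD: U is CF x CF
U1, U2 = U[:, :M], U[:, M:]               # U1 spans col(T); U1 is orthogonal to U2

P = T @ np.linalg.solve(T.T @ T, T.T)     # projection onto col(T)
lhs = np.linalg.inv(np.eye(CF) + N @ P)
rhs = np.linalg.inv(np.eye(CF) + N - N @ U2 @ U2.T)

print(np.allclose(P, U1 @ U1.T), np.allclose(lhs, rhs))  # True True
```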

SLIDE 13

Solving the matrix inversion (cont’d)

  • Let $A = (I + N)$, so that $\left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} = \left(I + N - N U_2 U_2^\top\right)^{-1} = \left(A - N U_2 U_2^\top\right)^{-1}$
  • Using the matrix inversion lemma:

$$\left(A - N U_2 U_2^\top\right)^{-1} = A^{-1} + A^{-1} N U_2 \left(I - U_2^\top A^{-1} N U_2\right)^{-1} U_2^\top A^{-1}$$

  • Using again the matrix inversion identity:

$$\left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} = A^{-1} + A^{-1} N U_2 \left(I - U_2^\top A^{-1} N U_2\right)^{-1} U_2^\top A^{-1}$$
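The Woodbury step can likewise be checked numerically (synthetic matrices; names illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
CF, M = 40, 10
T = rng.standard_normal((CF, M))
N = np.diag(rng.uniform(1.0, 20.0, CF))
A = np.eye(CF) + N                        # A = (I + N), diagonal and cheap to invert

U2 = np.linalg.svd(T)[0][:, M:]           # orthonormal complement of col(T)
Ainv = np.linalg.inv(A)

k = U2.shape[1]
woodbury = Ainv + Ainv @ N @ U2 @ np.linalg.solve(
    np.eye(k) - U2.T @ Ainv @ N @ U2, U2.T @ Ainv)
direct = np.linalg.inv(A - N @ U2 @ U2.T)

print(np.allclose(woodbury, direct))  # True
```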

SLIDE 14

Rapid computation of i-vector

  • Uniform occupancy assumption: $N_c \approx \bar{n} I$, for $c = 1, 2, \ldots, C$
  • Or equivalently: $A^{-1} N \approx \gamma I$, with $\gamma = \bar{n}/(1 + \bar{n})$, for $0 \le \gamma < 1$
  • The matrix inversion can be simplified as:

$$\left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} \approx A^{-1} + \gamma\, U_2 \left(I - U_2^\top A^{-1} N U_2\right)^{-1} U_2^\top A^{-1}$$

  • Since $T \perp U_2$, the second term diminishes when forming the posterior mean:

$$\hat{x} = (T^\top T)^{-1} T^\top \left(I + N\, T (T^\top T)^{-1} T^\top\right)^{-1} F \approx (T^\top T)^{-1} T^\top \left(I + N\right)^{-1} F$$
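Under an exactly uniform occupancy (all counts equal), the fast formula matches the exact one. A sketch with synthetic data (the count 7.0 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
CF, M = 40, 10
T = rng.standard_normal((CF, M))
F = rng.standard_normal(CF)

n_bar = 7.0                               # exactly uniform occupancy: N = n_bar * I
N = n_bar * np.eye(CF)
P = T @ np.linalg.solve(T.T @ T, T.T)     # T (T'T)^{-1} T'

# exact: (T'T)^{-1} T' (I + N P)^{-1} F
x_exact = np.linalg.solve(T.T @ T, T.T @ np.linalg.solve(np.eye(CF) + N @ P, F))
# fast:  (T'T)^{-1} T' (I + N)^{-1} F  -- (I + N)^{-1} is diagonal, so just a scaling
x_fast = np.linalg.solve(T.T @ T, T.T @ (F / (1.0 + n_bar)))

print(np.allclose(x_exact, x_fast))  # True
```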

SLIDE 15

Computational complexity and memory cost

Method            Complexity           Memory cost     Time ratio
Baseline (slow)   O(CFM² + M³)         O(CFM)          106.44
Baseline (fast)   O(CFM + CM² + M³)    O(CFM + CM²)    11.99
Proposed (exact)  O(CFM + CM² + M³)    O(CFM + CM²)    12.65
Proposed (fast)   O(CFM)               O(CFM)          1

Baseline (slow): $\hat{x} = \left(I + \sum_c N_c T_c^\top T_c\right)^{-1} T^\top F$
Baseline (fast): $\hat{x} = \left(I + \sum_c N_c A_c\right)^{-1} T^\top F$, with $A_c = T_c^\top T_c$ pre-computed
Proposed (exact): $\hat{x} = \left(T^\top T + \sum_c N_c T_c^\top T_c\right)^{-1} T^\top F$
Proposed (fast): $\hat{x} = (T^\top T)^{-1} T^\top (I + N)^{-1} F$
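To make the comparison concrete, here is a sketch of the slow baseline next to the proposed fast extractor (synthetic data; function names illustrative; the two use different priors, so they agree only approximately):

```python
import numpy as np

rng = np.random.default_rng(6)
C, F_dim, M = 8, 5, 10
T = rng.standard_normal((C * F_dim, M))
n = np.repeat(rng.uniform(5.0, 6.0, C), F_dim)   # near-uniform zero-order stats
F = rng.standard_normal(C * F_dim)

def ivector_baseline_slow(T, n, F):
    # O(CFM^2 + M^3): rebuild L = I + T' N T for every utterance
    L = np.eye(T.shape[1]) + T.T @ (n[:, None] * T)
    return np.linalg.solve(L, T.T @ F)

def ivector_proposed_fast(T, n, F):
    # O(CFM): (I + N)^{-1} is diagonal; T'T is fixed and can be cached
    return np.linalg.solve(T.T @ T, T.T @ (F / (1.0 + n)))

x_slow = ivector_baseline_slow(T, n, F)
x_fast = ivector_proposed_fast(T, n, F)
print(x_slow.shape, x_fast.shape)  # (10,) (10,)
```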

SLIDE 16

Posterior covariance

  • Determined by the zero-order statistics and the T matrix.
  • Might be desired for uncertainty propagation.
  • Derived using the same procedure as for the posterior mean.
  • The computational complexity is O(CM²), assuming that we pre-computed the matrices $T_c^\top T_c$.

SLIDE 17

Using informative prior in EM update of T


SLIDE 18

EXPERIMENT: Rapid computation of i-vector

SLIDE 19

Experimental setup

  • NIST SRE’10 extended core task, CCs 1 to 9
  • UBM
    – Gender-dependent with C = 512 mixtures
    – 57-dim MFCC
    – SWB, SRE’04, 05, 06
  • T matrix
    – M = 400
    – Trained using the same dataset as the UBM
  • PLDA
    – LDA to 300-dim and length normalization
    – 200 speaker factors
    – Full residual covariance for channel modeling

SLIDE 20

SRE’10 core-extended (female)

  • Introducing an informative prior does not seem to degrade the performance.
  • For the tel-tel CC 5, the relative degradation is 10.04% in EER and 4.54% in min DCF.
  • Across all 9 CCs, the relative degradation ranges from 10.04% to 16.11% in EER and from 0.67% to 20.40% in min DCF.

(Figure: EER and MinDCF10 results)

SLIDE 21

SRE’10 core-extended (female)

  • We trained the T matrix assuming a subspace-orthonormalizing prior.
  • Compared with the results using a T matrix trained with the standard Gaussian prior, slightly better results could be observed.

(Figure: EER and MinDCF10 results)

SLIDE 22

Conclusion

  • We introduced the following for rapid computation of i-vectors:
    – Subspace orthonormalizing prior
    – Uniform occupancy assumption
  • The computational speed-up is attained by avoiding the need to compute the posterior covariance in order to obtain the posterior mean.
  • The proposed method speeds up i-vector extraction by a factor of 12 compared to the fast baseline (and a factor of 106 compared to the slow baseline), with marginal degradation in recognition accuracy.

SLIDE 23

Questions?

Thanks for Your Attention!