Speaker line-up calibration of the i-vector based speaker recognition system for forensic application

SLIDE 1

Speaker line-up calibration of the i-vector based speaker recognition system for forensic application

The International Association of Forensic Phonetics and Acoustics 2011 Annual Conference 24-28 July, 2011; Vienna, Austria

M. I. Mandasari, D. van Leeuwen and M. McLaren

Centre for Language and Speech Technology Radboud University Nijmegen The Netherlands

SLIDE 2

Outline

  • Why Likelihood Ratio (LR) calibration?
  • LR calibration methods
▫ Linear calibration
▫ Line-up calibration (2011)
  • I-vector based automatic speaker recognition system for forensic application
  • Experiment and results

SLIDE 3

Likelihood Ratio (LR)

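  • In forensic evidence reporting
▫ Scores – LR representation
▫ Used by the fact finder to compute the posterior odds

$$\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}} = \underbrace{\frac{P(E \mid H_p)}{P(E \mid H_d)}}_{\text{LR}} \times \underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}$$

where $E$ is the trace, $H_p$ the prosecution hypothesis and $H_d$ the defense hypothesis.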
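The odds update on this slide can be illustrated with a short sketch. The function names and the numbers in the usage note are hypothetical and chosen only for illustration; they are not taken from the presentation.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = LR * prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds):
    """Convert odds in favour of the prosecution hypothesis to a probability."""
    return odds / (1.0 + odds)


# Illustrative values only: prior odds of 1:100 and an LR of 100.
example_posterior = posterior_odds(0.01, 100.0)
example_probability = odds_to_probability(example_posterior)
```

In this illustration the LR of 100 exactly offsets the prior odds of 1:100, so the posterior odds are even. The recognition system supplies only the LR; the prior odds remain with the fact finder.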

SLIDE 4

Why is LR calibration important?

A study by Rodriguez et al. (2007): “LR calculated from the un-calibrated system was often misleading, while the calibrated system produced more reliable LR”

[Diagram labels: Automatic Speaker Recognition System, Calibration, LR, Well-Calibrated System, Good for Forensics]

SLIDE 5

LR calibration method

  • Linear calibration: 2007 [ref. 7]
  • Line-up calibration: 2011 [ref. 6]

SLIDE 6

Linear calibration

  • Scores → Linear transformation → LR
  • Calibration:
▫ Optimize the linear transformation
▫ using a set of development scores
▫ to minimize …

The Cllr provides an estimate of the calibration error over all priors.

  • Miscalibration cost:
▫ A low miscalibration cost indicates that the system produces more reliable LRs.
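As a rough illustration of linear calibration, the sketch below fits log LR = a · score + b on development scores. It assumes that the objective being minimized is Cllr (the slide elides the objective), and it uses plain gradient descent rather than any particular toolkit. The function names, step size and iteration count are hypothetical.

```python
import numpy as np


def cllr(tar_llr, non_llr):
    """Cllr in bits, given natural-log LRs for target and non-target trials."""
    tar = np.asarray(tar_llr, dtype=float)
    non = np.asarray(non_llr, dtype=float)
    # Targets are penalized by log2(1 + 1/LR), non-targets by log2(1 + LR).
    c_tar = np.mean(np.logaddexp(0.0, -tar)) / np.log(2.0)
    c_non = np.mean(np.logaddexp(0.0, non)) / np.log(2.0)
    return 0.5 * (c_tar + c_non)


def _sigmoid(x):
    # Numerically stable logistic function.
    return np.exp(-np.logaddexp(0.0, -x))


def fit_linear_calibration(tar_scores, non_scores, steps=2000, step_size=0.1):
    """Fit llr = a * score + b by gradient descent on Cllr (assumed objective)."""
    tar = np.asarray(tar_scores, dtype=float)
    non = np.asarray(non_scores, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(steps):
        # Derivatives of the two softplus penalties with respect to the llr.
        g_tar = -_sigmoid(-(a * tar + b))
        g_non = _sigmoid(a * non + b)
        grad_a = 0.5 * (np.mean(g_tar * tar) + np.mean(g_non * non))
        grad_b = 0.5 * (np.mean(g_tar) + np.mean(g_non))
        a -= step_size * grad_a
        b -= step_size * grad_b
    return a, b
```

A system that always outputs LR = 1 has Cllr = 1 bit, so values well below 1 indicate informative LRs. The transformation would be fitted on development scores and then applied unchanged to evaluation scores.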

SLIDE 7

Line-up LR calibration method

  • Motivated by the witness line-up scenario in forensic tasks.

[Figure: witness line-up; labels: Suspect, Witness, Foils]

SLIDE 8

Line-up LR calibration method

  • Each speaker score is “lined up” with the scores of all foil speakers
  • The rank within the line-up set is determined
  • The calibrated LR value is computed
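The first two steps can be sketched as follows. This covers only the ranking of a suspect score within a set of foil scores; the slide does not state how the rank is mapped to a calibrated LR, so that step is deliberately left out. The function name and the example scores are hypothetical.

```python
import numpy as np


def lineup_rank(suspect_score, foil_scores):
    """Rank of the suspect score among the foil scores (1 = highest score)."""
    foils = np.asarray(foil_scores, dtype=float)
    # Each foil that scores strictly higher pushes the suspect one rank down.
    return 1 + int(np.sum(foils > suspect_score))


# Illustrative scores only: one foil out of three scores higher than the suspect.
example_rank = lineup_rank(2.5, [1.0, 3.0, 0.2])
```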

SLIDE 9

I-vector based speaker recognition

An i-vector is a speech representation in a low-dimensional total variability space. [Dehak et al., 2009]

Speech A, Speech B → i-vector (Total Variability space, 400D) → Linear Discriminant Analysis (LDA) projection (200D) → Within Class Covariance Normalization (WCCN) → Cosine Kernel Scoring → Scores → LR calibration → LRs

$$\mathrm{score}(w_A, w_B) = \frac{w_A \cdot w_B}{\lVert w_A \rVert \, \lVert w_B \rVert}$$
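The scoring step can be sketched as below. This is a minimal illustration that assumes the two i-vectors have already been projected by LDA and WCCN; the optional `projection` matrix is a hypothetical stand-in for those stages, not the authors' trained transform.

```python
import numpy as np


def cosine_score(w_a, w_b, projection=None):
    """Cosine kernel score between two i-vectors.

    `projection` is a hypothetical (d_out, d_in) matrix standing in for the
    combined LDA + WCCN transform; pass None if the vectors are already projected.
    """
    w_a = np.asarray(w_a, dtype=float)
    w_b = np.asarray(w_b, dtype=float)
    if projection is not None:
        w_a = projection @ w_a
        w_b = projection @ w_b
    # Normalized dot product: only the angle between the vectors matters.
    return float(w_a @ w_b / (np.linalg.norm(w_a) * np.linalg.norm(w_b)))
```

The score is unchanged when the two inputs are swapped, so neither recording has a distinguished role in the scoring itself.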

SLIDE 10

I-vector system for forensics [ref. 4]

  • The i-vector speaker recognition system …
▫ performs well in classification and calibration, and
▫ offers a good separation of target and non-target scores
  • The symmetrical behavior of the i-vector system is of particular interest in forensic evidence reporting, where long speech samples can be collected from a suspected speaker in an interview scenario while the trace may be of uncontrolled duration.

SLIDE 11

i-vector classification performance

Symmetrical!

SLIDE 12

Experiment setup

  • i-vector based automatic speaker recognition
  • Dataset:
▫ NIST SRE 2010 (halved into two datasets with disjoint speakers)
▫ Durations of 5, 10, 20 and 40 sec., and full utterances
  • Linear vs. line-up calibration method
  • Performance measures
▫ Classification: EER (Equal Error Rate)
▫ Calibration: miscalibration cost
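For reference, the EER can be estimated from target and non-target scores as in the sketch below. It sweeps the threshold over the observed scores and takes the point where the miss and false-alarm rates are closest; this is one simple estimator among several and is not necessarily the one used in the experiments. The function name is hypothetical.

```python
import numpy as np


def equal_error_rate(tar_scores, non_scores):
    """Approximate EER: the error rate where miss rate and false-alarm rate meet."""
    tar = np.sort(np.asarray(tar_scores, dtype=float))
    non = np.sort(np.asarray(non_scores, dtype=float))
    thresholds = np.sort(np.concatenate([tar, non]))
    # A trial is accepted when its score is at or above the threshold.
    p_miss = np.searchsorted(tar, thresholds, side="left") / tar.size
    p_fa = 1.0 - np.searchsorted(non, thresholds, side="left") / non.size
    i = int(np.argmin(np.abs(p_miss - p_fa)))
    return float(0.5 * (p_miss[i] + p_fa[i]))
```

Because it depends only on the ordering of the scores, the EER is unaffected by any strictly increasing transformation applied identically to all scores.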

SLIDE 13

Classification Performance

[Figure: classification performance; panels: Female, Male]

SLIDE 14

Classification Performance

[Figure: classification performance; panels: Male, Female]

SLIDE 15

Classification Performance

  • Line-up calibration still shows symmetrical behavior,
  • EER with line-up calibration is generally better than with linear calibration, and
  • the EER improvement is greater for short durations.
  • To conclude…
▫ Line-up calibration generally gives better classification performance than the linear calibration method.

SLIDE 16

Calibration Performance

[Figure: calibration performance; panels: Female, Male]

SLIDE 17

Calibration Performance

[Figure: calibration performance; panels: Female, Male]

SLIDE 18

Calibration Performance

  • In both the male and female cases, the miscalibration cost of the linear calibration method is generally better than that of the line-up calibration method; however,
  • the difference in calibration performance, measured by Cllr, is small (not more than 0.01).
  • To conclude
▫ The calibration performance of the line-up method is not better than that of the linear method, but it is not much worse either.

SLIDE 19

Our Findings

▫ EER with line-up calibration is better, which suggests that this calibration method acts more like score normalization* in the system.

Performance                      Gender    Linear vs. Line-up calibration
Classification (EER, %)          Male      .3822
                                 Female    .3496
Calibration (Miscalibration)     Male      .0052
                                 Female    .0104

SLIDE 20

References

1. Butcher, A. R. (2002). Forensic Phonetics: Issues in speaker identification evidence. Proceedings of the Inaugural International Conference of the Institute of Forensic Studies, Italy, p. 3-5.
2. Brümmer, N. (2006). Focal II: Toolkit for calibration of multi-class recognition scores, software available at http://www.dsp.sun.ac.za/~nbrummer/focal/index.htm.
3. Dehak, N., Dehak, R., Glass, J., Reynolds, D. and Kenny, P. (2010). Cosine similarity scoring without score normalization techniques. Proceedings of Odyssey.
4. Mandasari, M. I., McLaren, M. and van Leeuwen, D. (2011). Evaluation of i-vector Speaker Recognition Systems for Forensic Application. Submitted to the 12th Annual Conference of the International Speech Communication Association, Florence, Italy.
5. Rodriguez, J. G. and Ramos, D. (2007). Forensic automatic speaker classification in the “coming paradigm shift”. Speaker Classification, p. 205-217. Springer.
6. van Leeuwen, D. and Brümmer, N. (2011). A speaker line-up for the likelihood ratio. Submitted to the 12th Annual Conference of the International Speech Communication Association, Florence, Italy.
7. van Leeuwen, D. and Brümmer, N. (2007). An introduction to application-independent evaluation of speaker recognition systems. Speaker Classification, p. 330-353. Springer.

SLIDE 21

Vienna, 25 July 2011
