

SLIDE 1

Outline: Intro | Calibration | Gaussian scores | Applications | The End

The distribution of calibrated likelihood-ratios in speaker recognition

David van Leeuwen and Niko Brümmer
d.vanleeuwen@let.ru.nl, nbrummer@agnito.es
Netherlands Forensic Institute / Radboud University Nijmegen, Agnitio Research
BTFS 2013, 15 October 2013¹

¹ First published at Interspeech 2013

SLIDE 2

Inspiration for this work

  • We had these badly-behaving scores² that depend on utterance duration
  • We tried to design universal calibration transformations
  • Question arose: where do calibrated scores hang out?
  • What is their distribution?

[Figure: mean of the target and non-target score distributions as a function of train duration (sec) and test duration (sec). Panels: (a) Cosine Kernel, (b) Normalized Cosine Kernel]

² Mandasari et al., Interspeech 2011


SLIDE 4

What is calibration?

Traditionally:

  • The capability to set a threshold correctly

Nowadays:

  • The ability to give a proper probabilistic statement about identity
  • . . . to produce (log) likelihood ratio scores for every comparison
  • . . . that lead to optimal Bayes’ decisions

Bayes’ decision

  • Priors + likelihoods → posteriors
  • Posteriors + costs → expected costs
  • Minimize expected costs → decision
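The three steps of the Bayes' decision can be sketched in a few lines of Python. This is only an illustration: the function names, the two-cost setup (`cost_miss`, `cost_fa`) and the use of the natural logarithm for the LLR are assumptions made here, not something the slides specify.

```python
import math

def bayes_decision(llr, p_target, cost_miss=1.0, cost_fa=1.0):
    """Return True (accept 'same speaker') when accepting has the lower expected cost.

    llr is a natural-log likelihood ratio; p_target is the prior probability
    of the target hypothesis. All names are hypothetical.
    """
    # Priors + likelihoods -> posteriors (worked out in log-odds).
    prior_log_odds = math.log(p_target / (1.0 - p_target))
    posterior_log_odds = llr + prior_log_odds
    p_post = 1.0 / (1.0 + math.exp(-posterior_log_odds))
    # Posteriors + costs -> expected cost of each possible decision.
    expected_cost_accept = cost_fa * (1.0 - p_post)  # accepting risks a false alarm
    expected_cost_reject = cost_miss * p_post        # rejecting risks a miss
    # Minimize expected costs -> decision.
    return expected_cost_accept < expected_cost_reject

def bayes_threshold(p_target, cost_miss=1.0, cost_fa=1.0):
    """LLR threshold at which both decisions have the same expected cost."""
    return math.log(cost_fa * (1.0 - p_target) / (cost_miss * p_target))
```

With equal costs and a prior of 0.5 the threshold is 0, so any positive LLR is accepted. With a prior of 0.01 the threshold moves up to log 99, so an LLR of 4 is rejected and an LLR of 5 is accepted. This only works as intended if the scores really are calibrated LLRs.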


SLIDE 6

The forensic motivation of the Likelihood Ratio

Use the log Likelihood Ratio as weight of evidence in court

  • Using Bayes’s rule, separate the contributions of
  • the forensic expert, w.r.t. the material they know about, E
  • the other evidence / circumstances of the case, I

to compute the posterior probability that the suspect is the perpetrator, $H_p = \neg H_d$

  • Mathematically,

$$\underbrace{\frac{P(H_p \mid E, I)}{P(H_d \mid E, I)}}_{\text{judge/jury wants to know}} = \underbrace{\frac{P(E \mid H_p, I)}{P(E \mid H_d, I)}}_{\text{given by expert}} \times \underbrace{\frac{P(H_p \mid I)}{P(H_d \mid I)}}_{\text{other evidence}}$$
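As a small worked example of the odds form of Bayes's rule, the sketch below multiplies the expert's likelihood ratio by the prior odds and converts the result to a posterior probability. The function name and the example numbers are made up for illustration.

```python
def posterior_probability(likelihood_ratio, prior_odds):
    """Posterior P(Hp | E, I) from the expert's LR and the prior odds P(Hp | I) / P(Hd | I)."""
    posterior_odds = likelihood_ratio * prior_odds  # odds form of Bayes's rule
    return posterior_odds / (1.0 + posterior_odds)  # convert odds to a probability

# Hypothetical case: the expert reports LR = 100, the other evidence gives prior odds of 1 : 1000.
# The posterior odds are then 1 : 10, i.e. a posterior probability of 1/11.
p = posterior_probability(100.0, 1.0 / 1000.0)
```

The expert only supplies the first factor. The prior odds, and therefore the posterior, remain the responsibility of the judge or jury.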


SLIDE 8

From scores to likelihood ratios

  • A likelihood ratio can be treated like a score
  • All analysis tricks work: ROC, DET, EER, decision cost functions. . .
  • But can we transform a score into a LR?
  • This is a process known as calibration: giving meaning to probabilistic statements

Problem statement

But what is the definition of calibrated scores / LRs?


SLIDE 10

Definition of Calibrated Likelihood Ratios

Our definition³: The LR of the LR is the LR

  • Or, for the mathematically inclined,

$$\mathrm{LR} = \frac{P(\mathrm{LR} \mid H_p)}{P(\mathrm{LR} \mid H_d)}$$

which happens to be equivalent to

$$\log \mathrm{LR} = \log \frac{P(\log \mathrm{LR} \mid H_p)}{P(\log \mathrm{LR} \mid H_d)}$$

The LLR of the LLR is the LLR

³ Proof in paper, short version in Mandasari et al., IEEE-TASLP (2013, accepted)

SLIDE 11

More inspiration: Why are DET curves straight?

  • If score distributions are Gaussian, then the DET curve is straight
  • Slope is the ratio of standard deviations of the score distributions
  • If the DET is straight, score distributions are not necessarily Gaussian
  • but can be made Gaussian by warping of the score axis

[Figure: DET plot, false alarm probability (%) vs. miss probability (%), with straight curves labelled d’ = 1, 4, 5, 6 and markers for the EER, the DCF operating point and the minimum DCF operating point]
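The claim about straight DET curves can be checked numerically. The sketch below maps two thresholds to probit-warped (false alarm, miss) coordinates, the axes of a DET plot, and measures the slope between them. The helper names are hypothetical and the distributions are arbitrary examples. Only Python's standard `statistics.NormalDist` is used.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # used for the probit (inverse normal CDF) warping

def det_point(threshold, nontarget, target):
    """Probit-warped (false alarm, miss) coordinates for one threshold."""
    p_fa = 1.0 - nontarget.cdf(threshold)  # non-target scores above the threshold
    p_miss = target.cdf(threshold)         # target scores below the threshold
    return STD_NORMAL.inv_cdf(p_fa), STD_NORMAL.inv_cdf(p_miss)

def det_slope(nontarget, target, t1, t2):
    """Slope of the DET curve between the operating points at thresholds t1 and t2."""
    x1, y1 = det_point(t1, nontarget, target)
    x2, y2 = det_point(t2, nontarget, target)
    return (y2 - y1) / (x2 - x1)

# Example: non-targets N(0, 1), targets N(3, 2). The slope is -sigma_d / sigma_e = -0.5
# wherever it is measured, which is what makes the curve a straight line.
slope = det_slope(NormalDist(0.0, 1.0), NormalDist(3.0, 2.0), 0.5, 2.0)
```

Equal standard deviations give a slope of magnitude 1, which is the 45° case.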

SLIDE 12

For reference: these are the score distributions

  • Clearly not Gaussian
  • But still leading to a straight DET curve
  • non-targets: d(x) (different)
  • targets: e(x) (equal)

[Figure: probability density of the scores for non-targets and targets, with a threshold marking the probability of false alarm and the probability of miss]

SLIDE 13

Can Gaussian Scores be Well Calibrated?

Let’s try

  • Gaussian non-targets: $d(x) = \mathcal{N}(x \mid \mu_d, \sigma_d^2)$
  • calibration definition for LLR: $x = \log \frac{e(x)}{d(x)}$, so the targets are $e(x) = e^x d(x)$

Now use the expression for the normal distribution $\mathcal{N}$, and see what the targets e(x) look like:

$$e(x) = e^x d(x) = \frac{1}{\sqrt{2\pi}\,\sigma_d}\, e^{\,x - (x - \mu_d)^2 / 2\sigma_d^2}$$


SLIDE 15

Math 101

Expanding the exponent for the target distribution e(x):

$$
\begin{aligned}
-\frac{x^2 - 2\mu_d x + \mu_d^2}{2\sigma_d^2} + \frac{2\sigma_d^2 x}{2\sigma_d^2}
&= -\frac{x^2 - 2(\mu_d + \sigma_d^2)\,x + \mu_d^2}{2\sigma_d^2} \\
&= \underbrace{-\frac{\bigl(x - (\mu_d + \sigma_d^2)\bigr)^2}{2\sigma_d^2}}_{\text{Gaussian form}}
 + \underbrace{\frac{2\mu_d\sigma_d^2 + \sigma_d^4}{2\sigma_d^2}}_{\text{Normalisation constant}}
\end{aligned}
$$

Gaussian form

  • if $\mu_e = \mu_d + \sigma_d^2$
  • with $\sigma_e = \sigma_d$
  • normalization requires $-2\mu_d = \sigma^2$


SLIDE 17

Conclusions of this little exercise

  • Consider non-target distribution d(x) and target score distribution e(x)
  • Then if d(x) is normally distributed

. . . the calibration definition tells us

  • e(x) is normally distributed as well
  • Variances are the same for d(x) and e(x)
  • The means are symmetric around 0: $\mu_d = -\mu_e$
  • Variance and mean are related: $\sigma^2 = 2\mu$, where $\mu = \mu_e$
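These conditions can be verified numerically: for a Gaussian pair with means ∓µ and common variance 2µ, the log of the density ratio at x should give back x itself. The sketch below is a quick check under those assumptions. The function names are hypothetical and µ = 3 is an arbitrary choice.

```python
import math
from statistics import NormalDist

def calibrated_gaussian_pair(mu):
    """Non-target and target LLR distributions with means -mu, +mu and variance 2 * mu."""
    sigma = math.sqrt(2.0 * mu)
    return NormalDist(-mu, sigma), NormalDist(mu, sigma)

def llr_of_llr(x, nontarget, target):
    """Log of the density ratio e(x) / d(x) evaluated at the score x."""
    return math.log(target.pdf(x) / nontarget.pdf(x))

d, e = calibrated_gaussian_pair(3.0)
# "The LLR of the LLR is the LLR": each residual below should be zero up to rounding.
residuals = [llr_of_llr(x, d, e) - x for x in (-4.0, -1.0, 0.0, 0.7, 5.0)]
```

If the variance is changed while the means are kept, for example to variance 1 with means ∓3, the same ratio becomes 6x instead of x, so the scores are no longer calibrated.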

SLIDE 18

Example of well-calibrated scores

  • LR = 2: the density of scores around LR = 2 is 2× as high for targets (red) as for the non-targets (blue)

[Figure: densities of log LR for targets and non-targets, with log 2 marked on the log LR axis]

SLIDE 19

Example of well-calibrated scores

  • LR = 4

[Figure: the same densities of log LR, with log 4 marked on the log LR axis]

SLIDE 20

Example of well-calibrated scores

  • LR = 10

[Figure: the same densities of log LR, with log 10 marked on the log LR axis]

SLIDE 21

Some direct consequences

  • Well calibrated straight DET curves must be of 45° slope
  • Preferred “flat” straight DET curves can’t arise from calibrated scores
  • highly-discriminative systems have flat DET curves,
  • fingerprint, iris, . . .

[Figure: ROC at EER = 1 %, False Match Rate (log scale) vs. False Non Match Rate, for sigma ratios 0.5, 1.0 and 2.0]

[Figure: score distributions at EER = 10 % and sigma ratio = 2, score vs. density, annotated at FAR = 10⁻⁶ and FAR = 10⁻³]

SLIDE 22

All relations are known, now

From this model of scores all other characteristics follow, e.g.,

  • Equal Error Rate $E_=$
  • Threshold at 0
  • Integrate the miss error:

$$E_= = \int_{-\infty}^{0} \mathcal{N}(x \mid \mu, \sigma^2)\,dx = \Phi(-\mu/\sigma) = \Phi\bigl(-\sqrt{\mu/2}\bigr)$$

  • $\Phi(z)$: cumulative normal distribution
  • Cost of LLR $C_{\mathrm{llr}}$:

$$C_{\mathrm{llr}} = \frac{1}{\log 2} \int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2)\,\log(1 + e^{-x})\,dx$$

  • $C_{\mathrm{llr}}$ depends only on $E_=$

SLIDE 23

Cllr depends only on E=

Approximate relation: $C_{\mathrm{llr}} \approx 1 - (2E_= - 1)^2$

[Figure: Calibrated Gaussian LLR distributions, Cllr as a function of EER (0 to 0.5)]
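The relations between µ, the equal error rate and Cllr can be evaluated numerically. The sketch below assumes the calibrated Gaussian model (target LLRs distributed as N(µ, 2µ)) and approximates the Cllr integral with a simple midpoint rule. The function names, the integration range of ±10σ and the number of steps are choices made for this illustration only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def eer_from_mu(mu):
    """E= = Phi(-sqrt(mu / 2)) for target LLRs distributed as N(mu, 2 * mu)."""
    return STD_NORMAL.cdf(-math.sqrt(mu / 2.0))

def mu_from_eer(eer):
    """Inverse of eer_from_mu, valid for 0 < eer < 0.5."""
    return 2.0 * STD_NORMAL.inv_cdf(eer) ** 2

def log1p_exp_neg(x):
    """Numerically stable log(1 + exp(-x))."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

def cllr_from_mu(mu, steps=4000, width=10.0):
    """Midpoint-rule estimate of Cllr = (1 / log 2) * integral of N(x | mu, 2 mu) log(1 + e^-x)."""
    sigma = math.sqrt(2.0 * mu)
    targets = NormalDist(mu, sigma)
    lower = mu - width * sigma
    dx = 2.0 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * dx
        total += targets.pdf(x) * log1p_exp_neg(x) * dx
    return total / math.log(2.0)

def cllr_approx(eer):
    """Approximate relation from the slide: Cllr ~ 1 - (2 E= - 1)^2."""
    return 1.0 - (2.0 * eer - 1.0) ** 2
```

Because µ is the only free parameter of the model, fixing the EER fixes µ and therefore Cllr. For example, `cllr_from_mu(mu_from_eer(0.1))` can be compared against `cllr_approx(0.1)` to see how close the approximation is at a 10 % EER.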


SLIDE 25

Application: a new way of doing calibration

Calibration is the process of fixing scores so that they can be interpreted better as log likelihood ratios

  • Traditionally, this is done in speaker recognition by an affine transformation of score s: $x = as + b$
  • parameters a and b found by logistic regression using a development set of trials

New calibration method:

Find a and b by constraining the transformed scores to satisfy the Gaussian LLR conditions for µ and σ


SLIDE 27

Math 101 again

Raw score means and variances $m_{d,e}$, $s^2_{d,e}$.

  • Transformed target mean: $a m_e + b = \mu$
  • Transformed non-target mean: $a m_d + b = -\mu$
  • Weighted variance: $v = (1 - \alpha) s_d^2 + \alpha s_e^2$
  • Transformed variance: $\sigma^2 = a^2 v = 2\mu$

. . . results in solution

  • $a = \dfrac{m_e - m_d}{v}$
  • $b = -a\,\dfrac{m_e + m_d}{2}$
  • This is a closed-form solution!

Constrained Maximum Likelihood Gaussian: CMLG
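The closed-form solution is short enough to sketch directly. The code below is an illustration rather than the authors' implementation. The function name, the default weight `alpha=0.5` and the use of population variances (`statistics.pvariance`) are assumptions, since the slides do not say how α is chosen or how the variances are estimated.

```python
from statistics import fmean, pvariance

def cmlg_calibration(nontarget_scores, target_scores, alpha=0.5):
    """Closed-form affine calibration x = a * s + b under the Gaussian LLR constraints."""
    m_d = fmean(nontarget_scores)
    m_e = fmean(target_scores)
    s2_d = pvariance(nontarget_scores, m_d)  # population variance: an assumption
    s2_e = pvariance(target_scores, m_e)
    v = (1.0 - alpha) * s2_d + alpha * s2_e  # weighted variance
    a = (m_e - m_d) / v
    b = -a * (m_e + m_d) / 2.0
    return a, b

# Toy development set (made-up scores, for illustration only).
nontargets = [0.0, 1.0, 2.0, 3.0, 4.0]   # mean 2, variance 2
targets = [6.0, 7.0, 8.0, 9.0, 10.0]     # mean 8, variance 2
a, b = cmlg_calibration(nontargets, targets)
```

For this toy set the transformed target and non-target means come out as +9 and −9, and the transformed variance a²v is 18, which equals 2µ as the constraints require.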

SLIDE 28

First calibration experiment: Miranti’s scores

  • RUN i-vector PLDA system
  • calibrate on SRE-2008, evaluate using Cllr on SRE-2010
  • 25 different duration-combinations, to sample range of performances
  • Two linear calibration methods
  • y: Logistic regression
  • x: This method (CMLG)

[Figure: Correlation in Cllr for different calibration methods. x axis: constrained maximum likelihood Gaussian (CMLG); y axis: logistic regression; both axes from 0.1 to 0.7]

SLIDE 29

Second experiment: Niko’s scores

  • Agnitio Research’s SRE-2012 system and scores
  • Calibrated using their dev-set
  • Evaluated using Cprimary
  • official SRE-2012 metric
  • sensitive to low-FA range
  • Contrasting
  • Niko + GD, Interspeech 2013
  • This method, CMLG

[Figure: Comparison of calibration methods. Cprimary as a function of log(α) − log(1 − α), for logistic regression and CMLG]

SLIDE 30

Conclusions

  • We can prove that “the LLR of the LLR is the LLR”
  • . . . already in exam questions of the course Forensic Linguistics. . .
  • Well calibrated Gaussian non-target scores imply
  • Gaussian target scores
  • with same variance
  • and opposite mean
  • and a variance that is equal to the difference in means
  • We can use it to find calibration parameters
  • as a closed-form solution
  • that gives same performance as logistic regression, for
  • two different systems
  • two different evaluation data bases
  • two different calibration-sensitive evaluation metrics