SLIDE 1

Intro Calibration Gaussian scores Applications The End

The distribution of calibrated likelihood-ratios in speaker recognition

David van Leeuwen and Niko Brümmer (d.vanleeuwen@let.ru.nl, nbrummer@agnito.es)
Netherlands Forensic Institute / Radboud University Nijmegen, Agnitio Research
15 October 2013¹

¹ First published at Interspeech 2013

David van Leeuwen and Niko Brümmer, d.vanleeuwen@let.ru.nl, nbrummer@agnito.es, Netherlands Forensic Institute / Radboud University, BTFS 2013, The distribution of calibrated likelihood-ratios in speaker recognition, 1 / 20

SLIDE 2

Inspiration for this work

We had these badly-behaving scores², depending on utterance duration

We tried to design universal calibration transformations

Question arose: where do calibrated scores hang out? What is their distribution?

[Figure: score distribution mean (target and non-target) as a function of train and test duration (20 to 80 sec); (a) Cosine Kernel, (b) Normalized Cosine Kernel]

² Mandasari et al., Interspeech 2011

SLIDE 4

What is calibration?

Traditionally: the capability to set a threshold correctly

Nowadays: the ability to give a proper probabilistic statement about identity
. . . to produce (log) likelihood-ratio scores for every comparison
. . . that lead to optimal Bayes' decisions

Bayes' decision:
Priors + likelihoods → posteriors
Posteriors + costs → expected costs
Minimize expected costs → decision

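The three steps of a Bayes' decision can be sketched for a single comparison. This is a minimal Python sketch, assuming the score is a calibrated natural-log LLR; the target prior `p_target` and the costs `c_miss` and `c_fa` are hypothetical parameter names, and the values below are only an example operating point. Minimizing expected cost turns out to be the same as comparing the LLR with a threshold that depends only on the prior and the costs.

```python
import math

def bayes_decision(llr, p_target, c_miss, c_fa):
    """Accept (True) or reject (False) by minimum expected cost.

    llr is assumed to be a calibrated natural-log likelihood ratio.
    """
    # Priors + likelihoods -> posterior (Bayes' rule in log-odds form)
    log_post_odds = llr + math.log(p_target / (1.0 - p_target))
    p_tar_post = 1.0 / (1.0 + math.exp(-log_post_odds))
    # Posteriors + costs -> expected cost of each decision
    cost_accept = c_fa * (1.0 - p_tar_post)   # accepting is wrong for a non-target
    cost_reject = c_miss * p_tar_post         # rejecting is wrong for a target
    # Minimize expected cost -> decision
    return cost_accept < cost_reject

def bayes_threshold(p_target, c_miss, c_fa):
    """Equivalent LLR threshold: accept iff llr > threshold."""
    return math.log(c_fa * (1.0 - p_target) / (c_miss * p_target))

# Hypothetical operating point: P_target = 0.01, C_miss = 10, C_fa = 1
t = bayes_threshold(0.01, 10.0, 1.0)   # log(9.9)
print(round(t, 3), bayes_decision(2.0, 0.01, 10.0, 1.0), bayes_decision(2.5, 0.01, 10.0, 1.0))
# prints: 2.293 False True
```

Because the threshold is fixed by prior and costs alone, a well-calibrated LLR lets every user set their own threshold without re-tuning on data.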

SLIDE 6

The forensic motivation of the Likelihood Ratio

Use the log likelihood ratio as the weight of evidence in court

Using Bayes' rule, separate the contributions of
the forensic expert, w.r.t. the material they know about (the evidence $E$)
the other evidence / circumstances of the case ($I$)

to compute the posterior probability that the suspect is the perpetrator ($H_p = \neg H_d$)

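Mathematically,

$$\underbrace{\frac{P(H_p \mid E, I)}{P(H_d \mid E, I)}}_{\text{judge/jury wants to know}} = \underbrace{\frac{P(E \mid H_p, I)}{P(E \mid H_d, I)}}_{\text{given by expert}} \times \underbrace{\frac{P(H_p \mid I)}{P(H_d \mid I)}}_{\text{other evidence}}$$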

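The odds form of Bayes' rule can be sketched for a single case. This is a minimal Python sketch, assuming the expert reports a natural-log LLR and that the rest of the case has been summarised as prior odds; the function name and the numbers are hypothetical illustrations, not from a real case.

```python
import math

def posterior_probability(llr, prior_odds):
    """P(Hp | E, I) from the expert's LLR and the prior odds P(Hp | I) / P(Hd | I)."""
    # posterior odds = likelihood ratio x prior odds, i.e. a sum in the log domain
    log_post_odds = llr + math.log(prior_odds)
    post_odds = math.exp(log_post_odds)
    return post_odds / (1.0 + post_odds)

# Hypothetical case: the expert reports LR = 100; the other evidence gives prior odds 1:1000
p = posterior_probability(math.log(100.0), 1.0 / 1000.0)
print(f"{p:.4f}")   # posterior odds 0.1, i.e. probability 1/11: prints 0.0909
```

The expert only supplies the LR; the prior odds, and hence the final posterior, remain the court's responsibility.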

SLIDE 8

From scores to likelihood ratios

A likelihood ratio can be treated like a score:
all analysis tricks work (ROC, DET, EER, decision cost functions, . . .)
But can we transform a score into an LR?
This process is known as calibration: giving meaning to probabilistic statements

Problem statement:

But what is the definition of calibrated scores / LRs?

SLIDE 10

Definition of Calibrated Likelihood Ratios

Our definition³: the LR of the LR is the LR

or, for the mathematically inclined,

$$\mathrm{LR} = \frac{P(\mathrm{LR} \mid H_p)}{P(\mathrm{LR} \mid H_d)}$$

which happens to be equivalent to

$$\log \mathrm{LR} = \log \frac{P(\log \mathrm{LR} \mid H_p)}{P(\log \mathrm{LR} \mid H_d)}$$

The LLR of the LLR is the LLR

³ Proof in paper, short version in Mandasari et al., IEEE-TASLP (2013, accepted)

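"The LR of the LR is the LR" can be checked exactly on a toy example. This is a minimal Python sketch, assuming a hypothetical discrete score with four possible values whose probabilities under $H_p$ and $H_d$ are made up for illustration. Two of the score values share the same LR and are pooled, which is the step that makes the property non-trivial.

```python
from collections import defaultdict
from fractions import Fraction as F

# Hypothetical discrete score: four values with probabilities under each hypothesis
p_target = [F(1, 20), F(1, 10), F(1, 4), F(3, 5)]      # P(s | Hp)
p_nontarget = [F(1, 5), F(2, 5), F(1, 4), F(3, 20)]    # P(s | Hd)

def lr_of_lr(p_t, p_n):
    """Map each LR value r to P(LR = r | Hp) / P(LR = r | Hd)."""
    mass_t = defaultdict(F)   # F() is Fraction(0)
    mass_n = defaultdict(F)
    for pt, pn in zip(p_t, p_n):
        lr = pt / pn          # exact LR of this score value
        mass_t[lr] += pt      # score values with the same LR are pooled
        mass_n[lr] += pn
    return {lr: mass_t[lr] / mass_n[lr] for lr in mass_t}

result = lr_of_lr(p_target, p_nontarget)
for lr, ratio in sorted(result.items()):
    print(lr, ratio)          # each line shows the same value twice: 1/4, 1, 4
```

Exact fractions are used so that equal LRs compare equal without floating-point tolerance.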

SLIDE 11

More inspiration: Why are DET curves straight?

If the score distributions are Gaussian, then the DET curve is straight

The slope is the ratio of the standard deviations of the score distributions

If the DET curve is straight, the score distributions are not necessarily Gaussian,

but they can be made Gaussian by warping the score axis

[Figure: DET plot, miss probability (%) vs. false alarm probability (%), with the EER, the DCF operating point and the minimum-DCF operating point marked; straight lines for d' = 1, 4, 5 and 6]
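The first two claims can be checked numerically. On DET axes (probit, i.e. normal-deviate, scales), Gaussian scores give $\Phi^{-1}(P_{miss}) = -(\sigma_d/\sigma_e)\,\Phi^{-1}(P_{fa}) + (\mu_d - \mu_e)/\sigma_e$, a straight line. This is a minimal Python sketch using the standard library's `statistics.NormalDist`; the means, standard deviations and thresholds are hypothetical.

```python
from statistics import NormalDist

def det_points(mu_d, sigma_d, mu_e, sigma_e, thresholds):
    """(probit P_fa, probit P_miss) pairs for Gaussian non-target/target scores."""
    std = NormalDist()
    nontarget = NormalDist(mu_d, sigma_d)
    target = NormalDist(mu_e, sigma_e)
    points = []
    for t in thresholds:
        p_fa = 1.0 - nontarget.cdf(t)   # non-target scores above the threshold
        p_miss = target.cdf(t)          # target scores below the threshold
        points.append((std.inv_cdf(p_fa), std.inv_cdf(p_miss)))
    return points

# Hypothetical score model: sigma_d = 2, sigma_e = 1
pts = det_points(mu_d=0.0, sigma_d=2.0, mu_e=3.0, sigma_e=1.0,
                 thresholds=[0.5, 1.0, 1.5, 2.0, 2.5])
slopes = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]
print(slopes)   # every slope is -sigma_d / sigma_e = -2, up to rounding
```

With $\sigma_d = \sigma_e$ the slope magnitude is 1, i.e. a 45° line on the DET plot.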

SLIDE 12

For reference: these are the score distributions

Clearly not Gaussian

But still leading to a straight DET curve

non-targets: d(x) (d for "different")

targets: e(x) (e for "equal")

[Figure: probability density vs. score for non-targets and targets, with the threshold and the false-alarm and miss probabilities marked]

SLIDE 13

Can Gaussian Scores be Well Calibrated?

Let's try:

Gaussian non-targets: $d(x) = \mathcal{N}(x \mid \mu_d, \sigma_d^2)$

Calibration definition for the LLR:

$$x = \log \frac{e(x)}{d(x)} \quad\Longrightarrow\quad \text{targets: } e(x) = e^x d(x)$$

Now use the expression for the normal distribution $\mathcal{N}$, and see what the targets $e(x)$ look like:

$$e(x) = e^x d(x) = \frac{1}{\sqrt{2\pi}\,\sigma_d} \exp\!\left(x - \frac{(x-\mu_d)^2}{2\sigma_d^2}\right)$$

SLIDE 15

Math 101

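Expanding the exponent for target distribution $e(x)$:

$$-\frac{x^2 - 2\mu_d x + \mu_d^2}{2\sigma_d^2} + \frac{2\sigma_d^2 x}{2\sigma_d^2} = -\frac{x^2 - 2(\mu_d + \sigma_d^2)x + \mu_d^2}{2\sigma_d^2} = \underbrace{-\frac{\bigl(x - (\mu_d + \sigma_d^2)\bigr)^2}{2\sigma_d^2}}_{\text{Gaussian form}} + \underbrace{\frac{2\mu_d\sigma_d^2 + \sigma_d^4}{2\sigma_d^2}}_{\text{normalisation constant}}$$

Gaussian form if $\mu_e = \mu_d + \sigma_d^2$, with $\sigma_e = \sigma_d$

Normalization requires the constant to vanish, so that $e(x)$ integrates to 1: $-2\mu_d = \sigma_d^2$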

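The normalisation constant says that $\int e^x d(x)\,dx = \exp(\mu_d + \sigma_d^2/2)$, which equals 1 exactly when $-2\mu_d = \sigma_d^2$. This is a minimal Python sketch checking that claim by midpoint-rule integration; the parameter pairs, grid size and integration width are hypothetical choices.

```python
import math
from statistics import NormalDist

def integral_of_ex_d(mu_d, sigma_d, n=20000, width=12.0):
    """Midpoint-rule estimate of the integral of e^x * N(x | mu_d, sigma_d^2)."""
    d = NormalDist(mu_d, sigma_d)
    # e^x d(x) is proportional to a Gaussian centred at mu_d + sigma_d^2
    centre = mu_d + sigma_d ** 2
    lo = centre - width * sigma_d
    h = 2.0 * width * sigma_d / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += math.exp(x) * d.pdf(x)
    return total * h

# (-0.5, 1) and (-2, 2) satisfy -2 mu_d = sigma_d^2; (-1, 1) does not
for mu_d, sigma_d in [(-0.5, 1.0), (-2.0, 2.0), (-1.0, 1.0)]:
    closed_form = math.exp(mu_d + sigma_d ** 2 / 2)   # exp of the normalisation constant
    print(mu_d, sigma_d, round(integral_of_ex_d(mu_d, sigma_d), 6), round(closed_form, 6))
```

Only the pairs that satisfy the constraint integrate to 1, so only they give a valid target density.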

SLIDE 17

Conclusions of this little exercise

Consider the non-target score distribution d(x) and the target score distribution e(x)

Then, if d(x) is normally distributed,

. . . the calibration definition tells us that

e(x) is normally distributed as well
the variances are the same for d(x) and e(x)
the means are symmetric around 0: $\mu_d = -\mu_e$

variance and mean are related: $\sigma^2 = 2\mu$, with $\mu = \mu_e$

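These conclusions can be checked pointwise. With $\sigma^2 = 2\mu$, non-targets $\mathcal{N}(-\mu, \sigma^2)$ and targets $\mathcal{N}(\mu, \sigma^2)$, the log density ratio is $2\mu x/\sigma^2 = x$, so each score is its own LLR. This is a minimal Python sketch; µ = 3 and the evaluation points are hypothetical, and a deliberately mis-calibrated pair ($\sigma^2 = 4\mu$) is included for contrast.

```python
import math
from statistics import NormalDist

def llr_of_score(x, mu, var):
    """log e(x)/d(x) for targets N(mu, var) and non-targets N(-mu, var)."""
    sigma = math.sqrt(var)
    e = NormalDist(mu, sigma)
    d = NormalDist(-mu, sigma)
    return math.log(e.pdf(x) / d.pdf(x))

mu = 3.0
xs = [-6.0, -2.0, 0.0, math.log(2), math.log(10), 5.0]
calibrated = [llr_of_score(x, mu, 2 * mu) for x in xs]        # sigma^2 = 2 mu
miscalibrated = [llr_of_score(x, mu, 4 * mu) for x in xs]     # sigma^2 = 4 mu
print([round(v - x, 9) for v, x in zip(calibrated, xs)])      # all zero: the LLR of the score is the score
print([round(v / x, 3) for v, x in zip(miscalibrated, xs) if x != 0])   # 0.5: scores claim twice their true LLR
```

The mis-calibrated pair is still Gaussian with equal variances, so its DET curve is the same kind of straight line; only the score scale is wrong, which is what an affine calibration can repair.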

SLIDE 18

Example of well-calibrated scores

LR = 2: the density of scores around log 2 is 2× as high for targets (red) as for non-targets (blue)

[Figure: densities of calibrated target and non-target scores vs. log LR (−10 to 10), with log 2 marked]

SLIDE 19

Example of well-calibrated scores

LR = 4: likewise, the density of scores around log 4 is 4× as high for targets (red) as for non-targets (blue)

[Figure: the same densities vs. log LR, with log 4 marked]

SLIDE 20

Example of well-calibrated scores

LR = 10: likewise, the density of scores around log 10 is 10× as high for targets (red) as for non-targets (blue)

[Figure: the same densities vs. log LR, with log 10 marked]

SLIDE 21

Some direct consequences

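Well-calibrated straight DET curves must have a 45° slope (the slope is the ratio of the score standard deviations, and calibration forces $\sigma_d = \sigma_e$)

The preferred "flat" straight DET curves can't arise from calibrated scores

Highly discriminative systems have flat DET curves: fingerprint, iris, . . .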

[Figure: ROC at EER = 1 % (False Non Match Rate vs. False Match Rate) for sigma ratios 0.5, 1.0 and 2.0, with operating points at FAR = 10⁻⁶ and FAR = 10⁻³; score distributions at EER = 10 % and sigma ratio = 2]

SLIDE 22

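All relations are now known

From this model of scores, all other characteristics follow, e.g.,

Equal Error Rate $E_=$:
threshold at 0 (the target and non-target distributions are mirror images around 0)
integrate the miss error:

$$E_= = \int_{-\infty}^{0} \mathcal{N}(x \mid \mu, \sigma^2)\,dx = \Phi(-\mu/\sigma) = \Phi\bigl(-\sqrt{\mu/2}\bigr)$$

with $\Phi(z)$ the cumulative normal distribution

Cost of the LLR, $C_{llr}$ (the non-target term equals the target term by symmetry):

$$C_{llr} = \frac{1}{\log 2} \int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2) \log(1 + e^{-x})\,dx$$

$C_{llr}$ depends only on $E_=$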

SLIDE 23

$C_{llr}$ depends only on $E_=$

Approximate relation: $C_{llr} \approx 1 - (2E_= - 1)^2$

[Figure: "Calibrated Gaussian LLR distributions": Cllr (0 to 1) vs. EER (0 to 0.5)]
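The EER and Cllr formulas for calibrated Gaussian LLRs, and the approximation between them, can be evaluated numerically. This is a minimal Python sketch, assuming natural-log LLRs (so the 1/log 2 factor expresses Cllr in bits) and plain midpoint-rule integration; the grid size, integration width and example values of µ are arbitrary choices, not from the slides. Since $E_= = \Phi(-\sqrt{\mu/2})$ can be inverted to $\mu = 2\,\Phi^{-1}(E_=)^2$, Cllr can be written as a function of $E_=$ alone.

```python
import math
from statistics import NormalDist

STD = NormalDist()

def softplus(z):
    """log(1 + e^z) without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def eer_calibrated(mu):
    """E= for calibrated Gaussian LLRs: Phi(-sqrt(mu / 2))."""
    return STD.cdf(-math.sqrt(mu / 2.0))

def cllr_calibrated(mu, n=20000, width=12.0):
    """(1/log 2) * integral of N(x | mu, 2 mu) log(1 + e^-x) dx, by the midpoint rule."""
    target = NormalDist(mu, math.sqrt(2.0 * mu))
    lo = mu - width * target.stdev
    h = 2.0 * width * target.stdev / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += target.pdf(x) * softplus(-x)
    return total * h / math.log(2.0)

def cllr_from_eer(eer):
    """Cllr as a function of E= only (requires 0 < eer < 0.5)."""
    mu = 2.0 * STD.inv_cdf(eer) ** 2
    return cllr_calibrated(mu)

for mu in (0.5, 2.0, 8.0):
    eer = eer_calibrated(mu)
    approx = 1.0 - (2.0 * eer - 1.0) ** 2
    print(f"mu={mu}: EER={eer:.4f}  Cllr={cllr_calibrated(mu):.4f}  approx={approx:.4f}")
```

The printed columns make it easy to see how close the approximate relation is over a range of performance levels.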

SLIDE 25

Application: a new way of doing calibration

Calibration is the process of fixing scores so that they can be better interpreted as log likelihood ratios

Traditionally, in speaker recognition this is done by an affine transformation of the score $s$,

$$x = as + b$$

with parameters $a$ and $b$ found by logistic regression on a development set of trials

New calibration method:

Find $a$ and $b$ by constraining the transformed scores to satisfy the Gaussian LLR conditions for $\mu$ and $\sigma$

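The traditional route can be sketched as prior-weighted logistic regression of the affine map $x = as + b$. This is a minimal Python sketch, assuming the prior-weighted cross-entropy objective commonly used for this in speaker recognition (effective prior `prior`, with logit(prior) as an offset) and Newton's method as the optimiser; the slides specify neither. The development set is hypothetical: evenly spaced quantiles of calibrated Gaussian LLRs with µ = 2, mapped to raw scores $s = (x - 1)/2$, so the ideal answer is a = 2, b = 1.

```python
import math
from statistics import NormalDist

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logistic_calibration(tar, non, prior=0.5, iters=30):
    """Fit x = a*s + b by prior-weighted logistic regression, using Newton's method."""
    offset = math.log(prior / (1.0 - prior))
    # (score, label, weight): the two classes carry total weight prior and 1 - prior
    trials = [(s, 1.0, prior / len(tar)) for s in tar]
    trials += [(s, 0.0, (1.0 - prior) / len(non)) for s in non]
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for s, y, w in trials:
            p = sigmoid(a * s + b + offset)
            r = w * (p - y)              # gradient of the weighted cross-entropy
            ga += r * s
            gb += r
            c = w * p * (1.0 - p)        # Hessian weight
            haa += c * s * s
            hab += c * s
            hbb += c
        det = haa * hbb - hab * hab
        a -= (hbb * ga - hab * gb) / det  # Newton step: solve the 2x2 linear system
        b -= (haa * gb - hab * ga) / det
    return a, b

# Hypothetical development set: calibrated LLRs with mu = 2, sigma^2 = 2 mu = 4,
# represented by evenly spaced quantiles, then mapped to raw scores s = (x - 1) / 2
n = 400
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
tar = [((2.0 + 2.0 * q) - 1.0) / 2.0 for q in z]
non = [((-2.0 + 2.0 * q) - 1.0) / 2.0 for q in z]
a, b = logistic_calibration(tar, non)
print(round(a, 2), round(b, 2))   # close to the ideal a = 2, b = 1
```

This needs an iterative optimiser; the point of the new method is to avoid that.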

SLIDE 27

Math 101 again

Raw score means and variances: $m_{d,e}$, $s^2_{d,e}$

Transformed target mean: $a m_e + b = \mu$
Transformed non-target mean: $a m_d + b = -\mu$
Weighted variance: $v = (1 - \alpha)s_d^2 + \alpha s_e^2$
Transformed variance: $\sigma^2 = a^2 v = 2\mu$

. . . results in the solution (subtracting the two mean equations gives $a(m_e - m_d) = 2\mu = a^2 v$)

$$a = \frac{m_e - m_d}{v}, \qquad b = -a\,\frac{m_e + m_d}{2}$$

This is a closed-form solution!

Constrained Maximum Likelihood Gaussian: CMLG

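The closed-form solution translates directly into code. This is a minimal Python sketch of the CMLG formulas on this slide; using population (maximum-likelihood) variances via `statistics.pvariance` and α = 0.5 are assumptions. The raw scores are hypothetical: evenly spaced quantiles of calibrated Gaussian LLRs with µ = 2, mapped to $s = (x - 1)/2$, so the ideal answer is a = 2, b = 1.

```python
from statistics import NormalDist, fmean, pvariance

def cmlg(tar, non, alpha=0.5):
    """Closed-form CMLG calibration parameters (a, b) for x = a*s + b."""
    m_e, m_d = fmean(tar), fmean(non)
    # weighted raw-score variance v = (1 - alpha) s_d^2 + alpha s_e^2
    v = (1.0 - alpha) * pvariance(non, m_d) + alpha * pvariance(tar, m_e)
    a = (m_e - m_d) / v
    b = -a * (m_e + m_d) / 2.0
    return a, b

# Hypothetical raw scores (see lead-in): the calibrated LLR is x = 2 s + 1
n = 400
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
tar = [((2.0 + 2.0 * q) - 1.0) / 2.0 for q in z]
non = [((-2.0 + 2.0 * q) - 1.0) / 2.0 for q in z]
a, b = cmlg(tar, non)

# Check the constraints on the transformed scores: means +mu and -mu, weighted variance 2 mu
x_tar = [a * s + b for s in tar]
x_non = [a * s + b for s in non]
mu = fmean(x_tar)
print(round(a, 2), round(b, 2), round(mu + fmean(x_non), 9), round(pvariance(x_tar) / (2 * mu), 6))
```

No iteration is needed: the parameters follow from two means and two variances of the development scores.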

SLIDE 28

First calibration experiment: Miranti’s scores

RUN i-vector PLDA system

calibrate on SRE-2008, evaluate using Cllr on SRE-2010

25 different duration combinations, to sample a range of performances

Two linear calibration methods:
y: logistic regression
x: this method (CMLG)

[Figure: "Correlation in Cllr for different calibration methods": Cllr after logistic regression (y) vs. Cllr after constrained maximum likelihood Gaussian, CMLG (x), both axes 0.1 to 0.7]

SLIDE 29

Second experiment: Niko’s scores

Agnitio Research's SRE-2012 system and scores

Calibrated using their dev-set

Evaluated using Cprimary:
the official SRE-2012 metric
sensitive to the low-FA range

Contrasting: Niko + GD (Interspeech 2013)

This method: CMLG

[Figure: "Comparison of calibration methods": Cprimary (0.30 to 0.60) vs. log(α) − log(1 − α), for logistic regression and CMLG]

SLIDE 30

Conclusions

We can prove that "the LLR of the LLR is the LLR"
. . . it already features in exam questions for the Forensic Linguistics course. . .
Well-calibrated Gaussian non-target scores imply
Gaussian target scores
with the same variance
and opposite mean
and a variance that is equal to the difference in means
We can use this to find calibration parameters
as a closed-form solution
that gives the same performance as logistic regression, for
two different systems
two different evaluation databases
two different calibration-sensitive evaluation metrics
