Evidence evaluation for discrete data



slide-1
SLIDE 1

Evidence evaluation for discrete data


slide-2
SLIDE 2

Evidence evaluation for discrete data

Colin Aitken
School of Mathematics and Maxwell Institute, The University of Edinburgh
c.g.g.aitken@ed.ac.uk

Bayesian Biometrics for Forensics Network (BBFOR2): http://cls.ru.nl/projects/bbfor2
British Academy / Leverhulme Foundation (eGAP2): ‘Modelling features for forensic speaker comparison’. Erica Gold, Language and Linguistic Science, The University of York.

Forensic Science International, 2013, 230, 147-155.

slide-6
SLIDE 6

Control and recovered evidence

Two situations: control evidence is associated with the crime scene and recovered evidence with a suspect, or vice versa.

Control evidence may be found at the scene of a crime and recovered evidence associated with a suspect. For example, in a burglary, glass fragments found below a broken window at a crime scene may be assumed to come from that window. Glass fragments found on a suspect’s clothing may or may not have come from the broken window at the crime scene.

Control evidence may be found on a suspect and recovered evidence at a crime scene. For example, in an assault, blood which cannot be matched to the victim may be found at the crime scene and could be assumed to come from the criminal, whose identity is unknown. A suspect is identified and a DNA swab taken. The source of this DNA is known, so it is control evidence.

In forensic phonetics, one scenario would be an audio recording of a telephone message thought to be from the unidentified criminal and an audio recording taken from an identified suspect. The telephone message would be the recovered evidence; the audio recording from the suspect would be the control evidence.

slide-11
SLIDE 11

Evidence evaluation - two-stage approach

Similarity: Assess the similarity of control and recovered evidence by some measure, such as a t-test for continuous measurements or a chi-squared test for discrete measurements.

If the control and recovered evidence are not similar, the evidence is deemed to have different sources. ‘Not similar’ means a significant result in a t-test of a common mean, or in a chi-squared test of a common distribution for discrete measurements.

If the control and recovered evidence are similar, the evidence is deemed to have a common source. ‘Similar’ means a non-significant result in a t-test of a common mean, or in a chi-squared test of a common distribution for discrete measurements. The second stage is then implemented.

Rarity: Similarity in measurements which are rare in some sense is taken to be stronger evidence in support of a common source than similarity in measurements which are common.

slide-12
SLIDE 12

Evidence evaluation

In forensic statistics, evidence E is evaluated by its effect on the odds in favour of a proposition put forward by the prosecution, Hp, compared with a proposition put forward by the defence, Hd. Thus:

\[ \frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)} = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)} \times \frac{\Pr(H_p)}{\Pr(H_d)}. \]

In general, consider E to have two components: X, evidence whose source is known (control evidence), and Y, evidence whose source is unknown (recovered evidence). The statistic used to evaluate the evidence is the likelihood ratio

\[ LR = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)} = \frac{\Pr(X, Y \mid H_p)}{\Pr(X, Y \mid H_d)}. \]
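The odds form of Bayes' theorem stated on this slide can be illustrated with a minimal Python sketch. The function names and the numerical values in the usage note are illustrative assumptions, not part of the original slides.

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' theorem: posterior odds = LR x prior odds."""
    if prior_odds < 0 or likelihood_ratio < 0:
        raise ValueError("odds and likelihood ratios must be non-negative")
    return likelihood_ratio * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of Hp into the probability of Hp."""
    return odds / (1.0 + odds)


# Hypothetical numbers: prior odds of 1 to 100 and a likelihood ratio of 100
# give posterior odds of 1, i.e. a posterior probability of 0.5.
example_odds = posterior_odds(0.01, 100.0)
example_probability = odds_to_probability(example_odds)
```

The likelihood ratio is the only quantity in this update that depends on the evidence; the prior odds are supplied separately.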

slide-17
SLIDE 17

Evidence evaluation - continued

Likelihood ratios greater than one support the prosecution proposition. The evidence is more likely if the prosecution’s proposition is true than if the defence proposition is true.

The posterior odds for one piece of evidence E1 are the prior odds for a second piece of evidence E2:

\[ \frac{\Pr(H_p \mid E_1, E_2)}{\Pr(H_d \mid E_1, E_2)} = \frac{\Pr(E_2 \mid H_p, E_1)}{\Pr(E_2 \mid H_d, E_1)} \times \frac{\Pr(H_p \mid E_1)}{\Pr(H_d \mid E_1)}. \]

With logarithms the updating process becomes additive:

\[ \log\left\{\frac{\Pr(H_p \mid E_1, E_2)}{\Pr(H_d \mid E_1, E_2)}\right\} = \log\left\{\frac{\Pr(E_2 \mid H_p, E_1)}{\Pr(E_2 \mid H_d, E_1)}\right\} + \log\left\{\frac{\Pr(H_p \mid E_1)}{\Pr(H_d \mid E_1)}\right\}. \]

Note that no probabilistic statement is made about the truth of the prosecution or defence propositions.
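The additive updating on the log scale can be sketched as follows. This is an illustration only: the function name, the choice of base-10 logarithms and the numbers are assumptions, and each likelihood ratio is taken to be already conditioned on the evidence that precedes it.

```python
import math


def update_log10_odds(prior_log10_odds, log10_lrs):
    """Add successive log10 likelihood ratios to the prior log10 odds."""
    return prior_log10_odds + sum(log10_lrs)


# Hypothetical values: prior odds of 0.001, then LRs of 50 and 4.
log_odds = update_log10_odds(math.log10(0.001),
                             [math.log10(50.0), math.log10(4.0)])
# Back on the odds scale this equals 0.001 * 50 * 4.
odds = 10.0 ** log_odds
```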

slide-18
SLIDE 18

Evidence evaluation - continuous data

Models used are hierarchical random effects models with between-item and within-item variation (e.g., variation in elemental compositions of glass between and within glass objects). Between-item variation is represented by a parameter θ with probability function f(θ). Representing Pr(X, Y) (a probability) as f(X, Y) (a probability density function), the likelihood ratio may be represented as

\[ LR = \frac{\Pr(X, Y \mid H_p)}{\Pr(X, Y \mid H_d)} = \frac{\int f(x, y \mid \theta) f(\theta)\, d\theta}{\int f(x \mid \theta) f(\theta)\, d\theta \int f(y \mid \theta) f(\theta)\, d\theta} = \frac{\int f(x \mid \theta) f(y \mid \theta) f(\theta)\, d\theta}{\int f(x \mid \theta) f(\theta)\, d\theta \int f(y \mid \theta) f(\theta)\, d\theta}, \]

since X and Y are independent if Hd is true, as they come from different sources, and are conditionally independent given θ if Hp is true.

slide-22
SLIDE 22

Evidence evaluation - continuous data

\[ LR = \frac{\int f(x \mid \theta) f(y \mid \theta) f(\theta)\, d\theta}{\int f(x \mid \theta) f(\theta)\, d\theta \int f(y \mid \theta) f(\theta)\, d\theta}. \]

f(x | θ) and f(y | θ) are the conditional distributions of the evidence x and y given a source parameterised by θ: a within-source distribution.

f(θ) is the distribution of the parameter θ between sources: a between-source distribution. It may be of a different form to f(x | θ) and f(y | θ).

The density functions are estimated from training data which are samples from some relevant population; hence random effects.
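As an illustration of how the three integrals combine, the sketch below evaluates the likelihood ratio numerically for one assumed special case that is not taken from the slides: a normal within-source distribution with mean θ and standard deviation sigma, and a normal between-source distribution with mean mu and standard deviation tau. All names and parameter values are hypothetical.

```python
import math


def normal_pdf(v, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def integrate(g, lo, hi, steps=4000):
    """Composite trapezoidal rule for g on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, steps):
        total += g(lo + i * h)
    return total * h


def marginal(v, mu, tau, sigma):
    """Integral of f(v | theta) f(theta) over theta (one denominator term)."""
    return integrate(
        lambda t: normal_pdf(v, t, sigma) * normal_pdf(t, mu, tau),
        mu - 10.0 * tau, mu + 10.0 * tau)


def continuous_lr(x, y, mu, tau, sigma):
    """LR = int f(x|t) f(y|t) f(t) dt / (int f(x|t) f(t) dt * int f(y|t) f(t) dt)."""
    numerator = integrate(
        lambda t: normal_pdf(x, t, sigma) * normal_pdf(y, t, sigma)
        * normal_pdf(t, mu, tau),
        mu - 10.0 * tau, mu + 10.0 * tau)
    return numerator / (marginal(x, mu, tau, sigma) * marginal(y, mu, tau, sigma))
```

Under these assumptions, identical measurements far from the between-source mean give a larger value than identical measurements at the mean, which is the rarity effect; measurements that differ widely give a value below one.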

slide-25
SLIDE 25

Evidence evaluation - continuous data

Consider vector evidence x = (x1, . . . , xm) and y = (y1, . . . , yn) with components of x and y independent conditional on θ. Example: m fragments of glass from a broken window at a crime scene; n fragments of glass from the clothing of a suspect.

\[ LR = \frac{\int f(x \mid \theta) f(y \mid \theta) f(\theta)\, d\theta}{\int f(x \mid \theta) f(\theta)\, d\theta \int f(y \mid \theta) f(\theta)\, d\theta} = \frac{\int \prod_{i=1}^{m} f(x_i \mid \theta) \prod_{j=1}^{n} f(y_j \mid \theta) f(\theta)\, d\theta}{\int \prod_{i=1}^{m} f(x_i \mid \theta) f(\theta)\, d\theta \int \prod_{j=1}^{n} f(y_j \mid \theta) f(\theta)\, d\theta}. \]

Data need not be independent. For example, the quantities of cocaine on banknotes are autocorrelated at lag 1.

slide-28
SLIDE 28

Evidence evaluation - continuous data: examples for independent data

Elemental composition of glass: multivariate data of measurements of elemental proportions in fragments of glass.
Within-source: fragments from a glass object (e.g., a bottle).
Between-source: different bottles.
Measurements: composition of several elements.

Chemical composition of drugs: multivariate data of measurements of chemical composition of samples of drugs.
Within-source: samples from the same batch of drugs.
Between-source: different batches of drugs.
Measurements: several chemicals.

slide-31
SLIDE 31

Evidence evaluation - discrete data

For continuous data:

\[ LR = \frac{\int f(x \mid \theta) f(y \mid \theta) f(\theta)\, d\theta}{\int f(x \mid \theta) f(\theta)\, d\theta \int f(y \mid \theta) f(\theta)\, d\theta}. \]

For discrete data:

\[ LR = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)} = \frac{\Pr(X, Y \mid H_p)}{\Pr(X, Y \mid H_d)} = \frac{\int \Pr(X = x \mid \theta) \Pr(Y = y \mid \theta) f(\theta)\, d\theta}{\int \Pr(X = x \mid \theta) f(\theta)\, d\theta \int \Pr(Y = y \mid \theta) f(\theta)\, d\theta}. \]

slide-32
SLIDE 32

Evidence evaluation - discrete data

For independent discrete vector data x = (x1, . . . , xm) and y = (y1, . . . , yn):

\[ LR = \frac{\int \Pr(X = x \mid \theta) \Pr(Y = y \mid \theta) f(\theta)\, d\theta}{\int \Pr(X = x \mid \theta) f(\theta)\, d\theta \int \Pr(Y = y \mid \theta) f(\theta)\, d\theta} = \frac{\int \prod_{i=1}^{m} \Pr(X_i = x_i \mid \theta) \prod_{j=1}^{n} \Pr(Y_j = y_j \mid \theta) f(\theta)\, d\theta}{\int \prod_{i=1}^{m} \Pr(X_i = x_i \mid \theta) f(\theta)\, d\theta \int \prod_{j=1}^{n} \Pr(Y_j = y_j \mid \theta) f(\theta)\, d\theta}. \]

slide-38
SLIDE 38

Evidence evaluation - discrete data

\[ LR = \frac{\int \prod_{i=1}^{m} \Pr(X_i = x_i \mid \theta) \prod_{j=1}^{n} \Pr(Y_j = y_j \mid \theta) f(\theta)\, d\theta}{\int \prod_{i=1}^{m} \Pr(X_i = x_i \mid \theta) f(\theta)\, d\theta \int \prod_{j=1}^{n} \Pr(Y_j = y_j \mid \theta) f(\theta)\, d\theta}. \]

The prior distribution f(θ) can be continuous: it is a function of θ.

For independent data representing counts of events over time or space, the simplest model for X and Y is a Poisson distribution. It has one parameter, the mean. A characteristic of the Poisson distribution is that the mean and variance are equal.

A common choice of prior for the mean is the gamma distribution.

Dependent discrete data are difficult to model.
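Because a Poisson distribution has equal mean and variance, a quick informal check on count data is the ratio of the sample variance to the sample mean. The sketch below is an illustration only: the function name is an assumption and the counts in the usage note are hypothetical, not taken from the click data.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson-like counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("the index is undefined when all counts are zero")
    return variance(counts) / m


# Hypothetical counts: identical values give an index of 0 (under-dispersed),
# one large value among zeros gives an index well above 1 (over-dispersed).
flat = dispersion_index([2, 2, 2, 2])
bursty = dispersion_index([0, 0, 0, 12])
```

An index far from one suggests that a single Poisson distribution is a poor description of the counts.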

slide-41
SLIDE 41

Forensic Phonetics

X: number of clicks in a period of speech by the control speaker (source known). There could be several pieces of speech from the same person and several time intervals:

\[ \begin{array}{cccc} x_{11} & x_{12} & \cdots & x_{1 m_1} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{m m_1} \end{array} \]

Y: number of clicks in one or more periods of speech by an unknown speaker (who may be the same person as the control speaker, but this is not known).

Hp: the prosecution proposition - the control and unknown speakers are the same person.
Hd: the defence proposition - the control and unknown speakers are different people.

slide-42
SLIDE 42

Clicks - data

For each of 100 speakers, the number of clicks in each minute of a piece of speech of four to six minutes in length has been recorded. A sample of 10 such recordings is given here.

Subject Min.1 Min.2 Min.3 Min.4 Min.5 Min.6
1 1 2 2 3 4 3 2 1 2 5 4 5 1 1 6 1 1 1 7 17 11 15 7 8 9 1 1 2 10 1 1

slide-43
SLIDE 43

Possible models

The data as currently recorded have no within-speaker variation, so estimation of it is not possible. Instead, three simple models for evidence evaluation with discrete data are described, with small-scale results to illustrate their application.

Poisson-gamma: this assumes that the number of clicks per minute has a Poisson distribution and that the mean of the Poisson distribution has a gamma distribution. This is not realistic for the data set on clicks, as there is evidence of dependence between the numbers of clicks per minute within speakers and the variance of the number of clicks per minute is not equal to the mean number of clicks per minute.

Bivariate Bernoulli observations: as an initial investigation of dependence in the number of clicks between adjacent minutes, a model is proposed in which the data are reduced to a binary variable of the absence or presence of a click in a particular minute. The probability of a click in the second minute is dependent on the absence or presence of a click in the first minute.

Empirical model: this is a statistic designed to be large if the numbers of clicks for the control and recovered data are similar and if they are rare. There is no probabilistic model associated with it.

slide-44
SLIDE 44

Poisson-gamma model

X ∼ Poisson(λ1); Y ∼ Poisson(λ2).
Hp: λ1 = λ2 = λ; X and Y are independent given λ.
Hd: λ1 may or may not equal λ2; X and Y are independent.

The prior for λ is Gamma(α, β). The parameters α and β of the prior distribution can be estimated from a training set z of values of λ by the method of moments or maximum likelihood. Alternatively, values may be chosen subjectively.

LR: the factor that multiplies the prior odds in favour of Hp relative to Hd to give the posterior odds in favour of Hp.
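A minimal sketch of the method of moments estimate follows, assuming the Gamma(α, β) prior is parameterised with shape α and rate β, so that its mean is α/β and its variance is α/β². The function name and the training values are hypothetical.

```python
from statistics import mean, variance


def gamma_method_of_moments(training_rates):
    """Match the sample mean and variance to alpha/beta and alpha/beta**2."""
    m = mean(training_rates)
    v = variance(training_rates)  # sample variance (n - 1 denominator)
    if m <= 0 or v <= 0:
        raise ValueError("need a positive mean and a positive variance")
    alpha = m * m / v
    beta = m / v
    return alpha, beta


# Hypothetical training set z of per-speaker mean rates.
alpha_hat, beta_hat = gamma_method_of_moments([1.0, 3.0, 5.0])
```

With these three hypothetical values the sample mean is 3 and the sample variance is 4, so the sketch returns α = 2.25 and β = 0.75.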

slide-45
SLIDE 45

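Poisson-gamma - likelihood ratio

\[ \Pr(Y = y \mid \lambda) = \frac{e^{-\lambda} \lambda^{y}}{y!}; \qquad f(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha} \lambda^{\alpha - 1} e^{-\beta\lambda}}{\Gamma(\alpha)}. \]

\[ \int \Pr(Y = y \mid \lambda) f(\lambda \mid \alpha, \beta)\, d\lambda = \frac{\beta^{\alpha}}{y!\, \Gamma(\alpha)} \int e^{-(1+\beta)\lambda} \lambda^{\alpha + y - 1}\, d\lambda = \frac{\beta^{\alpha}}{y!\, \Gamma(\alpha)} \times \frac{\Gamma(\alpha + y)}{(\beta + 1)^{\alpha + y}}. \]

\[ \int \Pr(X = x \mid \lambda) f(\lambda \mid \alpha, \beta)\, d\lambda = \frac{\beta^{\alpha}}{x!\, \Gamma(\alpha)} \int e^{-(1+\beta)\lambda} \lambda^{\alpha + x - 1}\, d\lambda = \frac{\beta^{\alpha}}{x!\, \Gamma(\alpha)} \times \frac{\Gamma(\alpha + x)}{(\beta + 1)^{\alpha + x}}. \]

\[ \int \Pr(X = x \mid \lambda) \Pr(Y = y \mid \lambda) f(\lambda \mid \alpha, \beta)\, d\lambda = \frac{\beta^{\alpha}}{\Gamma(\alpha)\, x!\, y!} \int e^{-(2+\beta)\lambda} \lambda^{\alpha + x + y - 1}\, d\lambda = \frac{\beta^{\alpha}}{\Gamma(\alpha)\, x!\, y!} \times \frac{\Gamma(x + y + \alpha)}{(\beta + 2)^{\alpha + x + y}}. \]

\[ LR = \frac{\Gamma(\alpha + x + y)\, \Gamma(\alpha)}{\Gamma(\alpha + x)\, \Gamma(\alpha + y)} \times \frac{(\beta + 1)^{2\alpha + x + y}}{\beta^{\alpha} (\beta + 2)^{\alpha + x + y}}. \]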
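The closed form above can be evaluated on the log scale to avoid overflow in the gamma functions. In the sketch below the function name is hypothetical, and the arguments kx and ky are an assumed generalisation: repeating the same integration for kx and ky independent periods with totals tx and ty replaces (β + 1) by (β + kx) and (β + ky), and (β + 2) by (β + kx + ky). With kx = ky = 1 it reduces to the formula for single counts x and y.

```python
import math


def poisson_gamma_lr(tx, ty, alpha, beta, kx=1, ky=1):
    """Poisson-gamma LR for total counts tx, ty over kx, ky periods.

    alpha is the shape and beta the rate of the gamma prior.
    """
    log_lr = (
        math.lgamma(alpha + tx + ty) + math.lgamma(alpha)
        - math.lgamma(alpha + tx) - math.lgamma(alpha + ty)
        + (alpha + tx) * math.log(beta + kx)
        + (alpha + ty) * math.log(beta + ky)
        - alpha * math.log(beta)
        - (alpha + tx + ty) * math.log(beta + kx + ky)
    )
    return math.exp(log_lr)


# Single observations x = y = 0 with alpha = beta = 1: LR = 2**2 / 3 = 4/3.
single = poisson_gamma_lr(0, 0, alpha=1.0, beta=1.0)
# Six periods each, no events on either side, alpha = 3, beta = 1:
# LR = 7**6 / 13**3, approximately 53.5.
six_periods = poisson_gamma_lr(0, 0, alpha=3.0, beta=1.0, kx=6, ky=6)
```

Working with math.lgamma rather than math.gamma keeps the calculation stable when the totals are large.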

slide-46
SLIDE 46

Poisson-gamma - example

Values of the evidence V for lengths of observation kx = ky = 6, for various total numbers of outcomes t_x = \sum_{i=1}^{k_x} x_i of control evidence and t_y = \sum_{i=1}^{k_y} y_i of recovered evidence, and for various values of the parameters (α, β) of the gamma prior distribution.

              α = 3, β = 1    α = 2, β = 2    α = 4, β = 1    α = 9, β = 3
              E(X) = 3        E(X) = 1        E(X) = 4        E(X) = 3
tx     ty     Var(X) = 3      Var(X) = 0.5    Var(X) = 4      Var(X) = 1
0      0      53.5            5.22            201.84          198.36
4      4      5.3             1.50            13.45           12.25
8      8      2.6             1.82            4.62            3.20
0      4      4.5             0.56            16.97           25.71
0      8      0.4             0.06            1.43            3.33
0      12     0.03            0.006           0.12            0.43

slide-47
SLIDE 47

Poisson-gamma - parameter interpretation

Expectations E(X) and variances Var(X) of a gamma distribution, with verbal interpretations of the variation they represent in a forensic characteristic over members of a relevant population.

E(X)   Var(X)   Interpretation
3      3        The mean number of occurrences per unit time will be around 3 but there will be some variation in the item means.
1      0.5      The mean number of occurrences per unit time will be around 1 with very little variation about this.
4      4        The mean number of occurrences per unit time will be around 4 but with a lot of variation about this.
3      1        The mean number of occurrences per unit time will be around 3 with not much variation about this.

slide-48
SLIDE 48

Bivariate - Bernoulli model

Consider the absence or presence of a click within a fixed time period (e.g., a minute) as a binary variable: absent (0) or present (1).

Consider two periods, each of two minutes in length, as a control pair and another two periods, also of two minutes in length, as a recovered pair.

Control data: X = {(x11, x12), (x21, x22)}, where xij = 0 if there is no click, and xij = 1 if there is a click, in the j-th part of the i-th period. Similarly, denote the recovered data as Y = {(y11, y12), (y21, y22)}.

slide-49
SLIDE 49

Bivariate - Bernoulli model

Let

\[ p(x_{i1} = 0) = p(y_{i1} = 0) = \theta_0, \quad p(x_{i1} = 1) = p(y_{i1} = 1) = 1 - \theta_0, \quad i = 1, 2; \]
\[ p(x_{i2} = 0 \mid x_{i1} = 0) = p(y_{i2} = 0 \mid y_{i1} = 0) = \theta_{00}, \quad i = 1, 2; \]
\[ p(x_{i2} = 0 \mid x_{i1} = 1) = p(y_{i2} = 0 \mid y_{i1} = 1) = \theta_{10}, \quad i = 1, 2. \]

Denote (θ0, θ00, θ10) by θ. Assume beta distributions for θ0, θ00 and θ10 with parameters (α0, β0), (α00, β00) and (α10, β10), respectively, denoted in general by (α, β). The parameters (α, β) may be estimated by appropriate method of moments estimators from sample proportions and variances from some relevant population.

slide-50
SLIDE 50

Likelihood ratio

Hp: control and recovered data come from the same source;
Hd: control and recovered data come from different sources.

\[ LR = \frac{\Pr(X, Y \mid H_p)}{\Pr(X, Y \mid H_d)} = \frac{\int f(x \mid \theta) f(y \mid \theta) f(\theta)\, d\theta}{\int f(x \mid \theta) f(\theta)\, d\theta \int f(y \mid \theta) f(\theta)\, d\theta}. \]

This is a product and quotient of gamma functions with arguments (α, β, X, Y) and is symmetric in X and Y.
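One way to evaluate this likelihood ratio is sketched below. It assumes, as in the model definition, that θ0, θ00 and θ10 are probabilities of a zero with independent beta priors, so each integral is a product of ratios of beta functions. The function names and the dictionary layout for the priors are hypothetical choices made for the illustration.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def transition_counts(pairs):
    """Count [zeros, ones] governed by theta0, theta00 and theta10."""
    counts = {"0": [0, 0], "00": [0, 0], "10": [0, 0]}
    for first, second in pairs:
        counts["0"][first] += 1          # first minute of the pair
        key = "00" if first == 0 else "10"
        counts[key][second] += 1         # second minute, given the first
    return counts


def log_marginal(pairs, priors):
    """Log of the likelihood integrated over the beta priors."""
    counts = transition_counts(pairs)
    total = 0.0
    for key, (a, b) in priors.items():
        zeros, ones = counts[key]
        total += log_beta(a + zeros, b + ones) - log_beta(a, b)
    return total


def bernoulli_lr(x, y, priors):
    """LR = m(x, y) / (m(x) m(y)) for lists of (first, second) pairs."""
    return math.exp(log_marginal(x + y, priors)
                    - log_marginal(x, priors) - log_marginal(y, priors))


uniform = {"0": (1.0, 1.0), "00": (1.0, 1.0), "10": (1.0, 1.0)}
# All-zero control and recovered pairs under uniform priors: 81/25 = 3.24.
lr_all_zero = bernoulli_lr([(0, 0), (0, 0)], [(0, 0), (0, 0)], uniform)
```

Pooling x and y in the numerator reflects the common-source proposition, under which both sets of pairs share the same θ.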

slide-51
SLIDE 51

Sample results

Priors:
LR1: α0 = β0 = α00 = β00 = α10 = β10 = 1.
LR2: α0 = 2, β0 = 1, α00 = 2, β00 = 1, α10 = 1.5, β10 = 2.5.
LR3: α0 = 3, β0 = 1, α00 = 3, β00 = 1, α10 = 1.5, β10 = 2.5.

(x11, x12)   (x21, x22)   (y11, y12)   (y21, y22)   LR1    LR2    LR3
(0, 0)       (0, 0)       (0, 0)       (0, 0)       3.24   1.78   1.42
(0, 0)       (1, 1)       (0, 0)       (1, 1)       2.13   1.51   1.52
(0, 0)       (0, 0)       (1, 1)       (1, 1)       0.30   0.40   0.48
(1, 0)       (0, 1)       (0, 0)       (1, 1)       0.53   0.72   0.81
(0, 0)       (0, 1)       (0, 0)       (0, 1)       2.16   1.60   0.94
(1, 0)       (0, 1)       (0, 0)       (0, 0)       0.53   0.48   0.53

slide-52
SLIDE 52

Parameter interpretation

LR1: uniform priors; no preference is given to any particular set of values for the probability of a zero.
LR2: more weight to zero in first place, to zero in second place given zero in first place, and to one in second place given one in first place.
LR3: even more weight to zero in first place and to zero in second place given zero in first place; the same weight as LR2 to one in second place given one in first place.

slide-53
SLIDE 53

Empirical model

Consider a piece of speech from a known person (e.g., a suspect): the control speech. The number of minutes of speech is l_x and the numbers of clicks per minute are x = {x_1, . . . , x_{l_x}}.

Consider a piece of speech from an unknown person (e.g., an audio recording associated with a crime): the recovered speech. The number of minutes of speech is l_y and the numbers of clicks per minute are y = {y_1, . . . , y_{l_y}}.

Let p(x) = p(x_1, . . . , x_l) and p(y) = p(y_1, . . . , y_l) be the probabilities of x and y, respectively, under some statistical model. Part of the problem is to determine the appropriate model.

slide-54
SLIDE 54

Empirical model - likelihood ratio

The following statistic is proposed for the likelihood ratio (LR):

\[ \frac{\exp\left\{-\sum_{k=1}^{l} (x_k - y_k)^2\right\}}{p(x_1, \ldots, x_l) \times p(y_1, \ldots, y_l)}. \qquad (1) \]

Properties:

The numerator measures similarity. The more similar the control and recovered speech are in terms of numbers of clicks in each minute, the larger the value of the numerator and hence the larger the LR.

The denominator measures rarity. The rarer the control and recovered speech are in terms of numbers of clicks in each minute, the smaller the value of the denominator and hence the larger the LR.

slide-55
SLIDE 55

Empirical model - the numbers of clicks per minute for 100 speakers

The following frequencies are obtained for each of the possible numbers of clicks from 0 to 17:

Clicks   0     1     2     3     4     5     6     7
Total    241   99    36    15    14    6     1     2

Frequencies by minute (empty cells omitted):
Minute 1: 61 21 6 4 5 1
Minute 2: 57 25 8 5 2 1 1
Minute 3: 63 19 11 4 1 1
Minute 4: 52 28 10 2 5 1 2
Minute 5: 6 1 1 1
Minute 6: 2 5 1 1

slide-56
SLIDE 56

Empirical model - the numbers of clicks per minute for 100 speakers

In addition, there are occurrences of 9 clicks and of 17 clicks in the first minute of recording, of 11 clicks in the second minute and of 15 clicks in the third minute. The overall total number of observations is thus 418 and the overall relative frequencies are 241/418, 99/418, . . ..

Several numbers of clicks per minute have zero frequency (for example, 8, 10, 12, 13, 14, 16 and everything greater than 17). To allow for this, add 1 to all frequencies for clicks per minute. The last row of the frequency table is then

Clicks   0     1     2     3     4     5     6     7
Total    242   100   37    16    15    7     2     3

slide-57
SLIDE 57

Empirical model - data

Twos are also recorded for the occurrences of 9, 11, 15 and 17 clicks. The sum of all these frequencies is then 430. The overall relative frequencies are then

Clicks               0       1       2       3       4       5
Relative frequency   0.563   0.233   0.086   0.037   0.035   0.016

Clicks               6       7       9       11      15      17
Relative frequency   0.005   0.007   0.005   0.005   0.005   0.005

If a previously unobserved number of clicks per minute is observed in a particular case, record the frequency as 1/431 and adjust the other frequencies appropriately.

slide-58
SLIDE 58

Empirical model - example

x      lx   y      ly   Σ(xk − yk)²   px        py                LR (1)
0000   4    0000   4    0             0.563^4   0.563^4           99
0000   4    1000   4    1             0.563^4   0.563^3 × 0.233   88
0000   4    1111   4    4             0.563^4   0.233^4           62
0000   4    2222   4    16            0.563^4   0.086^4           1/49
0000   4    3333   4    36            0.563^4   0.037^4           1/(8.1 × 10^8)
00     2    11     2    2             0.563^2   0.233^2           7.9
00     2    22     2    8             0.563^2   0.086^2           1/7.0
00     2    33     2    18            0.563^2   0.037^2           1/(2.8 × 10^4)
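Statistic (1) can be computed as in the sketch below. It assumes that p(x) and p(y) are products of the overall relative frequencies, treating minutes as independent, which is how the px and py columns of the example are formed. The function names are hypothetical and the frequency table is copied from the relative frequencies given earlier.

```python
import math

# Overall relative frequencies of clicks per minute (after adding 1 or 2).
REL_FREQ = {0: 0.563, 1: 0.233, 2: 0.086, 3: 0.037, 4: 0.035, 5: 0.016,
            6: 0.005, 7: 0.007, 9: 0.005, 11: 0.005, 15: 0.005, 17: 0.005}


def sequence_probability(counts, rel_freq=REL_FREQ):
    """Product of relative frequencies; assumes independent minutes."""
    p = 1.0
    for c in counts:
        p *= rel_freq[c]  # raises KeyError for a previously unobserved count
    return p


def empirical_lr(x, y, rel_freq=REL_FREQ):
    """exp{-sum (x_k - y_k)^2} / (p(x) p(y)) for sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("x and y must cover the same number of minutes")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance) / (
        sequence_probability(x, rel_freq) * sequence_probability(y, rel_freq))


# Four click-free minutes in both recordings: 1 / 0.563**8, about 99.
lr_no_clicks = empirical_lr([0, 0, 0, 0], [0, 0, 0, 0])
```

A previously unobserved count is left as an error here rather than smoothed, so the adjustment of frequencies described in the slides would need to be applied before calling the function.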

slide-64
SLIDE 64

Discussion

Models for autocorrelation could require many parameters.

For the Poisson model, the mean and variance are equal, so within-group variance is estimable without replicate observations within a group.

Nonparametric distributions for situations where regular models are inappropriate.

For speech, models which account for temperament are required.

The relevant population is determined with reference to the criminal and not the suspect.

slide-65
SLIDE 65

Acknowledgements

Bayesian Biometrics for Forensics Network (BBFOR2): http://cls.ru.nl/projects/bbfor2
British Academy / Leverhulme Foundation (eGAP2): ‘Modelling features for forensic speaker comparison’.