Monophyletic concordance between species trees and gene genealogies



slide-1
SLIDE 1

Monophyletic concordance between species trees and gene genealogies with multiple mergers

Bjarki Eldon and James Degnan Phylomania 2010 University of Tasmania November 4-5, 2010

slide-2
SLIDE 2

Low offspring number models

Kingman (1982) introduced the n-coalescent from an exchangeable Cannings offspring model. Let $\nu_i$ denote the number of offspring of individual $i$, with

$$E[\nu_1^k] < \infty \quad \text{as } N \to \infty, \quad k \ge 1.$$

Möhle and Sagitov (2001) characterised coalescent processes based on the timescale $c_N$:

$$c_N = \frac{E[\nu_1(\nu_1 - 1)]}{N - 1}$$

slide-3
SLIDE 3

Conditions for convergence to Kingman’s coalescent

Wright-Fisher and Moran models are exchangeable Cannings models with

$$\lim_{N \to \infty} \frac{E[\nu_1(\nu_1 - 1)(\nu_1 - 2)]}{N^2 c_N} = 0,$$

implying $c_N \to 0$ and convergence to Kingman’s coalescent.
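As a small illustration of this condition, the sketch below evaluates $c_N$ and the triple-merger ratio exactly for the Wright-Fisher model, assuming the offspring number of one individual is Binomial(N, 1/N). The function names `factorial_moment`, `c_N` and `triple_ratio` are illustrative and not from the talk.

```python
from fractions import Fraction
from math import comb


def factorial_moment(N, order):
    """Exact E[nu (nu-1) ... (nu-order+1)] for nu ~ Binomial(N, 1/N)."""
    p = Fraction(1, N)
    total = Fraction(0)
    for v in range(N + 1):
        pmf = comb(N, v) * p**v * (1 - p) ** (N - v)
        falling = 1
        for m in range(order):
            falling *= v - m  # falling factorial v (v-1) ... (v-order+1)
        total += pmf * falling
    return total


def c_N(N):
    """Timescale c_N = E[nu_1 (nu_1 - 1)] / (N - 1)."""
    return factorial_moment(N, 2) / (N - 1)


def triple_ratio(N):
    """E[nu_1 (nu_1 - 1)(nu_1 - 2)] / (N^2 c_N); should shrink as N grows."""
    return factorial_moment(N, 3) / (N**2 * c_N(N))


for N in (10, 100, 400):
    print(N, float(c_N(N)), float(triple_ratio(N)))
```

Under this binomial assumption $c_N = 1/N$ and the ratio equals $(N-1)(N-2)/N^3$, so both vanish as $N$ grows.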

slide-4
SLIDE 4

High variance in offspring distribution

Ecology, reproductive biology, and genetics of a diverse group of marine organisms suggest many offspring contributed by few individuals (Beckenbach 1994; Hedgecock 1994).

Direct genotyping of parents and offspring provides evidence of large families in Pacific oyster (Boudry et al. 2002) and Lion-Paw Scallop (Petersen et al. 2008).

Cod, oysters, mussels, barnacles, sea stars, plants?

slide-5
SLIDE 5

Evidence for large offspring distribution

◮ broadcast spawning and external fertilization
◮ high initial mortality
◮ very large population sizes
◮ low genetic diversity
◮ large number of singleton genetic variants

slide-6
SLIDE 6

Λ coalescent allows multiple mergers

Donnelly and Kurtz (1999), Pitman (1999), and Sagitov (1999) independently introduced a multiple merger coalescent, the Λ-coalescent, with coalescence rate

$$\lambda_{b,k} = \binom{b}{k} \int_0^1 x^k (1 - x)^{b-k} x^{-2} \Lambda(dx)$$

Kingman’s coalescent is obtained if $\Lambda = \delta_0$.

For simultaneous multiple merger coalescent processes, see Schweinsberg (2000) and Möhle and Sagitov (2001).
slide-7
SLIDE 7

Schweinsberg’s heavy-tail model

Schweinsberg (2003): each individual produces a random number $X_i$ of potential offspring, with $C > 0$, $a > 0$ and constant population size $N$:

$$P[X_i \ge k] \sim C/k^a \quad \text{and} \quad E[X_i] > 1$$

From the pool of potential offspring, sample without replacement to form the new generation.
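One generation of this scheme can be sketched in a few lines. This is an illustration under stated assumptions rather than Schweinsberg's construction in full: the potential offspring number is drawn as $X = \lfloor U^{-1/a} \rfloor$ with $U$ uniform on $(0, 1]$, which gives $P[X \ge k] = k^{-a}$ exactly (so $C = 1$), and the function name `schweinsberg_generation` is hypothetical.

```python
import bisect
import itertools
import random


def schweinsberg_generation(N, a, rng):
    """Return the family sizes of N parents after one generation.

    Assumption: X = floor(U ** (-1/a)) with U in (0, 1], so P[X >= k] = k ** -a
    for k >= 1. Every parent then has at least one potential offspring.
    """
    potential = [int((1.0 - rng.random()) ** (-1.0 / a)) for _ in range(N)]
    # cum[p] is the number of potential offspring of parents 0..p
    cum = list(itertools.accumulate(potential))
    total = cum[-1]
    # sample N juveniles without replacement from the pooled potential offspring
    chosen = rng.sample(range(total), N)
    family = [0] * N
    for slot in chosen:
        parent = bisect.bisect_right(cum, slot)  # owner of this juvenile slot
        family[parent] += 1
    return family


rng = random.Random(1)
sizes = schweinsberg_generation(200, 1.5, rng)
print(max(sizes), sum(sizes))
```

Family sizes always sum to $N$; with small $a$ a single parent occasionally accounts for a large share of the next generation, which is the mechanism behind multiple mergers.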

slide-8
SLIDE 8

Coalescent process depends on a

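Coalescent timescale in units of $c_N \sim N^{a-1}$ if $1 < a < 2$.

| case | coalescent | coalescence rate |
| --- | --- | --- |
| $a \ge 2$ | Kingman coalescent | $\binom{b}{2}$ |
| $1 \le a < 2$ | $\Lambda \sim \mathrm{Beta}(2 - a, a)$ | $\binom{b}{k} \dfrac{B(k - a, b - k + a)}{B(2 - a, a)}$ |
| $0 < a < 1$ | Ξ-coalescent | |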
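The Beta$(2 - a, a)$ rates can be evaluated directly from the ratio of Beta functions. The sketch below assumes $1 \le a < 2$ and separates the rate for one particular set of $k$ lines from the total rate that includes the binomial coefficient; the names `per_set_rate` and `total_rate` are illustrative.

```python
from math import comb, exp, isclose, lgamma


def beta_fn(x, y):
    """Euler Beta function B(x, y) for x, y > 0, via log-gamma for stability."""
    return exp(lgamma(x) + lgamma(y) - lgamma(x + y))


def per_set_rate(b, k, a):
    """Rate at which one particular set of k out of b lines merges (1 <= a < 2)."""
    return beta_fn(k - a, b - k + a) / beta_fn(2 - a, a)


def total_rate(b, k, a):
    """Total rate of k-mergers among b lines: binom(b, k) times the per-set rate."""
    return comb(b, k) * per_set_rate(b, k, a)


a = 1.5
print([round(total_rate(5, k, a), 4) for k in range(2, 6)])
# Consistency of the per-set rates: a given k-set among b lines merges either
# alone or together with an added (b+1)-th line.
print(isclose(per_set_rate(5, 3, a), per_set_rate(6, 3, a) + per_set_rate(6, 4, a)))
```

With two lines the rate is 1 for every $a$ in this range, so time is measured in units of the pairwise coalescence time.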

slide-9
SLIDE 9

A modified Moran model

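Eldon and Wakeley (2006): a modified Moran model, in which the offspring number $U$ is random rather than fixed at one as in the usual Moran model:

$$P[U = u] = (1 - \varepsilon_N)\,\delta_2 + \varepsilon_N\,\delta_{[\psi N]} \quad \text{and} \quad \varepsilon_N \sim 1/N^{\gamma}, \quad \gamma > 0$$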

slide-10
SLIDE 10

Coalescent process depends on γ

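Coalescent timescale is $N_\gamma = \min(N^\gamma, N^2)$, $\gamma > 0$.

| case | coalescence rate | timescale |
| --- | --- | --- |
| $\gamma > 2$ | $\binom{n}{2}$ | $N^2$ |
| $\gamma = 2$ | $\binom{b}{k}\,\delta_2 + \psi^k (1 - \psi)^{b-k}$ | $N^2$ |
| $\gamma < 2$ | $\binom{b}{k}\,\psi^k (1 - \psi)^{b-k}$ | $N^\gamma$, $1 < \gamma < 2$ |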

slide-11
SLIDE 11

Ratios of coalescence times for Λ = K + Λψ

[Figure: x-axis sample size n (100 to 500). Legend: • : R1; △ : R2; ▽ : R3; ⋄ : R4; + : Rn−1]

slide-12
SLIDE 12

Ratios of coalescence times for Λ = Beta(0.9, 1.1)

[Figure: x-axis sample size n (100 to 500). Legend: • : R1; △ : R2; ▽ : R3; ⋄ : R4; + : Rn−1]

slide-13
SLIDE 13

Ratios of coalescence times for Λ = Beta(0.1, 1.9)

[Figure: x-axis sample size n (100 to 500). Legend: • : R1; △ : R2; ▽ : R3; ⋄ : R4; + : Rn−1]

slide-14
SLIDE 14

Monophyletic concordance for Λ coalescents

[Figure: diagram with labels t, A, B]

slide-15
SLIDE 15

Not monophyletic concordance

[Figure: diagram with labels t, A, B]

slide-16
SLIDE 16

General form for P[MC] for two species

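$$P[MC] = \sum_{m_A, m_B} P[MC; m_A, m_B]\, P[m_A, m_B]$$

with $P[m_A, m_B] = G_{n_A, m_A}(t)\, G_{n_B, m_B}(t)$ and

$$P[MC; m_A, m_B] = \sum_{k=2}^{m_A + m_B} \beta_{m_A + m_B,\, k} \left( P[MC; m_A - k + 1, m_B] \binom{m_A}{k} + P[MC; m_A, m_B - k + 1] \binom{m_B}{k} \right) \Big/ \binom{m_A + m_B}{k}$$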
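The recursion for $P[MC; m_A, m_B]$ lends itself to memoised evaluation. The sketch below rests on two assumptions that the slide does not spell out: $\beta_{n,k}$ is read as the probability that the next merger among $n$ lines involves exactly $k$ lines, and the recursion stops at $P[MC; 1, 1] = 1$. The helper names `make_p_mc` and `kingman_beta` are illustrative.

```python
from functools import lru_cache
from math import comb, isclose


def make_p_mc(beta):
    """Build P[MC; m_a, m_b] for a given merger-size distribution beta(n, k).

    Assumption: beta(n, k) is the probability that the next merger among
    n lines involves exactly k of them.
    """

    @lru_cache(maxsize=None)
    def p_mc(m_a, m_b):
        if m_a == 1 and m_b == 1:
            return 1.0  # assumed base case: one line left per species
        n = m_a + m_b
        total = 0.0
        for k in range(2, n + 1):
            term = 0.0
            if k <= m_a:  # all k merging lines come from species A
                term += p_mc(m_a - k + 1, m_b) * comb(m_a, k)
            if k <= m_b:  # all k merging lines come from species B
                term += p_mc(m_a, m_b - k + 1) * comb(m_b, k)
            total += beta(n, k) * term / comb(n, k)
        return total

    return p_mc


def kingman_beta(n, k):
    """Kingman's coalescent: every merger is binary."""
    return 1.0 if k == 2 else 0.0


p_mc = make_p_mc(kingman_beta)
print(p_mc(2, 1), p_mc(2, 2))
```

With only binary mergers the two printed values reduce to 1/3 and 1/9, the factors that multiply $\beta_{3,2}$ and $\beta_{4,2}\beta_{3,2}$ in the two-lines-per-species example.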

slide-17
SLIDE 17

Computing Gi,j(t)

$G_{i,j}(t)$ is the probability of $j$ lines at time $t$ when starting from $i$ lines at time zero within one population.

A vector $c$ of ordered mergers associated with Kingman’s coalescent is simply $\{2, 2, \ldots, 2\}$.

By way of example, starting from 10 lines, say, a coalescence sequence could be $\{3, 2, 5, 3\}$ in a Λ coalescent.

Conditioning on the embedded chain, or the order of mergers, gives transition probabilities

$$\beta_{i,j} = \begin{cases} \dfrac{q_{i,j}}{\sum_{k \ne i} q_{i,k}} & \text{if } i \ne j \\ 0 & \text{otherwise} \end{cases}$$
slide-18
SLIDE 18

The rate matrix $Q_A$ of $(A_t; t \ge 0)$ is

$$q_{j,i} = \binom{j}{j - i + 1} \int_0^1 x^{j-i-1} (1 - x)^{i-1} \Lambda(dx)$$

$$q_{j,j} = -\sum_{i=1}^{j-1} q_{j,i}, \quad 2 \le j \le n$$

$$q_{j,i} = 0 \quad \text{otherwise}$$
slide-19
SLIDE 19

Using eigenvectors and eigenvalues of QA

Eigenvalues of $Q_A$ are $q_{k,k} = -\alpha(k)$.

Left eigenvector $l^{(k)} = (l^{(k)}_1, \ldots, l^{(k)}_n)$

Right eigenvector $r^{(k)} = (r^{(k)}_1, \ldots, r^{(k)}_n)$

Obtained by recursions

$$l^{(k)}_j = \frac{q_{j+1,j}\, l^{(k)}_{j+1} + \cdots + q_{k,j}\, l^{(k)}_k}{q_{k,k} - q_{j,j}}, \quad 1 \le j < k$$

$$r^{(k)}_j = \frac{q_{j,k}\, r^{(k)}_k + \cdots + q_{j,j-1}\, r^{(k)}_{j-1}}{q_{k,k} - q_{j,j}}, \quad 1 < k < j \le n$$

slide-20
SLIDE 20

The spectral decomposition of $Q_A$ yields the transition probabilities $G_{i,j}(t) \equiv P[A_t = j \mid A_0 = i]$ as

$$G_{i,j}(t) = \sum_{k=j}^{i} e^{-\alpha(k)t}\, r^{(k)}_i\, l^{(k)}_j$$

slide-21
SLIDE 21

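Transition probabilities $G_{i,j}$ for $i = 3$

$$G_{3,2}(t) = \frac{q_{3,2}}{q_{3,2} + q_{3,1}}\, P[T_3 \le t,\ T_3 + T_2 > t]$$

$$G_{3,1}(t) = \frac{q_{3,2}}{q_{3,2} + q_{3,1}}\, P[T_3 + T_2 \le t] + \frac{q_{3,1}}{q_{3,2} + q_{3,1}}\, P[T_3 \le t]$$

$$G_{3,3}(t) = P[T_3 > t]$$

and $G_{3,1}(t) + G_{3,2}(t) + G_{3,3}(t) = 1$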

slide-22
SLIDE 22

An example with Λψ

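Process with infinitesimal parameters

$$q_{ij} = \binom{i}{i - j + 1} \psi^{i-j+1} (1 - \psi)^{j-1}$$

For $i = 3$ we obtain, with $\alpha(i) \equiv \sum_{k=1}^{i-1} q_{ik}$,

$$G_{3,2}(t) = \frac{3}{2} \left( e^{-\alpha(2)t} - e^{-\alpha(3)t} \right)$$

$$G_{3,1}(t) = 1 - \frac{3}{2} e^{-\alpha(2)t} + \frac{1}{2} e^{-\alpha(3)t}$$

$$G_{3,3}(t) = e^{-\alpha(3)t}$$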
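The closed forms for $i = 3$ can be checked numerically against the eigenvector recursions. The sketch below is one possible implementation, assuming the binomial coefficient counts the $i - j + 1$ merging lines, that the eigenvectors are normalised so that $l^{(k)}_k = r^{(k)}_k = 1$, and that the diagonal entries of the rate matrix are distinct (true for $0 < \psi < 1$). The function names are illustrative.

```python
import math


def rate_matrix_psi(n, psi):
    """Q[i][j]: rate from i to j lines (index 0 unused), point-mass example."""
    Q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(2, n + 1):
        for j in range(1, i):
            k = i - j + 1  # number of lines that merge
            Q[i][j] = math.comb(i, k) * psi**k * (1 - psi) ** (i - k)
        Q[i][i] = -sum(Q[i][1:i])
    return Q


def transition_probs(Q, n, t):
    """G[i][j] = P[A_t = j | A_0 = i] via eigenvectors of lower-triangular Q."""
    left = [[0.0] * (n + 1) for _ in range(n + 1)]
    right = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        left[k][k] = right[k][k] = 1.0  # assumed normalisation
        for j in range(k - 1, 0, -1):  # left eigenvector entries below k
            s = sum(Q[m][j] * left[k][m] for m in range(j + 1, k + 1))
            left[k][j] = s / (Q[k][k] - Q[j][j])
        for j in range(k + 1, n + 1):  # right eigenvector entries above k
            s = sum(Q[j][m] * right[k][m] for m in range(k, j))
            right[k][j] = s / (Q[k][k] - Q[j][j])
    G = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            G[i][j] = sum(
                math.exp(Q[k][k] * t) * right[k][i] * left[k][j]
                for k in range(j, i + 1)
            )
    return G


psi, t = 0.5, 1.0
Q = rate_matrix_psi(3, psi)
G = transition_probs(Q, 3, t)
a2, a3 = -Q[2][2], -Q[3][3]  # total rates alpha(2), alpha(3)
print(G[3][2], 1.5 * (math.exp(-a2 * t) - math.exp(-a3 * t)))
print(G[3][1], 1 - 1.5 * math.exp(-a2 * t) + 0.5 * math.exp(-a3 * t))
```

Each printed pair should agree, and every row of `G` sums to one.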

slide-23
SLIDE 23

In general,

$$G_{i,j}(t) = \sum_{c \in C_{i,j}} g_c(t), \quad 1 \le j < i$$

in which $c$ is a coalescence sequence, that is, a particular order of mergers in going from $i$ to $j$ sequences. Number of possible sequences is $|C_{i,j}| = 2^{i-j-1}$.
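The count of coalescence sequences can be checked by brute-force enumeration. In this sketch a sequence is represented as an ordered list of merger sizes, each at least 2, where a merger of size $k$ removes $k - 1$ lines; the function name `coalescence_sequences` is illustrative.

```python
def coalescence_sequences(i, j):
    """All ordered lists of merger sizes (each >= 2) taking i lines down to j."""
    if i == j:
        return [[]]  # nothing left to merge
    result = []
    # a merger of size k leaves i - k + 1 lines, which must stay >= j
    for k in range(2, i - j + 2):
        for rest in coalescence_sequences(i - k + 1, j):
            result.append([k] + rest)
    return result


seqs = coalescence_sequences(10, 1)
print(len(seqs), 2 ** (10 - 1 - 1))
print([3, 2, 5, 3] in seqs, [2] * 9 in seqs)
```

Both the example sequence $\{3, 2, 5, 3\}$ from 10 lines and the all-binary Kingman sequence appear in the enumeration.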

slide-24
SLIDE 24

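$$g_c(t) = \begin{cases} p(c)\, P[T(c) \le t,\ T(c) + T_j > t] & \text{if } j > 1 \\ p(c)\, P[T(c) \le t] & \text{if } j = 1 \\ P[T_i > t] & \text{if } j = i \end{cases}$$

in which

$$P[T(c) \le t,\ T(c) + T_j > t] = e^{-\alpha(j)t} \sum_{k=1}^{l} \frac{\gamma_k}{\beta(i_k, j)} \left( 1 - e^{-\beta(i_k, j)t} \right)$$

with $\beta(i_k, j) \equiv \alpha(i_k) - \alpha(j)$; and

$$P[T(c) \le t] = \sum_{k=1}^{l} \gamma'_k \left( 1 - e^{-\alpha(i_k)t} \right)$$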
slide-25
SLIDE 25

Example: two species

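The probability $P[MC]$ of monophyletic concordance for two lines from each of two species, with $\alpha_X(i) = \sum_{1 \le k \le i-1} q_{ik}$ (for species $X$):

$$\begin{aligned} P[MC] = {} & (1 - e^{-\alpha_A(2)t})(1 - e^{-\alpha_B(2)t}) \\ & + e^{-\alpha_A(2)t}(1 - e^{-\alpha_B(2)t})\,\beta_{3,2}/3 \\ & + (1 - e^{-\alpha_A(2)t})\,e^{-\alpha_B(2)t}\,\beta_{3,2}/3 \\ & + e^{-\alpha_A(2)t}\,e^{-\alpha_B(2)t}\,\beta_{4,2}\,\beta_{3,2}/9 \end{aligned}$$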
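To see how this expression behaves, the sketch below evaluates it for the point-mass example with the same $\psi$ in both species and in the ancestral population. That shared $\psi$ is a simplifying assumption, as is reading $\beta_{n,k}$ as the probability that the next merger among $n$ lines involves exactly $k$ lines. The names `merger_rate`, `beta` and `p_mc_two_by_two` are illustrative.

```python
from math import comb, exp, isclose


def merger_rate(n, k, psi):
    """Rate of a k-merger among n lines, binom(n, k) psi^k (1 - psi)^(n - k)."""
    return comb(n, k) * psi**k * (1 - psi) ** (n - k)


def beta(n, k, psi):
    """Assumed reading: probability that the next merger among n lines has size k."""
    total = sum(merger_rate(n, m, psi) for m in range(2, n + 1))
    return merger_rate(n, k, psi) / total


def p_mc_two_by_two(t, psi):
    """P[MC] for two lines from each of two species, for 0 < psi <= 1."""
    alpha2 = merger_rate(2, 2, psi)  # total coalescence rate of two lines
    e_a = exp(-alpha2 * t)  # species A has not coalesced by time t
    e_b = exp(-alpha2 * t)  # species B has not coalesced by time t
    b32 = beta(3, 2, psi)
    b42 = beta(4, 2, psi)
    return (
        (1 - e_a) * (1 - e_b)
        + e_a * (1 - e_b) * b32 / 3
        + (1 - e_a) * e_b * b32 / 3
        + e_a * e_b * b42 * b32 / 9
    )


for t in (0.0, 1.0, 5.0, 50.0):
    print(t, p_mc_two_by_two(t, 0.5))
```

At $t = 0$ only the last term survives, and for large $t$ the probability tends to one because both species coalesce before the populations merge.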

slide-26
SLIDE 26

Two species and two lines each

[Figure, two panels. First panel: x-axis psi (0.2 to 0.8); legend • : Λψ; △ : K + Λψ. Second panel: x-axis a (1.2 to 1.8); legend • : Beta(2 − a, a)]
slide-27
SLIDE 27

Two species and two lines each

[Figure: x-axis time t (2 to 8). Legend: • : Λ0.05; △ : K + Λ0.05; ⋄ : Beta(0.95, 1.05); + : K]

slide-28
SLIDE 28

Two species and two lines each

[Figure: x-axis time t (2 to 8). Legend: • : Λ0.99; △ : K + Λ0.99; ⋄ : Beta(0.05, 1.95); + : K]

slide-29
SLIDE 29

Two species and three lines each

[Figure, two panels. First panel: x-axis from 0.2 to 0.8; legend • : Λψ; △ : K + Λψ. Second panel: x-axis from 1.2 to 1.8; legend • : Beta(2 − a, a)]
slide-30
SLIDE 30

Two species and three lines each

[Figure: x-axis time t (2 to 8). Legend: • : Λ0.05; △ : K + Λ0.05; ⋄ : Beta(0.95, 1.05)]

slide-31
SLIDE 31

Two species and three lines each

[Figure: x-axis time t (2 to 8). Legend: • : Λ0.95; △ : K + Λ0.95; ⋄ : Beta(0.05, 1.95)]

slide-32
SLIDE 32

Recursive approach for s species

Let $\tilde{n} = n_1 + \cdots + n_s$, in which $n_i$ denotes the number of ancestral lines for species $i$ in a population, and let $n = (n_1, \ldots, n_s)$.

$$P[MC; n] = \sum_{k=2}^{\tilde{n}} \beta_{\tilde{n},k} \sum_{r=1}^{s} P[MC; m] \binom{n_r}{k} \Big/ \binom{\tilde{n}}{k}$$

in which $m = (n_1, n_2, \ldots, n_{r-1}, n_r - k + 1, n_{r+1}, \ldots, n_s)$ and

$$P[MC; (0, 0, \ldots, 0, 1)] = P[MC; (0, 0, \ldots, 0, 1, 1)] = 1$$

slide-33
SLIDE 33

Three species and two lines each (t1 = 1, t2 = 2)

[Figure: x-axis ψ = a − 1 (0.2 to 0.8). Legend: • : Λ0.05; △ : K + Λ0.05; ⋄ : Beta(0.95, 1.05)]

slide-34
SLIDE 34

Three species and two lines each (+ : K)

[Figure, four panels, each with x-axis t_1 (0.0 to 2.0). Panel titles:
psi = 0.05; a = 1.95; t_2 = 0.05 + t_1
psi = 0.95; a = 1.001; t_2 = 0.05 + t_1
psi = 0.95; a = 1.95; t_2 = 1 + t_1
psi = 0.95; a = 1.001; t_2 = 1 + t_1
Legend: • : Λ0.05; △ : K + Λ0.05; ⋄ : Beta(0.95, 1.05)]

slide-35
SLIDE 35

Conclusions

◮ Probability of monophyletic concordance depends on parameters of multiple merger coalescent processes
◮ Presence of multiple mergers complicates computations
◮ Scaling time appropriately is important

slide-36
SLIDE 36

Acknowledgments

EPSRC and Marsden Fund for funding