Monophyletic concordance between species trees and gene genealogies with multiple mergers
Bjarki Eldon and James Degnan
Phylomania 2010, University of Tasmania, November 4-5, 2010
Low offspring number models
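Kingman (1982) introduced the n-coalescent from an exchangeable Cannings offspring model; let $\nu_i$ denote the number of offspring of individual $i$, with $E[\nu_1^k] < \infty$ as $N \to \infty$, $k \ge 1$.
Möhle and Sagitov (2001) characterised coalescent processes based on the timescale $c_N$,
$$c_N = \frac{E[\nu_1(\nu_1 - 1)]}{N - 1},$$
which is the probability that two lines picked at random have the same parent in the previous generation.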
Conditions for convergence to Kingman’s coalescent
Wright-Fisher and Moran models are exchangeable Cannings models with
$$\lim_{N \to \infty} \frac{E[\nu_1(\nu_1 - 1)(\nu_1 - 2)]}{N^2 c_N} = 0,$$
implying $c_N \to 0$ and convergence to Kingman's coalescent.
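Both quantities can be checked exactly for the Wright-Fisher model. The sketch below assumes the standard multinomial reproduction, so that $\nu_1 \sim \text{Binomial}(N, 1/N)$ with factorial moments $E[(\nu_1)_r] = \frac{N!}{(N-r)!} N^{-r}$; the function names are illustrative only. It gives $c_N = 1/N$ and a ratio $(N-1)(N-2)/N^3$ that vanishes as $N$ grows.

```python
from fractions import Fraction
from math import perm


def wf_factorial_moment(N: int, r: int) -> Fraction:
    """E[nu_1 (nu_1 - 1) ... (nu_1 - r + 1)] under the assumption nu_1 ~ Binomial(N, 1/N)."""
    # Binomial factorial moment: N!/(N - r)! * p**r with p = 1/N.
    return Fraction(perm(N, r)) * Fraction(1, N) ** r


def c_N(N: int) -> Fraction:
    # Pairwise coalescence probability per generation: E[nu_1 (nu_1 - 1)] / (N - 1).
    return wf_factorial_moment(N, 2) / (N - 1)


def mohle_ratio(N: int) -> Fraction:
    # E[nu_1 (nu_1 - 1)(nu_1 - 2)] / (N^2 c_N); must tend to 0 for Kingman's coalescent.
    return wf_factorial_moment(N, 3) / (N**2 * c_N(N))


for N in (10, 100, 1000):
    print(N, c_N(N), float(mohle_ratio(N)))
```

Exact fractions are used so that $c_N = 1/N$ comes out without rounding error.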
High variance in offspring distribution
Ecology, reproductive biology, and genetics of a diverse group of marine organisms suggest that many offspring are contributed by few individuals (Beckenbach 1994; Hedgecock 1994). Direct genotyping of parents and offspring provides evidence of large families in the Pacific oyster (Boudry et al. 2002) and the lion-paw scallop (Petersen et al. 2008).
Cod, oysters, mussels, barnacles, sea stars, plants?
Evidence for high variance in offspring number
◮ broadcast spawning and external fertilization
◮ high initial mortality
◮ very large population sizes
◮ low genetic diversity
◮ large number of singleton genetic variants
Λ coalescent allows multiple mergers
Donnelly and Kurtz (1999), Pitman (1999), and Sagitov (1999) independently introduced a multiple-merger coalescent, the Λ-coalescent, in which the rate of $k$-mergers among $b$ lines is
$$\lambda_{b,k} = \binom{b}{k}\int_0^1 x^{k}(1 - x)^{b-k}\,x^{-2}\,\Lambda(dx).$$
Kingman's coalescent is obtained if $\Lambda = \delta_0$. For coalescent processes with simultaneous multiple mergers, see Schweinsberg (2000) and Möhle and Sagitov (2001).
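To see how Λ determines the rates, the sketch below evaluates $\lambda_{b,k}$ for a Λ made of finitely many point masses, an assumption that turns the integral into a finite sum; the helper name is hypothetical. An atom at $x = 0$ reproduces Kingman's rates: $\binom{b}{2}$ for pairs and zero for larger mergers.

```python
from math import comb


def merger_rate(b, k, atoms):
    """lambda_{b,k} for a discrete Lambda given as (x, weight) pairs.

    Assumption: Lambda = sum(weight * delta_x). The factor x^k * x^(-2) is written
    as x^(k-2), so an atom at x = 0 contributes C(b, 2) for k = 2 and nothing for
    k > 2 (in Python, 0.0**0 == 1.0).
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    integral = sum(w * x ** (k - 2) * (1.0 - x) ** (b - k) for x, w in atoms)
    return comb(b, k) * integral


kingman = [(0.0, 1.0)]
psi = 0.5
kingman_plus_psi = [(0.0, 1.0), (psi, psi**2)]
for k in range(2, 5):
    print(k, merger_rate(4, k, kingman), merger_rate(4, k, kingman_plus_psi))
```

Passing `[(0.0, 1.0), (psi, psi**2)]` gives rates $\binom{b}{k}\left(\delta_{k,2} + \psi^k(1-\psi)^{b-k}\right)$, the form that appears for the modified Moran model.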
Schweinsberg’s heavy-tail model
Schweinsberg (2003): each individual produces a random number $X_i$ of potential offspring, with constants $C > 0$ and $a > 0$ and constant population size $N$, such that
$$P[X_i \ge k] \sim C/k^{a} \quad\text{and}\quad E[X_i] > 1.$$
From the pool of potential offspring, $N$ are sampled without replacement to form the new generation.
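A minimal simulation sketch of one generation follows. It assumes the particular choice $X = \lfloor U^{-1/a} \rfloor$ with $U$ uniform on (0, 1], which gives $P[X \ge k] = k^{-a}$ exactly for integers $k \ge 1$ (so $C = 1$); this distribution and the function names are hypothetical, and the sketch is intended for moderate values of $a$. Every $X_i \ge 1$, so the pool always contains at least $N$ potential offspring.

```python
import bisect
import random
from itertools import accumulate


def potential_offspring(a, rng):
    """Assumed heavy tail: X >= 1 with P[X >= k] = k**(-a) for integers k >= 1."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so u > 0
    return int(u ** (-1.0 / a))  # floor of U^(-1/a)


def next_generation(N, a, rng):
    """Offspring numbers nu_1..nu_N for one generation of the heavy-tailed model."""
    xs = [potential_offspring(a, rng) for _ in range(N)]
    cum = list(accumulate(xs))  # parent i owns potential offspring [cum[i-1], cum[i])
    # Sample N potential offspring without replacement, without building the pool.
    picks = rng.sample(range(cum[-1]), N)
    nu = [0] * N
    for idx in picks:
        nu[bisect.bisect_right(cum, idx)] += 1  # map offspring index to its parent
    return nu


rng = random.Random(2003)
nu = next_generation(1000, 1.5, rng)
print(max(nu), sum(nu))
```

Sampling indices from a `range` avoids materialising the pool, which can be very large when $a$ is small.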
Coalescent process depends on a
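Coalescent time is measured in units of $1/c_N \sim N^{a-1}$ generations if $1 < a < 2$.
case | coalescent | coalescence rate
$a \ge 2$ | Kingman coalescent | $\binom{b}{2}$
$1 \le a < 2$ | Λ-coalescent with Λ = Beta(2 − a, a) | $\binom{b}{k}\dfrac{B(k - a,\ b - k + a)}{B(2 - a,\ a)}$
$0 < a < 1$ | Ξ-coalescent |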
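The Beta rate can be evaluated with log-gamma functions. The sketch below (hypothetical function names) is checked against the case $a = 1$, where Beta(1, 1) is the uniform distribution (the Bolthausen-Sznitman coalescent) and the rate reduces to $b/(k(k-1))$. Also $\lambda_{2,2} = 1$ for every $a$, since Λ is a probability measure.

```python
from math import comb, exp, lgamma


def log_beta(x, y):
    # log B(x, y) = log Gamma(x) + log Gamma(y) - log Gamma(x + y)
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def beta_coalescent_rate(b, k, a):
    """Rate of k-mergers among b lines for Lambda = Beta(2 - a, a), 1 <= a < 2.

    lambda_{b,k} = C(b, k) * B(k - a, b - k + a) / B(2 - a, a)
    """
    if not (1 <= a < 2 and 2 <= k <= b):
        raise ValueError("need 1 <= a < 2 and 2 <= k <= b")
    return comb(b, k) * exp(log_beta(k - a, b - k + a) - log_beta(2 - a, a))


for a in (1.0, 1.5, 1.95):
    print(a, [round(beta_coalescent_rate(6, k, a), 4) for k in range(2, 7)])
```

As $a$ approaches 2 the mass shifts towards pairwise mergers, in line with the Kingman limit for $a \ge 2$.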
A modified Moran model
Eldon and Wakeley (2006): a modified Moran model in which the offspring number $U$ is random rather than fixed at one as in the usual Moran model,
$$P[U = u] = (1 - \varepsilon_N)\,\delta_2(u) + \varepsilon_N\,\delta_{[\psi N]}(u), \quad \varepsilon_N \sim 1/N^{\gamma},\ \gamma > 0.$$
Coalescent process depends on γ
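The coalescent timescale is $N_\gamma = \min(N^{\gamma}, N^2)$, $\gamma > 0$.
case | coalescence rate | timescale
$\gamma > 2$ | $\binom{b}{2}$ | $N^2$
$\gamma = 2$ | $\binom{b}{k}\left(\delta_{k,2} + \psi^k(1 - \psi)^{b-k}\right)$ | $N^2$
$\gamma < 2$ | $\binom{b}{k}\psi^k(1 - \psi)^{b-k}$ | $N^{\gamma}$, $1 < \gamma < 2$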
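The three regimes differ only in which rate components survive. The sketch below (hypothetical names) reads $\delta_2$ in the $\gamma = 2$ row as the Kronecker delta $\delta_{k,2}$, which is an assumption. The total rate of the ψ component follows from the binomial theorem: $\sum_{k=2}^{b}\binom{b}{k}\psi^k(1-\psi)^{b-k} = 1 - (1-\psi)^b - b\psi(1-\psi)^{b-1}$.

```python
from math import comb


def moran_limit_rate(b, k, psi, gamma):
    """Rate of k-mergers among b lines in the limit of the modified Moran model.

    Assumption: the gamma = 2 rate is C(b, k) * (delta_{k,2} + psi^k (1 - psi)^(b - k)).
    """
    kingman = comb(b, 2) if k == 2 else 0.0
    lam_psi = comb(b, k) * psi**k * (1 - psi) ** (b - k)
    if gamma > 2:
        return float(kingman)  # pairwise mergers only
    if gamma == 2:
        return kingman + lam_psi  # Kingman plus Lambda_psi
    return lam_psi  # Lambda_psi only


def total_rate_lambda_psi(b, psi):
    # Binomial theorem with the k = 0 and k = 1 terms removed.
    return 1 - (1 - psi) ** b - b * psi * (1 - psi) ** (b - 1)


for gamma in (3.0, 2.0, 1.5):
    print(gamma, [round(moran_limit_rate(5, k, 0.3, gamma), 4) for k in range(2, 6)])
```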
Ratios of coalescence times for Λ = K + Λψ
[Figure: ratios of coalescence times against sample size n (100 to 500); legend: $R_1$, $R_2$, $R_3$, $R_4$, $R_{n-1}$.]
Ratios of coalescence times for Λ = Beta(0.9, 1.1)
[Figure: ratios of coalescence times against sample size n (100 to 500); legend: $R_1$, $R_2$, $R_3$, $R_4$, $R_{n-1}$.]
Ratios of coalescence times for Λ = Beta(0.1, 1.9)
[Figure: ratios of coalescence times against sample size n (100 to 500); legend: $R_1$, $R_2$, $R_3$, $R_4$, $R_{n-1}$.]
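Monophyletic concordance for Λ coalescents
[Figure: gene genealogy of species A and B, with time t marked.]
Not monophyletic concordance
[Figure: gene genealogy of species A and B, with time t marked.]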
General form for P[MC] for two species
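$$P[MC] = \sum_{m_A, m_B} P[MC; m_A, m_B]\,P[m_A, m_B]$$
with $P[m_A, m_B] = G_{n_A,m_A}(t)\,G_{n_B,m_B}(t)$, where $n_A$ and $n_B$ are the numbers of lines sampled from species A and B, and
$$P[MC; m_A, m_B] = \sum_{k=2}^{m_A + m_B} \beta_{m_A+m_B,k}\,\frac{P[MC; m_A - k + 1, m_B]\binom{m_A}{k} + P[MC; m_A, m_B - k + 1]\binom{m_B}{k}}{\binom{m_A + m_B}{k}},$$
in which $\beta_{n,k}$ is the probability that the next merger among $n$ lines involves $k$ of them, and $P[MC; 1, 1] = 1$.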
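The recursion for $P[MC; m_A, m_B]$ can be sketched with memoisation. For illustration only, the ancestral population is assumed to follow the Λψ rates $\binom{n}{k}\psi^k(1-\psi)^{n-k}$, so that $\beta_{n,k}$ is the $k$-merger rate divided by the total rate; the base case is $P[MC; 1, 1] = 1$ and the names are hypothetical.

```python
from functools import lru_cache
from math import comb


def beta(n, k, psi):
    """P[next merger among n lines involves exactly k lines].

    Assumption: Lambda_psi rates C(n, j) psi^j (1 - psi)^(n - j), j = 2..n.
    """
    rates = [comb(n, j) * psi**j * (1 - psi) ** (n - j) for j in range(2, n + 1)]
    return rates[k - 2] / sum(rates)


@lru_cache(maxsize=None)
def p_mc(m_a, m_b, psi):
    """P[MC; m_A, m_B] for two species whose lines share one ancestral population."""
    if m_a == 1 and m_b == 1:
        return 1.0  # base case: one line per species
    n = m_a + m_b
    total = 0.0
    for k in range(2, n + 1):
        within = 0.0
        if k <= m_a:  # k lines of species A merge into one
            within += p_mc(m_a - k + 1, m_b, psi) * comb(m_a, k)
        if k <= m_b:  # k lines of species B merge into one
            within += p_mc(m_a, m_b - k + 1, psi) * comb(m_b, k)
        # mergers mixing A and B lines break monophyly and contribute nothing
        total += beta(n, k, psi) * within / comb(n, k)
    return total


print(p_mc(2, 2, 0.5), p_mc(3, 3, 0.5))
```

Each recursive call removes at least one line, so the recursion always reaches the base case.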
Computing $G_{i,j}(t)$
$G_{i,j}(t)$ is the probability of $j$ lines at time $t$ when starting from $i$ lines at time zero within one population. A vector $c$ of ordered mergers associated with Kingman's coalescent is simply $\{2, 2, \ldots, 2\}$. By way of example, starting from 10 lines, a coalescence sequence in a Λ coalescent could be $\{3, 2, 5, 3\}$, taking the number of lines through $10 \to 8 \to 7 \to 3 \to 1$. Conditioning on the embedded chain, that is, on the order of mergers, the transition probabilities are
$$\beta_{i,j} = \begin{cases} q_{i,j} \Big/ \sum_{k \ne i} q_{i,k} & \text{if } i \ne j, \\ 0 & \text{otherwise.} \end{cases}$$
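Coalescence sequences can be enumerated directly: a $k$-merger reduces the number of lines by $k - 1$, so a sequence from $i$ to $j$ lines is an ordered composition of $i - j$. The sketch below (hypothetical names) lists them and computes the probability of a sequence as a product of embedded-chain jump probabilities, assuming Λψ rates; `jump_prob(n, k, psi)` is the probability of a $k$-merger from $n$ lines, i.e. $\beta_{n,n-k+1}$ in the notation above.

```python
from math import comb


def coalescence_sequences(i, j):
    """All ordered merger-size vectors c taking i lines down to j lines."""
    if i == j:
        return [[]]
    seqs = []
    for k in range(2, i - j + 2):  # a k-merger leaves i - k + 1 >= j lines
        for rest in coalescence_sequences(i - k + 1, j):
            seqs.append([k] + rest)
    return seqs


def jump_prob(n, k, psi):
    """P[k-merger | n lines] under assumed Lambda_psi rates C(n, m) psi^m (1 - psi)^(n - m)."""
    rates = [comb(n, m) * psi**m * (1 - psi) ** (n - m) for m in range(2, n + 1)]
    return rates[k - 2] / sum(rates)


def p_of_c(i, c, psi):
    """Probability of the coalescence sequence c starting from i lines."""
    p, n = 1.0, i
    for k in c:
        p *= jump_prob(n, k, psi)
        n -= k - 1
    return p


seqs = coalescence_sequences(10, 1)
print(len(seqs), [3, 2, 5, 3] in seqs, p_of_c(10, [3, 2, 5, 3], 0.4))
```

Summing `p_of_c` over all sequences that end in one line gives 1, since every path of the embedded chain is absorbed there.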
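The rate matrix $Q_A$ of $(A_t;\ t \ge 0)$ has entries
$$q_{j,i} = \binom{j}{j - i + 1}\int_0^1 x^{j-i-1}(1 - x)^{i-1}\,\Lambda(dx), \quad 1 \le i < j \le n,$$
$$q_{j,j} = -\sum_{i=1}^{j-1} q_{j,i}, \quad 2 \le j \le n,$$
and $q_{j,i} = 0$ otherwise.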
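For Λ = Beta(2 − a, a) the integral has a closed form: substituting the Beta density gives $q_{j,i} = \binom{j}{j-i+1} B(j - i + 1 - a,\ i - 1 + a)/B(2 - a,\ a)$. The sketch below (hypothetical names; states indexed 1 to $n$, with row and column 0 unused) fills in $Q_A$ and sets the diagonal so that every row sums to zero.

```python
from math import comb, exp, lgamma


def log_beta(x, y):
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def rate_matrix_beta(n, a):
    """Q_A for Lambda = Beta(2 - a, a), 1 <= a < 2, as a list of lists (index 0 unused).

    q[j][i] = C(j, j - i + 1) * B(j - i + 1 - a, i - 1 + a) / B(2 - a, a), 1 <= i < j <= n
    q[j][j] = -sum_{i < j} q[j][i]; all other entries are zero.
    """
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(2, n + 1):
        for i in range(1, j):
            log_ratio = log_beta(j - i + 1 - a, i - 1 + a) - log_beta(2 - a, a)
            q[j][i] = comb(j, j - i + 1) * exp(log_ratio)
        q[j][j] = -sum(q[j][1:j])
    return q


Q = rate_matrix_beta(5, 1.5)
for row in Q[1:]:
    print([round(x, 4) for x in row[1:]])
```

The matrix is lower triangular because lines can only be lost, never gained.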
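Using eigenvectors and eigenvalues of $Q_A$
Since $Q_A$ is lower triangular, its eigenvalues are the diagonal entries $q_{k,k} = -\alpha(k)$. The left eigenvector $l^{(k)} = (l^{(k)}_1, \ldots, l^{(k)}_n)$ and right eigenvector $r^{(k)} = (r^{(k)}_1, \ldots, r^{(k)}_n)$ are obtained by the recursions
$$l^{(k)}_j = \frac{q_{j+1,j}\,l^{(k)}_{j+1} + \cdots + q_{k,j}\,l^{(k)}_k}{q_{k,k} - q_{j,j}}, \quad 1 \le j < k,$$
$$r^{(k)}_j = \frac{q_{j,k}\,r^{(k)}_k + \cdots + q_{j,j-1}\,r^{(k)}_{j-1}}{q_{k,k} - q_{j,j}}, \quad 1 < k < j \le n,$$
starting from $l^{(k)}_k = r^{(k)}_k = 1$, with $l^{(k)}_j = 0$ for $j > k$ and $r^{(k)}_j = 0$ for $j < k$.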
The spectral decomposition of $Q_A$ yields the transition probabilities $G_{i,j}(t) \equiv P[A_t = j \mid A_0 = i]$ as
$$G_{i,j}(t) = \sum_{k=j}^{i} e^{-\alpha(k)t}\,r^{(k)}_i\,l^{(k)}_j.$$
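Because $Q_A$ is lower triangular, both eigenvector recursions and the spectral sum are short loops. The sketch below assumes the Λψ rates $q_{j,i} = \binom{j}{j-i+1}\psi^{j-i+1}(1-\psi)^{i-1}$ and distinct diagonal entries; the function names are hypothetical.

```python
from math import comb, exp


def rate_matrix_psi(n, psi):
    """Q_A under assumed Lambda_psi rates (index 0 unused)."""
    q = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(2, n + 1):
        for i in range(1, j):
            q[j][i] = comb(j, j - i + 1) * psi ** (j - i + 1) * (1 - psi) ** (i - 1)
        q[j][j] = -sum(q[j][1:j])
    return q


def eigenvectors(q, n):
    """left[k], right[k]: eigenvectors for eigenvalue q[k][k] (distinct diagonal assumed)."""
    left = [[0.0] * (n + 1) for _ in range(n + 1)]
    right = [[0.0] * (n + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        left[k][k] = right[k][k] = 1.0
        for j in range(k - 1, 0, -1):  # l_j uses l_{j+1}, ..., l_k
            num = sum(q[m][j] * left[k][m] for m in range(j + 1, k + 1))
            left[k][j] = num / (q[k][k] - q[j][j])
        for j in range(k + 1, n + 1):  # r_j uses r_k, ..., r_{j-1}
            num = sum(q[j][m] * right[k][m] for m in range(k, j))
            right[k][j] = num / (q[k][k] - q[j][j])
    return left, right


def transition_probs(q, n, t):
    """G[i][j] = sum_{k=j}^{i} exp(-alpha(k) t) r^(k)_i l^(k)_j, with alpha(k) = -q[k][k]."""
    left, right = eigenvectors(q, n)
    g = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            g[i][j] = sum(exp(q[k][k] * t) * right[k][i] * left[k][j] for k in range(j, i + 1))
    return g


G = transition_probs(rate_matrix_psi(6, 0.4), 6, 0.7)
print([round(x, 6) for x in G[6][1:]])
```

The normalisation $l^{(k)}_k = r^{(k)}_k = 1$ makes $l^{(k)} \cdot r^{(k)} = 1$, which is what the spectral sum requires.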
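Transition probabilities $G_{i,j}$ for $i = 3$, where $T_i$ is the time spent with $i$ lines:
$$G_{3,2}(t) = \frac{q_{3,2}}{q_{3,2} + q_{3,1}}\,P[T_3 \le t,\ T_3 + T_2 > t],$$
$$G_{3,1}(t) = \frac{q_{3,2}}{q_{3,2} + q_{3,1}}\,P[T_3 + T_2 \le t] + \frac{q_{3,1}}{q_{3,2} + q_{3,1}}\,P[T_3 \le t],$$
$$G_{3,3}(t) = P[T_3 > t],$$
and $G_{3,1}(t) + G_{3,2}(t) + G_{3,3}(t) = 1$.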
An example with Λψ
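Process with infinitesimal parameters
$$q_{ij} = \binom{i}{i - j + 1}\psi^{i-j+1}(1 - \psi)^{j-1}.$$
For $i = 3$ we obtain, with $\alpha(i) \equiv \sum_{k=1}^{i-1} q_{ik}$ (so $\alpha(2) = \psi^2$ and $\alpha(3) = 3\psi^2 - 2\psi^3$),
$$G_{3,2}(t) = \frac{3}{2}\left(e^{-\alpha(2)t} - e^{-\alpha(3)t}\right),$$
$$G_{3,1}(t) = 1 - \frac{3}{2}e^{-\alpha(2)t} + \frac{1}{2}e^{-\alpha(3)t},$$
$$G_{3,3}(t) = e^{-\alpha(3)t}.$$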
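The factor 3/2 is $q_{3,2}/(\alpha(3) - \alpha(2)) = 3\psi^2(1-\psi)/\left(2\psi^2(1-\psi)\right)$, which does not depend on ψ. The sketch below (hypothetical names) checks the closed forms against conditioning on the first jump, using $P[T_3 \le t < T_3 + T_2] = \frac{\alpha(3)}{\alpha(2) - \alpha(3)}\left(e^{-\alpha(3)t} - e^{-\alpha(2)t}\right)$ for independent exponential holding times.

```python
from math import exp


def rates(psi):
    """q_{2,1}, q_{3,2}, q_{3,1} for q_{ij} = C(i, i-j+1) psi^(i-j+1) (1-psi)^(j-1)."""
    return psi**2, 3 * psi**2 * (1 - psi), psi**3


def g3_closed_form(t, psi):
    q21, q32, q31 = rates(psi)
    a2, a3 = q21, q32 + q31  # alpha(2), alpha(3)
    g31 = 1 - 1.5 * exp(-a2 * t) + 0.5 * exp(-a3 * t)
    g32 = 1.5 * (exp(-a2 * t) - exp(-a3 * t))
    g33 = exp(-a3 * t)
    return g31, g32, g33


def g32_by_conditioning(t, psi):
    """(q32 / alpha(3)) * P[T3 <= t < T3 + T2], T_i ~ Exp(alpha(i)) independent."""
    q21, q32, q31 = rates(psi)
    a2, a3 = q21, q32 + q31
    window = a3 / (a2 - a3) * (exp(-a3 * t) - exp(-a2 * t))
    return q32 / a3 * window


for psi in (0.1, 0.5, 0.9):
    print(psi, g3_closed_form(1.0, psi), g32_by_conditioning(1.0, psi))
```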
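In general,
$$G_{i,j}(t) = \sum_{c \in C_{i,j}} g_c(t), \quad 1 \le j < i,$$
in which $c$ is a coalescence sequence, that is, a particular order of mergers in going from $i$ to $j$ lines. The number of possible sequences is $|C_{i,j}| = 2^{i-j-1}$. Writing $i_1 = i > i_2 > \cdots > i_l > j$ for the numbers of lines visited by $c$, $p(c)$ for its probability under the embedded chain, and $T(c)$ for the total time spent in $i_1, \ldots, i_l$,
$$g_c(t) = \begin{cases} p(c)\,P[T(c) \le t,\ T(c) + T_j > t] & \text{if } j > 1, \\ p(c)\,P[T(c) \le t] & \text{if } j = 1, \\ P[T_i > t] & \text{if } j = i, \end{cases}$$
in which
$$P[T(c) \le t,\ T(c) + T_j > t] = e^{-\alpha(j)t}\sum_{k=1}^{l}\frac{\gamma_k}{\beta(i_k, j)}\left(1 - e^{-\beta(i_k, j)t}\right),$$
with $\beta(i_k, j) \equiv \alpha(i_k) - \alpha(j)$, and
$$P[T(c) \le t] = \sum_{k=1}^{l}\gamma'_k\left(1 - e^{-\alpha(i_k)t}\right).$$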
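The sum over coalescence sequences can be implemented directly. In this sketch $T(c)$ is a sum of independent exponentials with rates $\alpha(i_1), \ldots, \alpha(i_l)$; its distribution function gives $\gamma'_k = \prod_{m \ne k}\alpha(i_m)/(\alpha(i_m) - \alpha(i_k))$ and its density gives $\gamma_k = \alpha(i_k)\gamma'_k$. These coefficient formulas and the Λψ rates are assumptions of the sketch, and the names are hypothetical.

```python
from math import comb, exp, prod


def q(i, j, psi):
    """Assumed Lambda_psi rate of jumping from i to j lines (1 <= j < i)."""
    return comb(i, i - j + 1) * psi ** (i - j + 1) * (1 - psi) ** (j - 1)


def alpha(i, psi):
    """alpha(i) = sum_{k=1}^{i-1} q_{ik}, the total rate of leaving i lines."""
    return sum(q(i, k, psi) for k in range(1, i))


def sequences(i, j):
    """Visited states [i_1 = i, ..., i_l] for every path from i lines down to j lines."""
    if i == j:
        return [[]]
    return [[i] + rest for nxt in range(j, i) for rest in sequences(nxt, j)]


def hypoexp_cdf_coeffs(rates):
    """gamma'_k such that P[T <= t] = sum_k gamma'_k (1 - exp(-rates[k] t)); distinct rates."""
    return [prod(am / (am - ak) for m, am in enumerate(rates) if m != k)
            for k, ak in enumerate(rates)]


def g_c(states, j, t, psi):
    """Contribution g_c(t) of one coalescence sequence, for j < i."""
    p = prod(q(s, nxt, psi) / alpha(s, psi) for s, nxt in zip(states, states[1:] + [j]))
    rates = [alpha(s, psi) for s in states]
    gamma_prime = hypoexp_cdf_coeffs(rates)
    if j == 1:  # one line is absorbing: P[T(c) <= t]
        return p * sum(c * (1 - exp(-a * t)) for c, a in zip(gamma_prime, rates))
    aj = alpha(j, psi)
    # gamma_k = a_k * gamma'_k and beta(i_k, j) = a_k - alpha(j)
    return p * exp(-aj * t) * sum(
        a * c / (a - aj) * (1 - exp(-(a - aj) * t)) for c, a in zip(gamma_prime, rates))


def G(i, j, t, psi):
    if j == i:
        return exp(-alpha(i, psi) * t)  # P[T_i > t]
    return sum(g_c(c, j, t, psi) for c in sequences(i, j))


print([round(G(5, j, 1.0, 0.3), 6) for j in range(1, 6)])
```

For $i = 3$ this reproduces the closed forms of the Λψ example, and the values $G_{i,1}, \ldots, G_{i,i}$ sum to one.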
Example: two species
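The probability $P[MC]$ of monophyletic concordance for two lines from each of two species, with $\alpha_X(i) = \sum_{1 \le k \le i-1} q_{ik}$ for species X, is
$$P[MC] = \left(1 - e^{-\alpha_A(2)t}\right)\left(1 - e^{-\alpha_B(2)t}\right) + e^{-\alpha_A(2)t}\left(1 - e^{-\alpha_B(2)t}\right)\frac{\beta_{3,2}}{3} + \left(1 - e^{-\alpha_A(2)t}\right)e^{-\alpha_B(2)t}\,\frac{\beta_{3,2}}{3} + e^{-\alpha_A(2)t}e^{-\alpha_B(2)t}\,\frac{\beta_{4,2}\,\beta_{3,2}}{9}.$$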
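The two-species formula is easy to evaluate once $\alpha_X(2)$ and the β's are fixed. The sketch below assumes Λψ rates in both species and in the ancestral population, possibly with different values of ψ, so that $\alpha_X(2) = \psi_X^2$; the names are hypothetical. At $t = 0$ only the last term remains, and as ψ → 0 the β's tend to one, recovering the Kingman value 1/9.

```python
from math import comb, exp


def lambda_psi_rate(n, k, psi):
    # Assumed rate of a k-merger among n lines: C(n, k) psi^k (1 - psi)^(n - k).
    return comb(n, k) * psi**k * (1 - psi) ** (n - k)


def beta(n, k, psi):
    # Probability that the next merger among n lines involves k lines.
    return lambda_psi_rate(n, k, psi) / sum(lambda_psi_rate(n, m, psi) for m in range(2, n + 1))


def p_mc_two_by_two(t, psi_a, psi_b, psi_anc):
    """P[MC] for two lines from each of species A and B, with time t in each species."""
    ea = exp(-lambda_psi_rate(2, 2, psi_a) * t)  # P[A still has 2 lines]; alpha_A(2) = psi_a^2
    eb = exp(-lambda_psi_rate(2, 2, psi_b) * t)  # same for species B
    b32, b42 = beta(3, 2, psi_anc), beta(4, 2, psi_anc)
    return ((1 - ea) * (1 - eb)
            + ea * (1 - eb) * b32 / 3
            + (1 - ea) * eb * b32 / 3
            + ea * eb * b42 * b32 / 9)


for t in (0.0, 1.0, 5.0, 20.0):
    print(t, round(p_mc_two_by_two(t, 0.5, 0.5, 0.5), 6))
```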
Two species and two lines each
[Figure: two panels, against ψ (0 to 1) and against a (1 to 2); legend: Λψ; K + Λψ; Beta(2 − a, a).]
Two species and two lines each
[Figure: curves against time t; legend: Λ0.05; K + Λ0.05; Beta(0.95, 1.05); K.]
Two species and two lines each
[Figure: curves against time t; legend: Λ0.99; K + Λ0.99; Beta(0.05, 1.95); K.]
Two species and three lines each
[Figure: two panels; legend: Λψ; K + Λψ; Beta(2 − a, a).]
Two species and three lines each
[Figure: curves against time t; legend: Λ0.05; K + Λ0.05; Beta(0.95, 1.05).]
Two species and three lines each
[Figure: curves against time t; legend: Λ0.95; K + Λ0.95; Beta(0.05, 1.95).]
Recursive approach for s species
Let $\tilde n = n_1 + \cdots + n_s$, in which $n_i$ denotes the number of ancestral lines for species $i$ in a population, and let $\mathbf{n} = (n_1, \ldots, n_s)$. Then
$$P[MC; \mathbf{n}] = \sum_{k=2}^{\tilde n}\beta_{\tilde n,k}\,\frac{\sum_{r=1}^{s} P[MC; \mathbf{m}]\binom{n_r}{k}}{\binom{\tilde n}{k}},$$
in which $\mathbf{m} = (n_1, n_2, \ldots, n_{r-1}, n_r - k + 1, n_{r+1}, \ldots, n_s)$, and
$$P[MC; (0, 0, \ldots, 0, 1)] = P[MC; (0, 0, \ldots, 0, 1, 1)] = 1.$$
Three species and two lines each (t1 = 1, t2 = 2)
[Figure: curves against ψ = a − 1; legend: Λ0.05; K + Λ0.05; Beta(0.95, 1.05).]
Three species and two lines each (+ : K)
[Figure: four panels against $t_1$ (0 to 2): ψ = 0.05, a = 1.95, $t_2 = 0.05 + t_1$; ψ = 0.95, a = 1.001, $t_2 = 0.05 + t_1$; ψ = 0.95, a = 1.95, $t_2 = 1 + t_1$; ψ = 0.95, a = 1.001, $t_2 = 1 + t_1$.]
- : Λ0.05;