SLIDE 1

How best to distinguish selection on discrete loci from the infinitesimal model?

Nick Barton

SLIDE 2

Longshanks

Frank Chan, Layla Hiramitsu (Tübingen); Campbell Rolian (Calgary); Stefanie Belohlavy (IST); bioRxiv

Two replicates of ~30 mice: within-family selection for the longest tibia
Use a composite trait $\log(T M^{-0.57})$

2 Vienna Feb 2019.nb

SLIDE 3

Some 10 kb windows show strong allele frequency change: $z = 2 \arcsin\sqrt{p}$; $\Delta z^2$ in 10 kb windows
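A minimal sketch of how such a statistic could be computed, assuming SNP positions and start/end allele frequencies are available as arrays. The function names, and the choice to average the squared change in the angular-transformed frequency over the SNP in each window, are assumptions for illustration rather than the talk's actual pipeline.

```python
import numpy as np

def angular(p):
    """Angular transform z = 2 * arcsin(sqrt(p)) of an allele frequency."""
    return 2.0 * np.arcsin(np.sqrt(np.asarray(p, dtype=float)))

def windowed_dz2(pos, p_start, p_end, window=10_000):
    """Mean squared change in z for the SNP falling in each window.

    pos      : SNP positions in bp (hypothetical input)
    p_start  : allele frequencies at the start
    p_end    : allele frequencies at the end
    Returns (window start positions, mean dz^2 per non-empty window).
    Averaging within windows is an assumption of this sketch.
    """
    pos = np.asarray(pos)
    dz2 = (angular(p_end) - angular(p_start)) ** 2
    idx = pos // window                      # window index of every SNP
    windows = np.unique(idx)                 # only windows that contain SNP
    means = np.array([dz2[idx == w].mean() for w in windows])
    return windows * window, means
```

The angular transform roughly stabilises the drift variance across allele frequencies, which is why changes in $z$ rather than in $p$ are compared between windows.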

SLIDE 4

Motivation

There is a rapid, consistent response to selection.
We know the selection, the pedigree, the sequence …
Can we find the causal alleles?
A small experiment, but it represents larger populations, selected for a longer time.

SLIDE 5

Outline

  • The infinitesimal as the null model
  • Variation in SNP and haplotype frequencies
  • Estimating effects of candidate loci on fitness and trait

SLIDE 6

The infinitesimal with linkage

In this experiment, the pedigree is fixed, and so chromosomes evolve independently.
How much does infinitesimal selection affect allele frequencies?

  • Even strong selection has little effect
  • The diffusion approximation works well
  • Infinitesimal selection produces a slight excess of sweeps

SLIDE 7

SNP are carried on haplotype blocks

Simulate, conditioning on the pedigree, and the observed heritability (assuming additivity)

SNP are thrown down onto the haplotype blocks
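Conditioning on the pedigree can be illustrated by "gene dropping": founder haplotypes are labelled and passed down the fixed pedigree by Mendelian sampling. The sketch below is a deliberately reduced version under loud assumptions: a single locus, no recombination, no selection, and a hypothetical pedigree encoding (a list of parent index pairs in birth order). The simulations in the talk also involve linkage and the observed heritability, which this does not model.

```python
import numpy as np

def gene_drop(pedigree, n_founders, rng):
    """Drop labelled founder haplotypes down a fixed pedigree (one locus).

    pedigree   : list of (mother, father) individual indices, in birth order;
                 individuals 0..n_founders-1 are founders (hypothetical format)
    Returns an (n_individuals, 2) array of founder-haplotype labels.
    """
    n = n_founders + len(pedigree)
    hap = np.empty((n, 2), dtype=int)
    # Founder i carries the two distinct labels 2i and 2i+1.
    hap[:n_founders] = np.arange(2 * n_founders).reshape(n_founders, 2)
    for child, (mother, father) in enumerate(pedigree, start=n_founders):
        hap[child, 0] = hap[mother, rng.integers(2)]  # random maternal copy
        hap[child, 1] = hap[father, rng.integers(2)]  # random paternal copy
    return hap

def haplotype_counts(hap, individuals, n_founder_haps):
    """Copies k_i of each founder haplotype among the given individuals."""
    return np.bincount(hap[individuals].ravel(), minlength=n_founder_haps)

# Toy pedigree (hypothetical): two founders, two offspring, one grandchild.
rng = np.random.default_rng(1)
hap = gene_drop([(0, 1), (0, 1), (2, 3)], n_founders=2, rng=rng)
k = haplotype_counts(hap, [2, 3, 4], n_founder_haps=4)
```

Once the counts of each founder haplotype are known, SNP can be placed on founder haplotypes afterwards, so the same dropped genealogy can be reused for many SNP.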

SLIDE 8

Variance in SNP frequency is inflated

(grey/black: old/new data; colours: replicate simulations)

[Figure: one panel per chromosome, LS1 chrom 1 to LS1 chrom 10]

SLIDE 9

[Figure: one panel per chromosome, LS1 chrom 11 to LS1 chrom 20]

SLIDE 10

SLIDE 11

Variation in SNP frequency is inflated by LD in the base population

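In any window, we have $k_i$ copies of the $i$'th haplotype. Variation in SNP frequencies reflects $k_i$.

$$\langle \Delta p^2 \rangle = \frac{1}{n_S} \sum_{i=1}^{n_S} \Delta p_i^2 \qquad (1)$$

$$\mathbb{E}\,\langle \Delta p^2 \rangle = \frac{j}{n_T^2}\,\mathrm{var}[k]\left(1 - \frac{j-1}{n_0-1}\right), \quad \text{where } \frac{j}{n_0} \text{ is the initial SNP frequency} \qquad (2)$$

$$\mathrm{var}\,\langle \Delta p^2 \rangle = \frac{1}{n_S^2}\left(\sum_{i=1}^{n_S} \mathrm{var}\!\left(\Delta p_i^2\right) + \sum_{\substack{i,j=1 \\ i \neq j}}^{n_S} \mathrm{cov}\!\left(\Delta p_i^2, \Delta p_j^2\right)\right) \qquad (3)$$

$\mathrm{var}\,\langle \Delta p^2 \rangle$ depends on the moments of $k_i$ and is inflated by LD in the base population. With a large number of SNP, $\mathrm{var}\,\langle \Delta p^2 \rangle \sim \mathrm{cov}(\Delta p_i^2, \Delta p_j^2)$, which increases with $D^2$.

e.g. $n_0 = 32$, $k_i = \{10, 4, 2, 1, 1, 1, 1, 0, 0, \ldots\}$, $p_0 = 0.5$.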
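Equation (2) can be checked numerically by throwing SNP at random onto founder haplotypes. The sketch below makes several assumptions that the slide does not state explicitly: $n_T$ is taken to be the total number of haplotype copies $\sum_i k_i$, $\mathrm{var}[k]$ is the population variance over the $n_0$ founder haplotypes, each SNP sits on a uniformly random set of $j$ founder haplotypes, and the elided tail of the example $k_i$ is padded with zeros up to $n_0 = 32$. The function names are hypothetical.

```python
import numpy as np

def expected_dp2(k, j):
    """Right-hand side of Eq. (2) for a SNP initially on j of n0 haplotypes."""
    k = np.asarray(k, dtype=float)
    n0, n_t = k.size, k.sum()          # assumption: n_T = sum of k_i
    return j / n_t**2 * k.var() * (1 - (j - 1) / (n0 - 1))

def simulated_dp2(k, j, n_snp=50_000, seed=0):
    """Monte Carlo mean of dp^2 when SNP are assigned to haplotypes at random."""
    rng = np.random.default_rng(seed)
    k = np.asarray(k, dtype=float)
    n0, n_t = k.size, k.sum()
    # For every SNP, pick j distinct founder haplotypes uniformly at random.
    carriers = rng.random((n_snp, n0)).argsort(axis=1)[:, :j]
    p_final = k[carriers].sum(axis=1) / n_t
    return np.mean((p_final - j / n0) ** 2)

# Slide example, with the elided entries assumed to be zero.
k = [10, 4, 2, 1, 1, 1, 1] + [0] * 25
theory = expected_dp2(k, j=16)        # p0 = 16/32 = 0.5
simulated = simulated_dp2(k, j=16)
```

Under these assumptions the factor $1 - (j-1)/(n_0-1)$ is the usual finite-population correction for sampling $j$ haplotypes without replacement, so a SNP present on every founder haplotype ($j = n_0$) cannot change in frequency.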

SLIDE 12

Is the candidate on chrom. 5 significant?

SLIDE 13

Is the candidate on chrom. 5 significant?

SLIDE 14

Is the candidate on chrom. 5 significant?

Pairs of simulations, starting from the same founder genomes, give outlier $\Delta z^2$ that overlap the signal from LS1 (red) but not LS2 (orange).
Based on SNP frequencies, the signal is marginally significant.

SLIDE 15

Three sources of variation in SNP frequencies

  • effects of founders
  • evolution of replicates
  • random SNP on haplotypes

Variation due to LD amongst SNP can be strong:

  • coalescent simulations of a well-mixed population
  • Kelly & Hughes: D. simulans

This source of error can be eliminated by working with haplotypes

  • haplotypes can be reconstructed from SNP frequencies (Kessner et al., 2013, Franssen et al., 2016)

SLIDE 16

How strong is selection?

Alleles in the candidate region on chrom. 5 sweep from $p = 0.178$ to $0.833$ in LS1 and $0.981$ in LS2

$$\Rightarrow \quad s \sim \frac{1}{t} \log\!\left(\frac{p_{17}}{q_{17}} \frac{q_0}{p_0}\right) \sim 0.25 \quad \text{(cf. Taus et al., 2017)}$$

How large an effect on the trait? Simulate an additive allele, effect $A$; 40 replicates; $s = 0.41\, A/\sqrt{V_e}$ (left).
The mean and sd from infinitesimal simulations (dots) fit with a single-locus WF model, $N_e \sim 44$ (red).

[Figure: panels with axes $A/\sqrt{V_e}$, $s$, $A$ and $p_{17}$]

The locus on chromosome 5 has effect $A = 0.59 \sqrt{V_e}$ ($0.32 \sqrt{V_e}$ to $0.87 \sqrt{V_e}$). This single locus is responsible for ~9.4% (3.6% - 15.5%) of the response.
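A single-locus Wright-Fisher (WF) model of the kind mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the model fitted in the talk: selection is genic with relative fitness $e^{s}$ per copy (so that the log-odds shift by $s$ per generation), each generation is a binomial draw of $2N_e$ gene copies, and the function name and replicate count are hypothetical.

```python
import numpy as np

def wf_final_freq(p0, s, n_e, t, n_rep, rng):
    """Allele frequency after t generations in n_rep independent WF replicates."""
    n_copies = 2 * n_e                       # assumption: diploid, 2*Ne gene copies
    p = np.full(n_rep, p0, dtype=float)
    for _ in range(t):
        w = p * np.exp(s)                    # genic selection, fitness e^s
        p_sel = w / (w + (1 - p))            # frequency after selection
        p = rng.binomial(n_copies, p_sel) / n_copies   # drift
    return p

rng = np.random.default_rng(0)
p17 = wf_final_freq(p0=0.178, s=0.25, n_e=44, t=17, n_rep=20_000, rng=rng)
mean_p17, sd_p17 = p17.mean(), p17.std()
```

Repeating this over a grid of $s$ gives the mean and standard deviation of $p_{17}$ as a function of $s$, which is the kind of curve that simulated sweeps could be compared against.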

SLIDE 17

Summary

  • The infinitesimal should be used as the null model
  • In Longshanks, even strong infinitesimal selection has little effect (but: selection was within families; the map is long)
  • Substantial variation is generated by random assignment of SNP to haplotypes, especially with LD in the base population
  • Even an obvious signal is marginally significant in any one line
  • How many loci contribute to the selection response?
