CSci 8980: Advanced Topics in Graphical Models: Analysis of Genetic Variation



SLIDE 1

Basics HMDP Inference Results HDPM Results

CSci 8980: Advanced Topics in Graphical Models
Analysis of Genetic Variation

Instructor: Arindam Banerjee
November 26, 2007

SLIDE 10

Genetic Polymorphism

• Single nucleotide polymorphism (SNP)
    • Two possible kinds of nucleotides at a single locus
    • A nucleotide can be one of {A, C, T, G}
    • Most human genetic variation is related to SNPs
    • Each variant is called an allele
• Haplotype
    • List of alleles in a local region of a chromosome
    • Inherited as a unit if there is no recombination
• Repeated recombinations between ancestral haplotypes

SLIDE 16

Genetic Polymorphism (Contd.)

• Linkage disequilibrium (LD)
    • Non-random association of alleles at different loci
    • Recombination decouples alleles, increases randomness, and decreases LD
• Infer chromosomal recombination hotspots
• Help understand the origin and characteristics of genetic variation
• Analyze genetic variation to reconstruct evolutionary history
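To make "non-random association" concrete, here is a minimal sketch of one common LD summary. The slides do not define a specific measure, so the coefficient D = p_AB − p_A·p_B and the normalized r² used here are assumptions, as are the 0/1 allele coding and the helper name `ld_stats`.

```python
def ld_stats(haplotypes, locus_a, locus_b):
    """Return (D, r_squared) for two biallelic loci coded as 0/1.

    Assumed definitions (not from the slides):
      D   = p_AB - p_A * p_B
      r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))
    """
    n = len(haplotypes)
    p_a = sum(h[locus_a] for h in haplotypes) / n
    p_b = sum(h[locus_b] for h in haplotypes) / n
    p_ab = sum(h[locus_a] * h[locus_b] for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


# Alleles always travel together: strong LD.
linked = [[1, 1], [1, 1], [0, 0], [0, 0]]
# Every allele combination equally frequent: no LD.
shuffled = [[1, 1], [1, 0], [0, 1], [0, 0]]
```

In the `linked` sample the two loci are perfectly associated, while in `shuffled` the alleles are independent, which is the state that repeated recombination pushes a population toward.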

SLIDE 17

Haplotype Recombination and Inheritance

SLIDE 23

Hidden Markov Process

• Generative model for choosing recombination sites
• Hidden Markov process
    • Hidden states correspond to an index over chromosomes
    • Transition probabilities correspond to recombination rates
    • The emission model corresponds to the mutation process that gives descendants
• Implemented using a Hidden Markov Dirichlet Process (HMDP)

SLIDE 30

Dirichlet Process Mixtures

• We know the basics of Dirichlet process mixtures (DPMs)
• Haplotype modeling using an infinite mixture model
    • A pool of ancestor haplotypes, or founders
    • The size of the pool is unknown
• Standard coalescence-based models
    • The number of hidden variables is prohibitively large
    • Hard to perform inference of ancestral features

SLIDE 36

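Dirichlet Process Mixtures (Contd.)

• $H_i = [H_{i,1}, \ldots, H_{i,T}]$: haplotype over $T$ SNPs, chromosome $i$
• $A_k = [A_{k,1}, \ldots, A_{k,T}]$: ancestral haplotype, with mutation rate $\theta_k$
• $C_i$: inheritance variable, the latent ancestor of $H_i$
• Generative model:
    • Draw a first haplotype:
      $$a_1 \mid \mathrm{DP}(\tau, Q_0) \sim Q_0, \qquad h_1 \sim P_h(\cdot \mid a_1, \theta_1)$$
    • For subsequent haplotypes, $c_i \mid \mathrm{DP}(\tau, Q_0)$ follows
      $$p(c_i = c_j \text{ for some } j < i \mid c_1, \ldots, c_{i-1}) = \frac{n_{c_j}}{i - 1 + \alpha_0}$$
      $$p(c_i \neq c_j \text{ for all } j < i \mid c_1, \ldots, c_{i-1}) = \frac{\alpha_0}{i - 1 + \alpha_0}$$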
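The two probabilities above describe a Pólya urn scheme over ancestor labels. The following is a minimal sketch of sampling the inheritance variables c_1, ..., c_N from it. The function name `sample_inheritance`, the zero-based labels, and the use of Python's `random.Random` are illustrative choices, not part of the slides.

```python
import random


def sample_inheritance(num_haplotypes, alpha0, rng):
    """Sample ancestor labels c_1..c_N from the urn scheme.

    Haplotype i joins existing ancestor k with probability n_k / (i - 1 + alpha0)
    and founds a new ancestor with probability alpha0 / (i - 1 + alpha0).
    Returns (labels, counts), where counts[k] is n_k.
    """
    labels = []
    counts = []
    for i in range(1, num_haplotypes + 1):
        # Draw a point on [0, i - 1 + alpha0) and see which block it lands in.
        u = rng.random() * (i - 1 + alpha0)
        chosen = len(counts)  # default: open a new ancestor
        acc = 0.0
        for k, n_k in enumerate(counts):
            acc += n_k
            if u < acc:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        labels.append(chosen)
    return labels, counts


labels, counts = sample_inheritance(50, alpha0=1.0, rng=random.Random(0))
```

The number of distinct ancestors is not fixed in advance; it grows with the sample size at a rate controlled by `alpha0`, which is how the model handles a founder pool of unknown size.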

SLIDE 42

Dirichlet Process Mixtures (Contd.)

Generative Model (contd)

• Sample the founder of haplotype $i$:
  $$\phi_{c_i} \mid \mathrm{DP}(\tau, Q_0) \begin{cases} = \{a_{c_j}, \theta_{c_j}\} & \text{if } c_i = c_j \text{ for some } j < i \\ \sim Q(a, \theta) & \text{if } c_i \neq c_j \text{ for all } j < i \end{cases}$$
• Sample the haplotype according to its founder:
  $$h_i \mid c_i \sim P(\cdot \mid a_{c_i}, \theta_{c_i})$$
• Assumes each haplotype originates from one ancestor
    • Valid only for short regions of a chromosome
    • Long regions will have recombination

SLIDE 51

Hidden Markov Dirichlet Process

• Nonparametric Bayesian HMM
• Sample a DP to form the support of the infinite state space
• Conditioned on each state, sample a DP with the same support
• Hierarchical urns
    • Stock urn $Q_0$ with balls of $K$ colors, $n_k$ of color $k$
    • HMM-urns $Q_1, \ldots, Q_K$ for prior and transition probabilities
    • Let $m_{j,k}$ be the number of balls of color $k$ in urn $Q_j$
    • HDPM can be simulated by sampling from the urn hierarchy
• Hierarchical DPM:
  $$Q_0 \mid \alpha, F \sim \mathrm{DP}(\alpha, F), \qquad Q_j \mid \tau, Q_0 \sim \mathrm{DP}(\tau, Q_0)$$

SLIDE 55

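Hidden Markov Dirichlet Process (Contd.)

• Each color corresponds to an ancestor configuration $\phi_k = \{a_k, \theta_k\}$
• For $n$ random draws from $Q_0$:
  $$\phi_n \mid \phi_{-n} \sim \sum_{k=1}^{K} \frac{n_k}{n - 1 + \alpha}\,\delta_{\phi_k}(\phi_n) + \frac{\alpha}{n - 1 + \alpha}\,F(\phi_n)$$
• Conditioned on $Q_0$, the marginal configurations from $Q_j$:
  $$\phi_{m_j} \mid \phi_{-m_j} \sim \sum_{k} \frac{m_{j,k} + \tau \frac{n_k}{n - 1 + \alpha}}{m_j - 1 + \tau}\,\delta_{\phi_k}(\phi_{m_j}) + \frac{\tau}{m_j - 1 + \tau}\,\frac{\alpha}{n - 1 + \alpha}\,F(\phi_{m_j})$$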
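The predictive distribution for a draw from an HMM-urn Q_j can be read as a list of mixture weights: one per existing color, plus one for a brand-new color drawn from F. The sketch below computes those weights. It assumes the counts passed in already exclude the ball being drawn, so that their sums play the roles of m_j − 1 and n − 1. The function name `hmm_urn_weights` is hypothetical.

```python
def hmm_urn_weights(m_j, n, tau, alpha):
    """Predictive weights for the next draw from urn Q_j.

    m_j[k]: balls of color k already in Q_j (current draw excluded)
    n[k]:   balls of color k already in the stock urn Q_0
    Returns (existing, new): weights of the K existing colors and the
    weight of a new color sampled from the base measure F.
    """
    total_m = sum(m_j)  # stands in for m_j - 1
    total_n = sum(n)    # stands in for n - 1
    existing = [
        (m_jk + tau * n_k / (total_n + alpha)) / (total_m + tau)
        for m_jk, n_k in zip(m_j, n)
    ]
    new = (tau / (total_m + tau)) * (alpha / (total_n + alpha))
    return existing, new


existing, new = hmm_urn_weights(m_j=[3, 0, 1], n=[5, 2, 4], tau=2.0, alpha=1.0)
```

Color 1 has no balls in Q_j in this example, yet it still gets positive weight through the stock urn. That sharing of colors across all HMM-urns is what gives every state the same support.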

SLIDE 63

HMDP for Recombination and Inheritance

• Priors for the conditional model parameters: $F(A, \theta) = p(A)\,p(\theta)$
• $p(A)$ is assumed uniform, $p(\theta)$ is assumed beta
• $C_i = [C_{i,1}, \ldots, C_{i,T}]$: ancestral index for chromosome $i$
• With no recombination, $C_{i,t} = k$ for all $t$, for some $k$
• Non-recombination is modeled by a Poisson point process:
  $$P(C_{i,t+1} = C_{i,t} = k) = \exp(-dr) + (1 - \exp(-dr))\,\pi_{kk}$$
    • $d$ is the distance between the two loci
    • $r$ is the rate of recombination per unit distance
• The transition probability to a state $k' \neq k$ is
  $$P(C_{i,t} = k, C_{i,t+1} = k') = (1 - \exp(-dr))\,\pi_{kk'}$$
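As a rough illustration of the two cases above, the sketch below builds one row of the transition matrix for a finite set of states. The name `transition_row` and the finite list `pi_row` are assumptions for the example; the actual model has an unbounded state space.

```python
import math


def transition_row(k, pi_row, d, r):
    """Return [P(C_{t+1} = k' | C_t = k) for every k'].

    pi_row[k'] is pi_{k,k'}; d is the distance between the two loci and
    r the recombination rate per unit distance.
    """
    stay = math.exp(-d * r)  # probability of no recombination event
    # With probability 1 - stay a recombination occurs and the next
    # state is drawn from pi_row (which may pick k again).
    row = [(1.0 - stay) * p for p in pi_row]
    row[k] += stay
    return row


row = transition_row(0, [0.2, 0.5, 0.3], d=1.0, r=0.1)
```

When `d` is zero the chain cannot leave its state, and when `r` is very large the row collapses to `pi_row`, so the distance-dependent behaviour disappears.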

SLIDE 67

HMDP for Recombination and Inheritance (Contd.)

• $H_i$ is a mosaic of multiple ancestral chromosomes
• The model is a time-inhomogeneous infinite HMM
• As $r \to \infty$, we get a stationary HMM
• Single-locus mutation model for emission:
  $$p(h_t \mid a_t, \theta) = \theta^{I(h_t = a_t)} \left(\frac{1 - \theta}{|B| - 1}\right)^{I(h_t \neq a_t)}$$
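To show how a haplotype ends up as a mosaic, here is a small generative sketch that combines the recombination transitions with the single-locus mutation model. It is a simplification under stated assumptions: the founder pool is finite and given, the initial founder is chosen uniformly, and the names `sample_mosaic`, `founders`, `thetas`, and `pi` are illustrative only.

```python
import math
import random


def sample_mosaic(founders, thetas, pi, distances, r, alphabet, rng):
    """Sample one haplotype as a mosaic of a finite founder pool.

    founders[k][t]: allele of founder k at locus t
    thetas[k]:      probability that founder k's allele is copied unchanged
    pi[k]:          weights over next founders when a recombination occurs
    distances[t]:   distance between locus t and locus t + 1
    Returns (path, haplotype), where path[t] is the founder used at locus t.
    """
    num_loci = len(founders[0])
    k = rng.randrange(len(founders))  # assumed uniform initial founder
    path, haplotype = [], []
    for t in range(num_loci):
        if t > 0 and rng.random() >= math.exp(-distances[t - 1] * r):
            # Recombination: redraw the founder from pi[k] (may stay on k).
            k = rng.choices(range(len(founders)), weights=pi[k])[0]
        path.append(k)
        a = founders[k][t]
        if rng.random() < thetas[k]:
            haplotype.append(a)  # inherited without mutation
        else:
            # Mutation: uniform over the other |B| - 1 alleles.
            haplotype.append(rng.choice([b for b in alphabet if b != a]))
    return path, haplotype


founders = ["ACGT", "TGCA"]
uniform_pi = [[0.5, 0.5], [0.5, 0.5]]
path, hap = sample_mosaic(founders, [1.0, 1.0], uniform_pi,
                          [1.0, 1.0, 1.0], 0.0, "ACGT", random.Random(1))
```

With `r = 0` and no mutation the sampled haplotype is an exact copy of a single founder, which is the single-ancestor mixture case. A positive `r` lets the path switch founders along the chromosome.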

SLIDE 68

Haplotype Recombination and Inheritance

SLIDE 69

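HMDP for Recombination and Inheritance (Contd.)

Conditional probability of the haplotype list $h$:

$$p(h \mid c, a) = \prod_k \int_{\theta_k} \prod_{i,t \mid c_{i,t} = k} p(h_{i,t} \mid a_{k,t}, \theta_k)\, \mathrm{Beta}(\theta_k \mid \alpha_h, \beta_h)\, d\theta_k$$

$$= \prod_k \frac{\Gamma(\alpha_h + \beta_h)}{\Gamma(\alpha_h)\Gamma(\beta_h)}\, \frac{\Gamma(\alpha_h + \ell_k)\,\Gamma(\beta_h + \ell'_k)}{\Gamma(\alpha_h + \beta_h + \ell_k + \ell'_k)} \left(\frac{1}{|B| - 1}\right)^{\ell'_k}$$

where

$$\ell_k = \sum_{i,t} I(h_{i,t} = a_{k,t})\, I(c_{i,t} = k), \qquad \ell'_k = \sum_{i,t} I(h_{i,t} \neq a_{k,t})\, I(c_{i,t} = k)$$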
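The closed form above is straightforward to evaluate in log space. The sketch below counts matches and mismatches per founder and sums the log Beta-function ratios. The function name `log_marginal_likelihood` and the nested-list data layout are assumptions made for the example.

```python
from math import lgamma, log


def log_marginal_likelihood(h, c, a, alpha_h, beta_h, num_alleles):
    """log p(h | c, a) with each theta_k integrated out under Beta(alpha_h, beta_h).

    h[i][t]: observed allele of haplotype i at locus t
    c[i][t]: index of the founder that haplotype i copies at locus t
    a[k][t]: allele of founder k at locus t
    num_alleles: |B|, the size of the allele alphabet
    """
    match = [0] * len(a)     # l_k: copies that agree with founder k
    mismatch = [0] * len(a)  # l'_k: copies that differ from founder k
    for h_i, c_i in zip(h, c):
        for t, (allele, k) in enumerate(zip(h_i, c_i)):
            if allele == a[k][t]:
                match[k] += 1
            else:
                mismatch[k] += 1
    total = 0.0
    for l, lp in zip(match, mismatch):
        total += (lgamma(alpha_h + beta_h) - lgamma(alpha_h) - lgamma(beta_h)
                  + lgamma(alpha_h + l) + lgamma(beta_h + lp)
                  - lgamma(alpha_h + beta_h + l + lp)
                  - lp * log(num_alleles - 1))
    return total


one_match = log_marginal_likelihood(["A"], [[0]], ["A"], 1.0, 1.0, 4)
one_mismatch = log_marginal_likelihood(["C"], [[0]], ["A"], 1.0, 1.0, 4)
```

With a uniform Beta(1, 1) prior, a single matching observation has marginal probability 1/2 and a single mismatch has probability 1/6 for a four-letter alphabet. Founders with no assigned observations contribute nothing to the sum.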

SLIDE 75

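Inference

• The Gibbs sampler proceeds in two steps
    • Sample inheritance $\{C_{i,t}\}$ given $h$ and $a$
    • Sample ancestors $a = \{a_1, \ldots, a_K\}$ given $h$ and $C$
• Improve mixing for sampling inheritance
    • By Bayes' rule:
      $$p(c_{t+1:t+\delta} \mid c^-, h, a) \propto \prod_{j=t}^{t+\delta} p(c_{j+1} \mid c_j, m, n) \prod_{j=t+1}^{t+\delta} p(h_j \mid a_{c_j,j}, \ell_{c_j})$$
    • Assume the probability of having two recombinations is small:
      $$p(c_{t+1:t+\delta} \mid c^-, h, a) \propto p(c_{t'} \mid c_{t'-1}, m, n)\, p(c_{t+\delta+1} \mid c_{t+\delta} = c_{t'}, m, n) \prod_{j = \ldots} \cdots$$
      (the final product is cut off in the source)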
SLIDE 78

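Inference (Contd.)

• Assuming $d$ and $r$ are small, $\lambda = 1 - \exp(-dr) \approx dr$ and
  $$p(c_{t'} = k' \mid c_{t'-1} = k, m, n, r, d) = \begin{cases} \lambda \pi_{k,k'} + (1 - \lambda)\,\delta(k, k') & \text{for } k' \in \{1, \ldots, K\} \\ \lambda \pi_{k,K+1} & \text{for } k' = K + 1 \end{cases}$$
• These terms can be substituted into the original equation to obtain the sampler
• Posterior distribution for ancestors:
  $$p(a_{k,t} \mid c, h) \propto \frac{\Gamma(\alpha_h + \beta_h)}{\Gamma(\alpha_h)\Gamma(\beta_h)}\, \frac{\Gamma(\alpha_h + \ell_{k,t})\,\Gamma(\beta_h + \ell'_{k,t})}{\Gamma(\alpha_h + \beta_h + \ell_{k,t} + \ell'_{k,t})} \left(\frac{1}{|B| - 1}\right)^{\ell'_{k,t}}$$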
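The ancestor update can be sketched by evaluating the unnormalized posterior for every candidate allele and normalizing. This follows the per-locus counts as written on the slide. The function name `ancestor_allele_posterior` and its inputs are hypothetical, and a full sampler would also need the inheritance step, which is not shown.

```python
from math import exp, lgamma, log


def ancestor_allele_posterior(alleles_at_t, alpha_h, beta_h, alphabet):
    """Normalized p(a_{k,t} = b | c, h) for each candidate allele b.

    alleles_at_t: observed alleles h_{i,t} of the haplotypes that are
    assigned to founder k at locus t (c_{i,t} = k).
    """
    log_w = {}
    for b in alphabet:
        l = sum(1 for x in alleles_at_t if x == b)  # l_{k,t}: matches
        lp = len(alleles_at_t) - l                  # l'_{k,t}: mismatches
        log_w[b] = (lgamma(alpha_h + beta_h) - lgamma(alpha_h) - lgamma(beta_h)
                    + lgamma(alpha_h + l) + lgamma(beta_h + lp)
                    - lgamma(alpha_h + beta_h + l + lp)
                    - lp * log(len(alphabet) - 1))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_w.values())
    norm = sum(exp(v - top) for v in log_w.values())
    return {b: exp(v - top) / norm for b, v in log_w.items()}


posterior = ancestor_allele_posterior("AAAC", 1.0, 1.0, "ACGT")
```

A Gibbs step would then draw the new allele for founder k at locus t from `posterior`, for example with `random.choices` over its keys and values.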

SLIDE 79

Single Population Data

Haplotype block boundaries: HMDP (black solid), HMM (red dotted), MDL (blue dashed)

SLIDE 80

Two Population Data

SLIDE 81

Hierarchical DPM for Haplotype Inference

SLIDE 82

Hierarchical DPM for Haplotype Inference (Contd.)

SLIDE 88

Experiments: HapMap Data

• SNP genotypes from four populations
    • CEPH, Utah residents with northern/western European ancestry, 60
    • YRI, Yoruba in Ibadan, Nigeria, 60
    • CHB, Han Chinese in Beijing, 45
    • JPT, Japanese in Tokyo, 44
• Experiments on short ($\sim 10$) and long ($\sim 10^2$ to $10^3$) SNP sequences

SLIDE 89

Short SNP Sequences

SLIDE 90

Long SNP Sequences

SLIDE 91

Mutation Rates and Diversity

SLIDE 92

Mutation Rates and Diversity (Contd.)