SLIDE 1

Stochastic Simulation Markov Chain Monte Carlo

Bo Friis Nielsen

Institute of Mathematical Modelling, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark. Email: bfni@dtu.dk

SLIDE 2

The queueing example

We simulated the system until “stochastic steady state”. We were then able to describe this steady state:

  • What is the distribution of occupied servers?
  • What is the rejection probability?

The model was a “state machine”, i.e. a Markov Chain. To obtain steady-state statistics, we used stochastic simulation, i.e. Monte Carlo.

SLIDE 3

Discrete time Markov chains

  • We observe a sequence of $X_n$'s taking values in some sample space.
  • The next value in the sequence, $X_{n+1}$, is determined from some decision rule depending on the value of $X_n$ only.
  • For a discrete sample space we can express the decision rule as a matrix of transition probabilities $P = \{p_{ij}\}$, $p_{ij} = P(X_{n+1} = j \mid X_n = i)$.
  • Under some technical assumptions we can find a stationary and limiting distribution $\pi$, with $\pi_j = P(X_\infty = j)$.
  • This distribution can be found analytically by solving $\pi = \pi P$ (equilibrium distribution).
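As a complement to the slides, here is a minimal Python sketch (not part of the original lecture) of both routes to $\pi$ for a small chain: solving $\pi = \pi P$ directly and estimating it by simulating the chain. The 3-state transition matrix is an arbitrary illustrative choice.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1); values are made up.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# Analytical route: solve pi = pi P together with sum(pi) = 1,
# i.e. (P^T - I) pi^T = 0 with the normalisation appended as an extra row.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Simulation route: run the chain and look at the empirical distribution.
rng = np.random.default_rng(1)
x, counts = 0, np.zeros(3)
for _ in range(100_000):
    x = rng.choice(3, p=P[x])
    counts[x] += 1

print("analytical pi:", np.round(pi, 4))
print("empirical  pi:", np.round(counts / counts.sum(), 4))
```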

SLIDE 4

Markov chains continued

  • The theory can be extended to:

⋄ Continuous sample space, or
⋄ Continuous time: exercise 4 is an example of a continuous-time Markov chain

SLIDE 5

The probability of Xn

  • The behaviour of the process itself: $X_n$
  • The behaviour conditional on $X_0 = i$ is given by $(p_{ij}(n))$
  • Define $P(X_n = j) = \mu_j^{(n)}$, with $P(X_0 = j) = \mu_j^{(0)}$
  • With $\mu^{(n)} = \{\mu_j^{(n)}\}$ we find
  • $\mu^{(n)} = \mu^{(n-1)} P = \mu^{(0)} P^n$

SLIDE 6

Small example

$$P = \begin{pmatrix} 1-p & p & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & q & 1-q \end{pmatrix} \qquad \text{with} \qquad \mu^{(0)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right)$$

  • We get

$$\mu^{(1)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right) \begin{pmatrix} 1-p & p & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & q & 1-q \end{pmatrix} = \left(\tfrac{1-p}{3},\, \tfrac{p}{3},\, \tfrac{2q}{3},\, \tfrac{2(1-q)}{3}\right)$$

SLIDE 7

and

  • $\mu^{(0)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right)$, and

$$P^2 = \begin{pmatrix} (1-p)^2 + pq & (1-p)p & p^2 & 0 \\ q(1-p) & 2qp & 0 & p^2 \\ q^2 & 0 & 2qp & p(1-q) \\ 0 & q^2 & (1-q)q & (1-q)^2 + qp \end{pmatrix}$$

SLIDE 8

  • $\mu^{(2)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right) \cdot P^2 = \left(\tfrac{(1-p)^2 + pq}{3},\, \tfrac{(1-p)p + 2q^2}{3},\, \tfrac{p^2 + 2(1-q)q}{3},\, \tfrac{2qp + 2(1-q)^2}{3}\right)$
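A quick numerical check of this example (not from the slides): the sketch below evaluates $\mu^{(1)} = \mu^{(0)}P$ and $\mu^{(2)} = \mu^{(0)}P^2$ for the illustrative values $p = 0.3$, $q = 0.7$, chosen so that every row of $P$ sums to one.

```python
import numpy as np

# Illustrative values only; q = 1 - p keeps every row of P stochastic.
p, q = 0.3, 0.7
P = np.array([[1 - p, p,     0,     0    ],
              [q,     0,     p,     0    ],
              [0,     q,     0,     p    ],
              [0,     0,     q,     1 - q]])
mu0 = np.array([1/3, 0, 0, 2/3])

mu1 = mu0 @ P                                # equals ((1-p)/3, p/3, 2q/3, 2(1-q)/3)
mu2 = mu0 @ np.linalg.matrix_power(P, 2)     # equals mu(0) P^2 from the slide

print("mu(1):", np.round(mu1, 4))
print("mu(2):", np.round(mu2, 4))
```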

SLIDE 9

MCMC: What we aim to achieve

We have a variable X with a “complicated” distribution. We cannot sample X directly. We aim to generate a sequence of Xi’s

  • each of which has the same distribution as X
  • but we allow them to be interdependent.

This is an inverse problem relative to the queueing exercise: We start with the distribution of X, and aim to design a state machine which has this steady-state distribution.

SLIDE 10

MCMC example from Bayesian statistics

Prior distribution of the parameter: $P \sim U(0, 1)$, i.e. $f_P(p) = 1$ for $0 \le p \le 1$.

Distribution of the data, conditional on the parameter: $X$ given $P = p$ is Binomial$(n, p)$, i.e. the data has the conditional probabilities

$$P(X = i \mid P = p) = \binom{n}{i} p^i (1-p)^{n-i}$$

SLIDE 11

The posterior distribution of P

Conditional density of the parameter, given observed data $X = i$:

$$f_{P|X=i}(p) = \frac{f_P(p)\, P(X = i \mid P = p)}{P(X = i)}$$

We need the unconditional probability of the observation:

$$P(X = i) = \int_0^1 f_P(p) \binom{n}{i} p^i (1-p)^{n-i}\, dp$$

We can evaluate this here; in more complex models we could not.

AIM: To sample from $f_{P|X=i}$ without evaluating $P(X = i)$.
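A minimal sketch of how such a sampler could look (an illustration, not code from the course): random-walk Metropolis on the unnormalised posterior $p^i(1-p)^{n-i}$, with assumed values $n = 10$, $i = 3$ and a Normal(0, 0.1²) proposal step. The binomial coefficient and $P(X = i)$ cancel in the acceptance ratio and never need to be computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, i = 10, 3                       # assumed observation: i successes in n trials

def g(p):
    """Unnormalised posterior: uniform prior times binomial likelihood."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p**i * (1.0 - p)**(n - i)

p_cur = 0.5
Pvec = []
for _ in range(10_000):
    p_prop = p_cur + rng.normal(scale=0.1)    # symmetric random-walk proposal
    if rng.uniform() < g(p_prop) / g(p_cur):  # accept w.p. min(1, g(y)/g(x))
        p_cur = p_prop
    Pvec.append(p_cur)

Pvec = np.array(Pvec[1_000:])                 # discard a warm-up period
# The exact posterior is Beta(i+1, n-i+1), so the mean should be near (i+1)/(n+2).
print("posterior mean estimate:", Pvec.mean())
```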

SLIDE 12

The posterior distribution

[Figure: MCMC output for the posterior of P: a time series of P against iteration number, and a histogram of the sampled values (Pvec) showing their frequency.]

SLIDE 13

When to apply MCMC?

The distribution is given by $f(x) = c \cdot g(x)$, where the unnormalised density $g$ can be evaluated, but the normalising constant $c$ cannot be evaluated (easily):

$$c = \frac{1}{\int_X g(x)\, dx}$$

This is frequently the case in Bayesian statistics: the posterior density is proportional to the likelihood function.

Note (again) the similarity between simulation and evaluation of integrals.

SLIDE 14

Metropolis-Hastings algorithm

  • Proposal distribution $h(x, y)$
  • Acceptance of the proposal: the proposed value is accepted with probability

$$\min\!\left(1, \frac{f(y)\,h(y, x)}{f(x)\,h(x, y)}\right) = \min\!\left(1, \frac{g(y)\,h(y, x)}{g(x)\,h(x, y)}\right) = \min\!\left(1, \frac{g(y)}{g(x)}\right) \quad \text{for } h(y, x) = h(x, y)$$

  • Avoiding the troublesome normalising constant $c$!
  • Frequently we apply a symmetric proposal distribution, $h(y, x) = h(x, y)$: the Metropolis algorithm
  • It can be shown that this Markov chain will have $f(x)$ as its stationary distribution.
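To make the formula concrete, here is a small Python helper (an illustration, not course code) that evaluates the Metropolis-Hastings acceptance probability from an unnormalised target g and a proposal density h; the example target and asymmetric proposal at the bottom are made up.

```python
import numpy as np

def mh_accept_prob(x, y, g, h):
    """Metropolis-Hastings acceptance probability for a proposed move x -> y.

    g(z) is the unnormalised target density and h(a, b) the proposal density of
    b given current state a; for symmetric h the ratio h(y, x)/h(x, y) cancels
    and only g(y)/g(x) remains (the Metropolis algorithm).
    """
    return min(1.0, (g(y) * h(y, x)) / (g(x) * h(x, y)))

# Tiny illustration with a made-up target and an asymmetric proposal.
g = lambda z: np.exp(-abs(z))                     # unnormalised Laplace density
h = lambda a, b: np.exp(-(b - 0.5 * a)**2 / 2.0)  # hypothetical drifting proposal
print(mh_accept_prob(0.0, 1.0, g, h))
```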

SLIDE 15

Random Walk Metropolis-Hastings

Sampling from a p.d.f. $c \cdot g(x)$ where $c$ is unknown.

  • 1. At iteration $i$, the state is $X_i$.
  • 2. Propose a jump from $X_i$ to $Y_i = X_i + \Delta X_i$, where $\Delta X_i$ is sampled independently from a symmetric distribution.
  • If $g(Y_i) \ge g(X_i)$, accept.
  • If $g(Y_i) < g(X_i)$, accept with probability $g(Y_i)/g(X_i)$.
  • 3. On accept: set $X_{i+1} = Y_i$ and go to 1.
  • 4. On reject: set $X_{i+1} = X_i$ and go to 1.

Note that knowing c is not necessary!
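A minimal sketch of these steps in Python (illustration only): the target is a standard normal known only up to its constant, $g(x) = e^{-x^2/2}$, and the jump $\Delta X_i$ is drawn from a symmetric Normal(0, 1).

```python
import numpy as np

rng = np.random.default_rng(42)

def g(x):
    # Unnormalised target: a standard normal without its 1/sqrt(2*pi) constant.
    return np.exp(-0.5 * x * x)

x = 0.0
samples = []
for _ in range(20_000):                  # 1. current state is x
    y = x + rng.normal()                 # 2. propose y = x + dx, dx symmetric
    if rng.uniform() < g(y) / g(x):      #    accept with prob min(1, g(y)/g(x))
        x = y                            # 3. on accept, move to y
    samples.append(x)                    # 4. on reject, x is repeated

samples = np.array(samples[2_000:])      # discard a warm-up period
print("mean:", samples.mean(), " std:", samples.std())   # should be near 0 and 1
```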

SLIDE 16

Proposal distribution (Gelman 1998)

  • A good proposal distribution has the following properties:

⋄ For any x, it is easy to sample from h(x, y)
⋄ It is easy to compute the acceptance probability
⋄ Each jump goes a reasonable distance in the parameter space
⋄ The proposals are not rejected too frequently
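The last two points pull in opposite directions. The sketch below (an illustration, not from the slides) measures the empirical acceptance rate of the random-walk sampler from the previous slide for three proposal step sizes: very small steps accept almost everything but move slowly, while very large steps are rejected most of the time.

```python
import numpy as np

rng = np.random.default_rng(7)
g = lambda x: np.exp(-0.5 * x * x)        # same unnormalised target as before

def acceptance_rate(scale, iters=20_000):
    x, accepted = 0.0, 0
    for _ in range(iters):
        y = x + rng.normal(scale=scale)   # proposal step size = scale
        if rng.uniform() < g(y) / g(x):
            x, accepted = y, accepted + 1
    return accepted / iters

for scale in (0.05, 1.0, 20.0):
    print(f"step size {scale:5.2f} -> acceptance rate {acceptance_rate(scale):.2f}")
```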

SLIDE 17

Gibbs sampling

  • Applies in multivariate cases where the conditional distributions among the coordinates are known.
  • For a multidimensional distribution of x, the Gibbs sampler modifies only one coordinate at a time.
  • Typically d steps in each iteration, where d is the dimension of the parameter space, that is, of x.
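A minimal sketch of the idea (not from the slides): for a bivariate standard normal with correlation $\rho$, both conditionals are known, $X_1 \mid X_2 = x_2 \sim N(\rho x_2, 1-\rho^2)$ and symmetrically for $X_2$, so each iteration updates the two coordinates in turn; $\rho = 0.8$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8                                  # assumed correlation of the target
x1, x2 = 0.0, 0.0
samples = []
for _ in range(20_000):
    # d = 2 steps per iteration: each coordinate is redrawn from its
    # conditional distribution given the current value of the other one.
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples.append((x1, x2))

samples = np.array(samples[2_000:])
print("empirical correlation:", np.corrcoef(samples.T)[0, 1])   # close to rho
```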

SLIDE 18

Illustration of ordinary and MCMC sampling

SLIDE 19

Gibbs sampling - first dimension

SLIDE 20

Gibbs sampling - second dimension

SLIDE 21

Direct Markov chain as opposed to MCMC

  • For an ordinary Markov chain we know P and find π, analytically or by simulation
  • When we apply MCMC:

⋄ For a discrete distribution we know Kπ, construct a P (which in general has no physical interpretation) and obtain π by simulation
⋄ For a continuous distribution we know g(x), construct a transition kernel P(x, y) and get f(x) by simulation

SLIDE 22

Remarks

  • The method is computer intensive
  • It is hard to verify the assumptions (read: impossible)
  • A warm-up period is strongly recommended (indeed necessary!)
  • The samples are correlated
  • Should be run several times with different starting conditions

⋄ Comparing within-run variance with between-run variance (see the sketch after this list)

  • Check the BUGS site: http://www.mrc-bsu.cam.ac.uk/bugs/ and/or the links given at the BUGS site
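One common way to do that comparison is the Gelman-Rubin statistic. The sketch below (an illustration, not part of the slides) runs the earlier random-walk sampler from several dispersed starting points and compares within-run and between-run variance; values of R-hat close to 1 indicate that the runs agree.

```python
import numpy as np

rng = np.random.default_rng(11)
g = lambda x: np.exp(-0.5 * x * x)        # same unnormalised target as before

def run_chain(x0, iters=5_000):
    x, out = x0, []
    for _ in range(iters):
        y = x + rng.normal()
        if rng.uniform() < g(y) / g(x):
            x = y
        out.append(x)
    return np.array(out)

# Several runs with different (dispersed) starting conditions.
chains = np.array([run_chain(x0) for x0 in (-10.0, -1.0, 1.0, 10.0)])
m, n = chains.shape                        # m runs of length n

W = chains.var(axis=1, ddof=1).mean()      # average within-run variance
B = n * chains.mean(axis=1).var(ddof=1)    # between-run variance of the run means
var_hat = (n - 1) / n * W + B / n
R_hat = np.sqrt(var_hat / W)               # Gelman-Rubin statistic, ~1 when runs agree
print("R-hat:", round(R_hat, 3))
```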

SLIDE 23

Further reading

  • A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin: Bayesian Data Analysis, Chapman & Hall, 1998, ISBN 0-412-03991-5
  • W.R. Gilks, S. Richardson, D.J. Spiegelhalter: Markov Chain Monte Carlo in Practice, Chapman & Hall, 1996, ISBN 0-412-05551-1

SLIDE 24

Beyond Random Walk Metropolis-Hastings

  • Proposed points Yi can be generated with other schemes; this would change the acceptance probabilities.
  • In multivariate situations, we can process one coordinate at a time.
  • If we know the conditional distributions in the multivariate setting, then we can apply Gibbs sampling.
  • This is well suited for graphical models with many variables, each of which interacts with only a few others.
  • (Decision support systems are a big area of application.)
  • Many hybrids and specialised versions exist.
  • Very active research area, both in theory and applications.

SLIDE 25

Exercise 6: Markov Chain Monte Carlo simulation

  • The number of busy lines in a trunk group (Erlang system) is given by a truncated Poisson distribution

$$P(i) = \frac{A^i / i!}{\sum_{j=0}^{n} A^j / j!}$$

  • Generate values from this distribution by applying the Metropolis-Hastings algorithm, and verify with a χ²-test. You can use the parameter values from exercise 4.

SLIDE 26

Exercise 6 continued

  • For two different call types the joint number of occupied lines is given by

$$P(i, j) = \frac{1}{K} \frac{A_1^i}{i!} \frac{A_2^j}{j!}, \qquad 0 \le i + j \le n$$

  • Use Metropolis-Hastings, directly and coordinate-wise, to generate variates from this distribution. You can use $A_1 = A_2 = 4$ and $n = 10$.
  • Test the distribution with a χ²-test.
  • Redo the coordinate-wise solution using Gibbs sampling. You will need to find the conditional distributions analytically.
  • The system can be extended to an arbitrary dimension, and we can add restrictions on the different call types.