SLIDE 1

Stochastic Simulation Markov Chain Monte Carlo

Bo Friis Nielsen

Institute of Mathematical Modelling, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark. Email: bfni@dtu.dk

SLIDE 2

The queueing example

We simulated the system until “stochastic steady state”. We were then able to describe this steady state:

  • What is the distribution of occupied servers?
  • What is the rejection probability?

The model was a “state machine”, i.e. a Markov Chain. To obtain steady-state statistics, we used stochastic simulation, i.e. Monte Carlo.

SLIDE 3

Discrete time Markov chains

  • We observe a sequence of $X_n$'s taking values in some sample space.
  • The next value in the sequence, $X_{n+1}$, is determined from some decision rule depending on the value of $X_n$ only.
  • For a discrete sample space we can express the decision rule as a matrix of transition probabilities $P = \{p_{ij}\}$, $p_{ij} = P(X_{n+1} = j \mid X_n = i)$.
  • Under some technical assumptions we can find a stationary and limiting distribution $\pi$, with $\pi_j = P(X_\infty = j)$.
  • This distribution can be found analytically by solving $\pi = \pi P$ (equilibrium distribution).
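As a complement to the slides, here is a minimal Python sketch (not part of the original lecture) of both routes to $\pi$ for a small chain: solving $\pi = \pi P$ directly and estimating it by simulating the chain. The 3-state transition matrix is an arbitrary illustrative choice.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1); values are made up.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# Analytical route: solve pi = pi P together with sum(pi) = 1,
# i.e. (P^T - I) pi^T = 0 with the normalisation appended as an extra row.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Simulation route: run the chain and look at the empirical distribution.
rng = np.random.default_rng(1)
x, counts = 0, np.zeros(3)
for _ in range(100_000):
    x = rng.choice(3, p=P[x])
    counts[x] += 1

print("analytical pi:", np.round(pi, 4))
print("empirical  pi:", np.round(counts / counts.sum(), 4))
```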

SLIDE 4

Markov chains continued

  • The theory can be extended to:

⋄ Continuous sample space, or
⋄ Continuous time: exercise 4 is an example of a continuous-time Markov chain

SLIDE 5

The probability of Xn

  • The behaviour of the process itself: $X_n$
  • The behaviour conditional on $X_0 = i$ is given by $(p_{ij}(n))$
  • Define $P(X_n = j) = \mu_j^{(n)}$, with $P(X_0 = j) = \mu_j^{(0)}$
  • With $\mu^{(n)} = \{\mu_j^{(n)}\}$ we find
  • $\mu^{(n)} = \mu^{(n-1)} P = \mu^{(0)} P^n$

SLIDE 6

Small example

$$P = \begin{pmatrix} 1-p & p & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & q & 1-q \end{pmatrix} \qquad \text{with} \qquad \mu^{(0)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right)$$

  • We get

$$\mu^{(1)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right) \begin{pmatrix} 1-p & p & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & q & 1-q \end{pmatrix} = \left(\tfrac{1-p}{3},\, \tfrac{p}{3},\, \tfrac{2q}{3},\, \tfrac{2(1-q)}{3}\right)$$

SLIDE 7

and

  • $\mu^{(0)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right)$, and

$$P^2 = \begin{pmatrix} (1-p)^2 + pq & (1-p)p & p^2 & 0 \\ q(1-p) & 2qp & 0 & p^2 \\ q^2 & 0 & 2qp & p(1-q) \\ 0 & q^2 & (1-q)q & (1-q)^2 + qp \end{pmatrix}$$

SLIDE 8

  • $\mu^{(2)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right) \cdot P^2 = \left(\tfrac{(1-p)^2 + pq}{3},\, \tfrac{(1-p)p + 2q^2}{3},\, \tfrac{p^2 + 2(1-q)q}{3},\, \tfrac{2qp + 2(1-q)^2}{3}\right)$
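A quick numerical check of this example (not from the slides): the sketch below evaluates $\mu^{(1)} = \mu^{(0)}P$ and $\mu^{(2)} = \mu^{(0)}P^2$ for the illustrative values $p = 0.3$, $q = 0.7$, chosen so that every row of $P$ sums to one.

```python
import numpy as np

# Illustrative values only; q = 1 - p keeps every row of P stochastic.
p, q = 0.3, 0.7
P = np.array([[1 - p, p,     0,     0    ],
              [q,     0,     p,     0    ],
              [0,     q,     0,     p    ],
              [0,     0,     q,     1 - q]])
mu0 = np.array([1/3, 0, 0, 2/3])

mu1 = mu0 @ P                                # equals ((1-p)/3, p/3, 2q/3, 2(1-q)/3)
mu2 = mu0 @ np.linalg.matrix_power(P, 2)     # equals mu(0) P^2 from the slide

print("mu(1):", np.round(mu1, 4))
print("mu(2):", np.round(mu2, 4))
```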

SLIDE 9

MCMC: What we aim to achieve

We have a variable X with a “complicated” distribution. We cannot sample X directly. We aim to generate a sequence of Xi’s

  • each of which has the same distribution as X
  • but we allow them to be interdependent.

This is an inverse problem relative to the queueing exercise: We start with the distribution of X, and aim to design a state machine which has this steady-state distribution.

SLIDE 10

MCMC example from Bayesian statistics

Prior distribution of the parameter: $P \sim U(0, 1)$, i.e. $f_P(p) = 1$ for $0 \le p \le 1$.

Distribution of the data, conditional on the parameter: $X$ given $P = p$ is Binomial$(n, p)$, i.e. the data has the conditional probabilities

$$P(X = i \mid P = p) = \binom{n}{i} p^i (1-p)^{n-i}$$

SLIDE 11

The posterior distribution of P

Conditional density of the parameter, given observed data $X = i$:

$$f_{P|X=i}(p) = \frac{f_P(p)\, P(X = i \mid P = p)}{P(X = i)}$$

We need the unconditional probability of the observation:

$$P(X = i) = \int_0^1 f_P(p) \binom{n}{i} p^i (1-p)^{n-i}\, dp$$

We can evaluate this here; in more complex models we could not.

AIM: To sample from $f_{P|X=i}$ without evaluating $P(X = i)$.
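A minimal sketch of how such a sampler could look (an illustration, not code from the course): random-walk Metropolis on the unnormalised posterior $p^i(1-p)^{n-i}$, with assumed values $n = 10$, $i = 3$ and a Normal(0, 0.1²) proposal step. The binomial coefficient and $P(X = i)$ cancel in the acceptance ratio and never need to be computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, i = 10, 3                       # assumed observation: i successes in n trials

def g(p):
    """Unnormalised posterior: uniform prior times binomial likelihood."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return p**i * (1.0 - p)**(n - i)

p_cur = 0.5
Pvec = []
for _ in range(10_000):
    p_prop = p_cur + rng.normal(scale=0.1)    # symmetric random-walk proposal
    if rng.uniform() < g(p_prop) / g(p_cur):  # accept w.p. min(1, g(y)/g(x))
        p_cur = p_prop
    Pvec.append(p_cur)

Pvec = np.array(Pvec[1_000:])                 # discard a warm-up period
# The exact posterior is Beta(i+1, n-i+1), so the mean should be near (i+1)/(n+2).
print("posterior mean estimate:", Pvec.mean())
```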

SLIDE 12

The posterior distribution

[Figure: MCMC output for the posterior of P: a time series of P against iteration number, and a histogram of the sampled values (Pvec) showing their frequency.]

SLIDE 13

When to apply MCMC?

The distribution is given by $f(x) = c \cdot g(x)$, where the unnormalised density $g$ can be evaluated, but the normalising constant $c$ cannot be evaluated (easily):

$$c = \frac{1}{\int_X g(x)\, dx}$$

This is frequently the case in Bayesian statistics: the posterior density is proportional to the likelihood function.

Note (again) the similarity between simulation and evaluation of integrals.

SLIDE 14

Metropolis-Hastings algorithm

  • Proposal distribution $h(x, y)$
  • Acceptance of the proposal: the proposed value is accepted with probability

$$\min\!\left(1, \frac{f(y)\,h(y, x)}{f(x)\,h(x, y)}\right) = \min\!\left(1, \frac{g(y)\,h(y, x)}{g(x)\,h(x, y)}\right) = \min\!\left(1, \frac{g(y)}{g(x)}\right) \quad \text{for } h(y, x) = h(x, y)$$

  • Avoiding the troublesome normalising constant $c$!
  • Frequently we apply a symmetric proposal distribution, $h(y, x) = h(x, y)$: the Metropolis algorithm
  • It can be shown that this Markov chain will have $f(x)$ as its stationary distribution.
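To make the formula concrete, here is a small Python helper (an illustration, not course code) that evaluates the Metropolis-Hastings acceptance probability from an unnormalised target g and a proposal density h; the example target and asymmetric proposal at the bottom are made up.

```python
import numpy as np

def mh_accept_prob(x, y, g, h):
    """Metropolis-Hastings acceptance probability for a proposed move x -> y.

    g(z) is the unnormalised target density and h(a, b) the proposal density of
    b given current state a; for symmetric h the ratio h(y, x)/h(x, y) cancels
    and only g(y)/g(x) remains (the Metropolis algorithm).
    """
    return min(1.0, (g(y) * h(y, x)) / (g(x) * h(x, y)))

# Tiny illustration with a made-up target and an asymmetric proposal.
g = lambda z: np.exp(-abs(z))                     # unnormalised Laplace density
h = lambda a, b: np.exp(-(b - 0.5 * a)**2 / 2.0)  # hypothetical drifting proposal
print(mh_accept_prob(0.0, 1.0, g, h))
```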

SLIDE 15

Random Walk Metropolis-Hastings

Sampling from a p.d.f. $c \cdot g(x)$ where $c$ is unknown.

  • 1. At iteration $i$, the state is $X_i$.
  • 2. Propose a jump from $X_i$ to $Y_i = X_i + \Delta X_i$, where $\Delta X_i$ is sampled independently from a symmetric distribution.
  • If $g(Y_i) \ge g(X_i)$, accept.
  • If $g(Y_i) < g(X_i)$, accept with probability $g(Y_i)/g(X_i)$.
  • 3. On accept: set $X_{i+1} = Y_i$ and go to 1.
  • 4. On reject: set $X_{i+1} = X_i$ and go to 1.

Note that knowing c is not necessary!
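A minimal sketch of these steps in Python (illustration only): the target is a standard normal known only up to its constant, $g(x) = e^{-x^2/2}$, and the jump $\Delta X_i$ is drawn from a symmetric Normal(0, 1).

```python
import numpy as np

rng = np.random.default_rng(42)

def g(x):
    # Unnormalised target: a standard normal without its 1/sqrt(2*pi) constant.
    return np.exp(-0.5 * x * x)

x = 0.0
samples = []
for _ in range(20_000):                  # 1. current state is x
    y = x + rng.normal()                 # 2. propose y = x + dx, dx symmetric
    if rng.uniform() < g(y) / g(x):      #    accept with prob min(1, g(y)/g(x))
        x = y                            # 3. on accept, move to y
    samples.append(x)                    # 4. on reject, x is repeated

samples = np.array(samples[2_000:])      # discard a warm-up period
print("mean:", samples.mean(), " std:", samples.std())   # should be near 0 and 1
```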

SLIDE 16

Proposal distribution (Gelman 1998)

  • A good proposal distribution has the following properties:

⋄ For any x, it is easy to sample from h(x, y)
⋄ It is easy to compute the acceptance probability
⋄ Each jump goes a reasonable distance in the parameter space
⋄ The proposals are not rejected too frequently
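The last two points pull in opposite directions. The sketch below (an illustration, not from the slides) measures the empirical acceptance rate of the random-walk sampler from the previous slide for three proposal step sizes: very small steps accept almost everything but move slowly, while very large steps are rejected most of the time.

```python
import numpy as np

rng = np.random.default_rng(7)
g = lambda x: np.exp(-0.5 * x * x)        # same unnormalised target as before

def acceptance_rate(scale, iters=20_000):
    x, accepted = 0.0, 0
    for _ in range(iters):
        y = x + rng.normal(scale=scale)   # proposal step size = scale
        if rng.uniform() < g(y) / g(x):
            x, accepted = y, accepted + 1
    return accepted / iters

for scale in (0.05, 1.0, 20.0):
    print(f"step size {scale:5.2f} -> acceptance rate {acceptance_rate(scale):.2f}")
```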

SLIDE 17

Gibbs sampling

  • Applies in multivariate cases where the conditional distributions among the coordinates are known.
  • For a multidimensional distribution of x, the Gibbs sampler modifies only one coordinate at a time.
  • Typically d steps in each iteration, where d is the dimension of the parameter space, that is, of x.
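A minimal sketch of the idea (not from the slides): for a bivariate standard normal with correlation $\rho$, both conditionals are known, $X_1 \mid X_2 = x_2 \sim N(\rho x_2, 1-\rho^2)$ and symmetrically for $X_2$, so each iteration updates the two coordinates in turn; $\rho = 0.8$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
rho = 0.8                                  # assumed correlation of the target
x1, x2 = 0.0, 0.0
samples = []
for _ in range(20_000):
    # d = 2 steps per iteration: each coordinate is redrawn from its
    # conditional distribution given the current value of the other one.
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples.append((x1, x2))

samples = np.array(samples[2_000:])
print("empirical correlation:", np.corrcoef(samples.T)[0, 1])   # close to rho
```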

SLIDE 18

Illustration of ordinary and MCMC sampling

SLIDE 19

Gibbs sampling - first dimension

SLIDE 20

Gibbs sampling - second dimension

SLIDE 21

Direct Markov chain as opposed to MCMC

  • For an ordinary Markov chain we know P and find π, analytically or by simulation
  • When we apply MCMC:

⋄ For a discrete distribution we know Kπ, construct a P (which in general has no physical interpretation) and obtain π by simulation
⋄ For a continuous distribution we know g(x), construct a transition kernel P(x, y) and get f(x) by simulation

SLIDE 22

Remarks

  • The method is computer intensive
  • It is hard to verify the assumptions (read: impossible)
  • A warm-up period is strongly recommended (indeed necessary!)
  • The samples are correlated
  • Should be run several times with different starting conditions

⋄ Comparing within-run variance with between-run variance (see the sketch after this list)

  • Check the BUGS site: http://www.mrc-bsu.cam.ac.uk/bugs/ and/or the links given at the BUGS site
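One common way to do that comparison is the Gelman-Rubin statistic. The sketch below (an illustration, not part of the slides) runs the earlier random-walk sampler from several dispersed starting points and compares within-run and between-run variance; values of R-hat close to 1 indicate that the runs agree.

```python
import numpy as np

rng = np.random.default_rng(11)
g = lambda x: np.exp(-0.5 * x * x)        # same unnormalised target as before

def run_chain(x0, iters=5_000):
    x, out = x0, []
    for _ in range(iters):
        y = x + rng.normal()
        if rng.uniform() < g(y) / g(x):
            x = y
        out.append(x)
    return np.array(out)

# Several runs with different (dispersed) starting conditions.
chains = np.array([run_chain(x0) for x0 in (-10.0, -1.0, 1.0, 10.0)])
m, n = chains.shape                        # m runs of length n

W = chains.var(axis=1, ddof=1).mean()      # average within-run variance
B = n * chains.mean(axis=1).var(ddof=1)    # between-run variance of the run means
var_hat = (n - 1) / n * W + B / n
R_hat = np.sqrt(var_hat / W)               # Gelman-Rubin statistic, ~1 when runs agree
print("R-hat:", round(R_hat, 3))
```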

SLIDE 23

Further reading

  • A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin: Bayesian Data Analysis, Chapman & Hall, 1998, ISBN 0-412-03991-5
  • W.R. Gilks, S. Richardson, D.J. Spiegelhalter: Markov Chain Monte Carlo in Practice, Chapman & Hall, 1996, ISBN 0-412-05551-1

SLIDE 24

Beyond Random Walk Metropolis-Hastings

  • Proposed points Yi can be generated with other schemes; this would change the acceptance probabilities.
  • In multivariate situations, we can process one coordinate at a time.
  • If we know the conditional distributions in the multivariate setting, then we can apply Gibbs sampling.
  • This is well suited for graphical models with many variables, each of which interacts with only a few others.
  • (Decision support systems are a big area of application.)
  • Many hybrids and specialised versions exist.
  • Very active research area, both in theory and applications.

SLIDE 25

Exercise 6: Markov Chain Monte Carlo simulation

  • The number of busy lines in a trunk group (Erlang system) is given by a truncated Poisson distribution

$$P(i) = \frac{A^i / i!}{\sum_{j=0}^{n} A^j / j!}$$

  • Generate values from this distribution by applying the Metropolis-Hastings algorithm, and verify with a χ²-test. You can use the parameter values from exercise 4.

SLIDE 26

Exercise 6 continued

  • For two different call types the joint number of occupied lines is given by

$$P(i, j) = \frac{1}{K} \frac{A_1^i}{i!} \frac{A_2^j}{j!}, \qquad 0 \le i + j \le n$$

  • Use Metropolis-Hastings, directly and coordinate-wise, to generate variates from this distribution. You can use $A_1 = A_2 = 4$ and $n = 10$.
  • Test the distribution with a χ²-test.
  • Redo the coordinate-wise solution using Gibbs sampling. You will need to find the conditional distributions analytically.
  • The system can be extended to an arbitrary dimension, and we can add restrictions on the different call types.