

SLIDE 1

Stochastic Simulation Variance reduction methods

Bo Friis Nielsen

Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark. Email: bfni@dtu.dk

SLIDE 2

02443 – lecture 7 2

DTU

Variance reduction methods

  • To obtain better estimates (tighter confidence intervals) with the same resources
  • Exploit analytical knowledge and/or correlation
  • Methods:
    ⋄ Antithetic variables
    ⋄ Control variates
    ⋄ Stratified sampling
    ⋄ Importance sampling
    ⋄ Common random numbers

SLIDE 3


Case: Monte Carlo evaluation of integral

Consider the integral $\int_0^1 e^x\,dx$. We can interpret this integral as an expectation:

$$\theta = E\left(e^U\right) = \int_0^1 e^x\,dx, \qquad U \sim U(0,1)$$

To estimate the integral: draw a sample of the random variable $e^U$ and take the average:

$$X_i = e^{U_i}, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

This is the crude Monte Carlo estimator, "crude" because we use no refinements whatsoever.
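A minimal sketch of the crude estimator in Python (NumPy assumed; the seed and sample size are my choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
U = rng.uniform(size=n)   # U_i ~ U(0, 1)
X = np.exp(U)             # X_i = e^{U_i}
theta_hat = X.mean()      # crude Monte Carlo estimate of theta = e - 1

print(theta_hat)          # close to e - 1 ~ 1.7183
```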

SLIDE 4


Analytical considerations

It is straightforward to calculate the integral in this case:

$$\int_0^1 e^x\,dx = e - 1 \approx 1.72$$

For the estimator $X$ based on one observation:

$$E(X) = e - 1, \qquad \text{Var}(X) = E(X^2) - E(X)^2$$

$$E(X^2) = \int_0^1 (e^x)^2\,dx = \frac{1}{2}\left(e^2 - 1\right)$$

$$\text{Var}(X) = \frac{1}{2}\left(e^2 - 1\right) - (e - 1)^2 = 0.2420$$
SLIDE 5


Antithetic variables

General idea: to exploit correlation.

  • If the estimator is positively correlated with $U_i$ (a monotone function): use $1 - U_i$ also:

$$Y_i = \frac{e^{U_i} + e^{1-U_i}}{2} = \frac{e^{U_i} + e/e^{U_i}}{2}, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

  • The computational effort of calculating $\bar{Y}$ should be similar to the effort needed to compute $\bar{X}$.
    ⋄ By the latter expression for $Y_i$ we can generate the same number of $Y$'s as $X$'s (one uniform per $Y_i$, reusing $e^{U_i}$).
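The antithetic estimator in the same sketch style as the crude one (NumPy assumed; seed and sample size are my choices):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
U = rng.uniform(size=n)
Y = (np.exp(U) + np.exp(1.0 - U)) / 2.0   # Y_i = (e^{U_i} + e^{1-U_i}) / 2

print(Y.mean())        # estimate of e - 1
print(Y.var(ddof=1))   # per-sample variance, around 0.0039 vs 0.2420 for crude X_i
```

Note that each pair reuses one uniform, so the per-uniform cost matches the crude estimator.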

SLIDE 6


Antithetic variables - analytical

We can analyse the example analytically due to its simplicity:

$$E(\bar{Y}) = E(\bar{X}) = \theta$$

To calculate $\text{Var}(\bar{Y})$ we start with $\text{Var}(Y_i)$:

$$\text{Var}(Y_i) = \frac{1}{4}\text{Var}\left(e^{U_i}\right) + \frac{1}{4}\text{Var}\left(e^{1-U_i}\right) + 2\cdot\frac{1}{4}\text{Cov}\left(e^{U_i}, e^{1-U_i}\right) = \frac{1}{2}\text{Var}\left(e^{U_i}\right) + \frac{1}{2}\text{Cov}\left(e^{U_i}, e^{1-U_i}\right)$$

$$\text{Cov}\left(e^{U_i}, e^{1-U_i}\right) = E\left(e^{U_i} e^{1-U_i}\right) - E\left(e^{U_i}\right) E\left(e^{1-U_i}\right) = e - (e-1)^2 = 3e - e^2 - 1 = -0.2342$$

$$\text{Var}(Y_i) = \frac{1}{2}(0.2420 - 0.2342) = 0.0039$$

SLIDE 7


Comparison: Crude method vs. antithetic

Crude method: $\text{Var}(X_i) = \frac{1}{2}\left(e^2 - 1\right) - (e-1)^2 = 0.2420$

Antithetic method: $\text{Var}(Y_i) = \frac{1}{2}(0.2420 - 0.2342) = 0.0039$

I.e., a reduction by 98%, almost for free. The variance on $\bar{X}$ and $\bar{Y}$ will scale with $1/n$, the number of samples. Going from the crude to the antithetic method reduces the variance as much as increasing the number of samples by a factor of about 60 ($0.2420/0.0039 \approx 62$).

SLIDE 8


Antithetic variables in more complex models

If $X = h(U_1, \ldots, U_n)$ where $h$ is monotone in each of its coordinates, then we can use the antithetic variable $Y = h(1 - U_1, \ldots, 1 - U_n)$ to reduce the variance, because $\text{Cov}(X, Y) \le 0$ and therefore $\text{Var}\left(\frac{1}{2}(X + Y)\right) \le \frac{1}{2}\text{Var}(X)$.

SLIDE 9


Antithetic variables in the queue simulation

Can you devise the queueing model of yesterday so that the number of rejections is a monotone function of the underlying $U_i$'s? Yes: make sure that we always use either $U_i$ or $1 - U_i$ consistently, so that a large $U_i$ implies customers arriving quickly and remaining long.

SLIDE 10


Control variates

Use of covariates:

$$Z = X + c(Y - \mu_Y), \qquad E(Y) = \mu_Y \ \text{(known)}$$

$$\text{Var}(Z) = \text{Var}(X) + c^2\,\text{Var}(Y) + 2c\,\text{Cov}(X, Y)$$

We can minimize $\text{Var}(Z)$ by choosing

$$c = -\frac{\text{Cov}(X, Y)}{\text{Var}(Y)}$$

to get

$$\text{Var}(Z) = \text{Var}(X) - \frac{\text{Cov}(X, Y)^2}{\text{Var}(Y)}$$

SLIDE 11


Example

Use $U$ as control variate:

$$Z_i = X_i + c\left(U_i - \frac{1}{2}\right), \qquad X_i = e^{U_i}$$

The optimal value of $c$ can be found from

$$\text{Cov}(X, Y) = \text{Cov}\left(U, e^U\right) = E\left(U e^U\right) - E(U)\,E\left(e^U\right) \approx 0.14086$$

In practice we would not know this covariance, but estimate it empirically. With $c = -0.14086/(1/12)$:

$$\text{Var}(Z) = \text{Var}\left(e^U\right) - \frac{\text{Cov}\left(e^U, U\right)^2}{\text{Var}(U)} = 0.0039
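A sketch of the control-variate estimator, with $c$ estimated empirically as the slide suggests (NumPy assumed; seed and sample size are my choices; estimating $c$ from the same sample introduces a bias that is negligible here):

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
U = rng.uniform(size=n)
X = np.exp(U)

# Empirical c = -Cov(X, U) / Var(U), approximating -0.14086 / (1/12)
c = -np.cov(X, U, ddof=1)[0, 1] / np.var(U, ddof=1)

Z = X + c * (U - 0.5)    # control-variate estimator; E(U) = 1/2 is known
print(Z.mean())          # estimate of e - 1
print(Z.var(ddof=1))     # around 0.0039 vs 0.2420 for X
```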

SLIDE 12


Stratified sampling

This is a general survey technique: we sample in predetermined areas, using knowledge of the structure of the sampling space:

$$W_i = \frac{e^{U_{i,1}/10} + e^{1/10 + U_{i,2}/10} + \cdots + e^{9/10 + U_{i,10}/10}}{10}$$

What is an appropriate number of strata? (In this case there is a simple answer; for complex problems not so.)
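The ten-strata estimator above can be sketched as follows (NumPy assumed; seed and replication count are my choices):

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 10_000, 10                          # n replications, k = 10 strata
U = rng.uniform(size=(n, k))               # U_{i,j} ~ U(0, 1)
strata = np.arange(k) / k                  # left endpoints 0, 0.1, ..., 0.9
W = np.exp(strata + U / k).mean(axis=1)    # W_i: one uniform per stratum

print(W.mean())        # estimate of e - 1
print(W.var(ddof=1))   # far below the crude 0.2420
```

Each $W_i$ uses ten uniforms, so compare its variance against the variance of an average of ten crude draws.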

SLIDE 13


Importance sampling

Suppose we want to evaluate

$$\theta = E(h(X)) = \int h(x) f(x)\,dx$$

For $g(x) > 0$ whenever $f(x) > 0$ this is equivalent to

$$\theta = \int \frac{h(x) f(x)}{g(x)}\, g(x)\,dx = E\left(\frac{h(Y) f(Y)}{g(Y)}\right)$$

where $Y$ is distributed with density $g(y)$.

This is an efficient estimator of $\theta$ if we have chosen $g$ such that the variance of $h(Y)f(Y)/g(Y)$ is small. Such a $g$ will lead to more $Y$'s where $h(y)$ is large: more important regions will be sampled more often.
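As a sketch for the running example ($h(x) = e^x$, $f$ uniform on $[0,1]$), one possible proposal, my choice and not from the slides, is $g(y) = \tfrac{2}{3}(1 + y)$, which tilts sampling toward large $y$ where $e^y$ is large; it is sampled by inverting its CDF:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
u = rng.uniform(size=n)
Y = -1.0 + np.sqrt(1.0 + 3.0 * u)            # inverse CDF of g(y) = (2/3)(1 + y)
W = np.exp(Y) / ((2.0 / 3.0) * (1.0 + Y))    # h(Y) f(Y) / g(Y), with f = 1 on [0, 1]

print(W.mean())        # estimate of e - 1
print(W.var(ddof=1))   # smaller than the crude 0.2420
```

The weight $e^y/(1+y)$ is nearly constant on $[0,1]$, which is exactly what keeps the variance of the ratio small.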

SLIDE 14


Re-using the random numbers

We want to compare two different queueing systems. We can estimate the rejection rate of system $i = 1, 2$ by $\theta_i = E(g_i(U_1, \ldots, U_n))$ and then rate the two systems according to $\hat{\theta}_2 - \hat{\theta}_1$. But typically $g_1(\cdots)$ and $g_2(\cdots)$ are positively correlated: long service times imply many rejections.

SLIDE 15


Then a more efficient estimator is based on

$$\theta_2 - \theta_1 = E\left(g_2(U_1, \ldots, U_n) - g_1(U_1, \ldots, U_n)\right)$$

This amounts to letting the two systems run with the same input sequence of random numbers, i.e. the same arrival and service time for each customer. With some program flows, this is easily obtained by re-setting the seed of the RNG. When this is not sufficient, you must store the sequence of arrival and service times, so they can be re-used. Note: In the slides there is no gain, as we make only one run!
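A toy illustration of the effect, not the queueing model from the slides: two "systems" map the same uniforms to exponential service times with slightly different means, and we compare the variance of the difference estimator with common versus independent random-number streams (NumPy assumed; all parameters are my choices):

```python
import numpy as np

def g(U, mean):
    """Average of exponential(mean) draws generated from uniforms U by inversion."""
    return (-mean * np.log(1.0 - U)).mean()

rng = np.random.default_rng(1)
reps, n = 1000, 100

diff_crn, diff_indep = [], []
for _ in range(reps):
    U = rng.uniform(size=n)
    diff_crn.append(g(U, 1.1) - g(U, 1.0))     # same stream for both systems
    V = rng.uniform(size=n)
    diff_indep.append(g(U, 1.1) - g(V, 1.0))   # independent streams

print(np.var(diff_crn, ddof=1))    # far smaller
print(np.var(diff_indep, ddof=1))  # than this
```

With common random numbers the two estimates are positively correlated, so the correlation cancels in the difference.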

SLIDE 16

Exercise 5: Variance reduction methods

  • 1. Estimate the integral $\int_0^1 e^x\,dx$ by simulation (the crude Monte Carlo estimator). Use e.g. an estimator based on 100 samples and present the result as the point estimator and a confidence interval.
  • 2. Estimate the integral $\int_0^1 e^x\,dx$ using antithetic variables, with comparable computer resources.
  • 3. Estimate the integral $\int_0^1 e^x\,dx$ using a control variable, with comparable computer resources.
  • 4. Estimate the integral $\int_0^1 e^x\,dx$ using stratified sampling, with comparable computer resources.
  • 5. Use control variates to reduce the variance of the estimator in exercise 4 (Poisson arrivals).
  • 6. Demonstrate the effect of using common random numbers in exercise 4 for the difference between Poisson arrivals (Part 1) and a renewal process with hyperexponential interarrival times. Remark: You might need to do some thinking and some re-programming.