Stochastic Simulation: Variance reduction methods
Bo Friis Nielsen
Applied Mathematics and Computer Science Technical University of Denmark 2800 Kgs. Lyngby – Denmark Email: bfni@dtu.dk
02443 – lecture 7 2
DTU
Variance reduction methods aim at a more precise estimate using the same resources:

⋄ Antithetic variables
⋄ Control variates
⋄ Stratified sampling
⋄ Importance sampling
⋄ Common random numbers
Consider the integral ∫₀¹ eˣ dx. We can interpret this integral as an expectation:

θ = ∫₀¹ eˣ dx = E(e^U),  U ∈ U(0, 1)

To estimate the integral: sample the random variable e^U and take the average,

Xᵢ = e^{Uᵢ},  X̄ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ

This is the crude Monte Carlo estimator, “crude” because we use no refinements whatsoever.
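A minimal sketch of the crude estimator in Python (the sample size and seed are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

def crude_mc(n):
    """Crude Monte Carlo estimate of the integral of e^x over [0, 1]."""
    u = rng.random(n)                        # U_i ~ U(0, 1)
    x = np.exp(u)                            # X_i = e^{U_i}
    return x.mean(), x.std(ddof=1) / np.sqrt(n)   # point estimate, std. error

est, se = crude_mc(100_000)                  # est is close to e - 1 ≈ 1.7183
```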
It is straightforward to calculate the integral in this case:

∫₀¹ eˣ dx = e − 1 ≈ 1.72

For the estimator X:

E(X) = e − 1
E(X²) = ∫₀¹ (eˣ)² dx = (e² − 1)/2
Var(X) = E(X²) − E(X)² = (e² − 1)/2 − (e − 1)² ≈ 0.2420
General idea: exploit (negative) correlation. Since eˣ is a monotone function, we can use 1 − U as well:

Yᵢ = (e^{Uᵢ} + e^{1−Uᵢ})/2 = (e^{Uᵢ} + e/e^{Uᵢ})/2,  Ȳ = (1/n) ∑ᵢ₌₁ⁿ Yᵢ

⋄ The effort needed to compute Ȳ should be similar to the effort needed to compute X̄.
⋄ By the latter expression for Yᵢ we can generate the same number of Y’s as X’s.
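A sketch of the antithetic estimator, again with arbitrary seed and sample size:

```python
import numpy as np

rng = np.random.default_rng(2)

def antithetic_mc(n):
    """Antithetic estimate of E(e^U): pair each U_i with 1 - U_i."""
    u = rng.random(n)
    y = (np.exp(u) + np.exp(1.0 - u)) / 2.0   # Y_i
    return y.mean(), y.var(ddof=1)

est, var_y = antithetic_mc(100_000)
```

The sample variance `var_y` should land near the analytic value 0.0039 computed on the next slide.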
We can analyse the example analytically due to its simplicity:

E(Ȳ) = E(X̄) = θ

To calculate Var(Ȳ) we start with Var(Yᵢ):

Var(Yᵢ) = (1/4)Var(e^U) + (1/4)Var(e^{1−U}) + 2 · (1/4)Cov(e^U, e^{1−U})
        = (1/2)Var(e^U) + (1/2)Cov(e^U, e^{1−U})

Cov(e^U, e^{1−U}) = E(e^U e^{1−U}) − E(e^U) E(e^{1−U})
                  = e − (e − 1)² = 3e − e² − 1 ≈ −0.2342

Var(Yᵢ) = (1/2)(0.2420 − 0.2342) = 0.0039
Crude method: Var(Xᵢ) ≈ 0.2420
Antithetic method: Var(Yᵢ) = (1/2)(0.2420 − 0.2342) = 0.0039

I.e. a reduction of 98 %, almost for free. The variances of X̄ and Ȳ both scale with 1/n, the number of samples. Going from the crude to the antithetic method reduces the variance as much as increasing the number of samples by a factor of 50.
If X = h(U₁, . . . , Uₙ) where h is monotone in each of its coordinates, then we can use the antithetic variable Y = h(1 − U₁, . . . , 1 − Uₙ) to reduce the variance, because Cov(X, Y) ≤ 0 and therefore Var((1/2)(X + Y)) ≤ (1/2)Var(X).
Can you devise the queueing model of yesterday so that the number of rejections is a monotone function of the underlying Uᵢ’s? Yes: make sure that we always use either Uᵢ or 1 − Uᵢ consistently, so that a large Uᵢ implies customers arriving quickly and remaining long.
Use of control variates:

Z = X + c(Y − μ_Y),  E(Y) = μ_Y (known)

Var(Z) = Var(X) + c²Var(Y) + 2c·Cov(X, Y)

We can minimize Var(Z) by choosing

c = −Cov(X, Y)/Var(Y)

to get

Var(Z) = Var(X) − Cov(X, Y)²/Var(Y)
Use U as a control variate:

Zᵢ = Xᵢ + c(Uᵢ − 1/2)

The optimal value of c can be found from

Cov(X, U) = Cov(e^U, U) = E(U e^U) − E(U) E(e^U) = 1 − (1/2)(e − 1) ≈ 0.14086

In practice we would not know this covariance, but estimate it empirically. With c = −0.14086/(1/12):

Var(Z) = Var(e^U) − Cov(e^U, U)²/Var(U) = 0.2420 − 0.14086²/(1/12) = 0.0039
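A sketch of the control-variate estimator, estimating the coefficient c empirically as suggested above (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def control_variate_mc(n):
    """Control-variate estimate of E(e^U), using U (known mean 1/2) as control."""
    u = rng.random(n)
    x = np.exp(u)
    # Empirical optimal coefficient c = -Cov(X, U) / Var(U).
    c = -np.cov(x, u, ddof=1)[0, 1] / np.var(u, ddof=1)
    z = x + c * (u - 0.5)                     # Z_i = X_i + c (U_i - 1/2)
    return z.mean(), z.var(ddof=1)

est, var_z = control_variate_mc(100_000)
```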
This is a general survey technique: we sample in predetermined areas (strata), using knowledge of the structure of the sampling space. With ten equal strata on [0, 1]:

Wᵢ = (e^{Uᵢ,₁/10} + e^{1/10 + Uᵢ,₂/10} + · · · + e^{9/10 + Uᵢ,₁₀/10})/10

What is an appropriate number of strata? (In this case there is a simple answer; for complex problems not so.)
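A sketch of the ten-stratum estimator above, with one uniform draw per stratum for each Wᵢ (sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

def stratified_mc(n, strata=10):
    """Stratified estimate: one uniform in each of `strata` equal sub-intervals."""
    u = rng.random((n, strata))                 # U_{i,j} ~ U(0, 1)
    lows = np.arange(strata) / strata           # left endpoints 0, 1/10, ..., 9/10
    w = np.exp(lows + u / strata).mean(axis=1)  # W_i, averaged over the strata
    return w.mean(), w.var(ddof=1)

est, var_w = stratified_mc(10_000)
```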
Suppose we want to evaluate

θ = E(h(X)) = ∫ h(x) f(x) dx

For g(x) > 0 whenever f(x) > 0 this is equivalent to

θ = ∫ (h(x) f(x)/g(x)) g(x) dx = E(h(Y) f(Y)/g(Y))

where Y has density g. This is an efficient estimator of θ if we have chosen g such that the variance of h(Y) f(Y)/g(Y) is small. Such a g will lead to more Y’s where h(y) is large: more important regions will be sampled more often.
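A sketch for the running integral ∫₀¹ eˣ dx. The proposal g(x) = 2(1 + x)/3 on [0, 1] is our own choice (not from the slides); it roughly follows the shape of eˣ and can be sampled by inverting its CDF:

```python
import numpy as np

rng = np.random.default_rng(5)

def importance_mc(n):
    """Importance-sampling estimate of the integral of e^x over [0, 1]."""
    u = rng.random(n)
    # Invert G(x) = (2/3)(x + x^2/2): x = -1 + sqrt(1 + 3u), so Y ~ g.
    y = -1.0 + np.sqrt(1.0 + 3.0 * u)
    # Weighted samples h(Y) f(Y) / g(Y), with h(y) = e^y and f = 1 on [0, 1].
    w = np.exp(y) / (2.0 * (1.0 + y) / 3.0)
    return w.mean(), w.var(ddof=1)

est, var_w = importance_mc(100_000)
```

With this g the per-sample variance drops well below the crude value 0.2420, though not as far as the antithetic or control-variate estimators.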
We want to compare two different queueing systems. We can estimate the rejection rate of system i = 1, 2 by

θᵢ = E(gᵢ(U₁, . . . , Uₙ))

and then rate the two systems according to θ̂₂ − θ̂₁. But typically g₁(· · ·) and g₂(· · ·) are positively correlated: long service times imply many rejections.
Then a more efficient estimator is based on

θ₂ − θ₁ = E(g₂(U₁, . . . , Uₙ) − g₁(U₁, . . . , Uₙ))

This amounts to letting the two systems run with the same input sequence of random numbers, i.e. the same arrival and service times for each customer. With some program flows this is easily obtained by re-setting the seed of the RNG. When this is not sufficient, you must store the sequence of arrival and service times so they can be re-used. Note: in the slides there is no gain, as we make only one run!
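A toy sketch of the effect: g1 and g2 below are stand-ins for two positively correlated systems driven by the same input stream (they are not the queueing model from the slides). Feeding both the same uniforms shrinks the variance of the estimated difference compared with independent streams:

```python
import numpy as np

rng = np.random.default_rng(6)

def g1(u):
    return np.exp(u)          # "system 1" (toy stand-in)

def g2(u):
    return np.exp(1.1 * u)    # "system 2", slightly heavier load (toy stand-in)

n = 100_000
u = rng.random(n)
diff_crn = g2(u) - g1(u)      # common random numbers: same stream for both
u2 = rng.random(n)
diff_indep = g2(u2) - g1(u)   # independent streams for comparison

var_crn = diff_crn.var(ddof=1)
var_indep = diff_indep.var(ddof=1)
```

Because g1 and g2 are both increasing in u, the positive correlation cancels in the difference, so `var_crn` is far smaller than `var_indep`.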
Exercises:

⋄ Estimate the integral ∫₀¹ eˣ dx by simulation (the crude Monte Carlo estimator). Use e.g. an estimator based on 100 samples and present the result as a point estimate and a confidence interval.
⋄ Estimate the integral ∫₀¹ eˣ dx using antithetic variables, with comparable computer resources.
⋄ Estimate the integral ∫₀¹ eˣ dx using a control variable, with comparable computer resources.
⋄ Estimate the integral ∫₀¹ eˣ dx using stratified sampling, with comparable computer resources.
⋄ … exercise 4 (Poisson arrivals).
⋄ … exercise 4 for the difference between Poisson arrivals (Part 1) and a renewal process with hyperexponential interarrival times. Remark: you might need some thinking and some re-programming.