SLIDE 1 Tutorial on quasi-Monte Carlo methods
Josef Dick
School of Mathematics and Statistics, UNSW, Sydney, Australia josef.dick@unsw.edu.au
SLIDE 2 Comparison: MCMC, MC, QMC
Roughly speaking:
- Markov chain Monte Carlo and quasi-Monte Carlo are for
different types of problems;
- If you have a problem where Monte Carlo does not work, then
chances are quasi-Monte Carlo will not work either;
- If Monte Carlo works, but you want a faster method ⇒ try
(randomized) quasi-Monte Carlo (some tweaking might be necessary);
- Quasi-Monte Carlo is an "experimental design" approach to
Monte Carlo simulation. In this talk we shall discuss how quasi-Monte Carlo can be faster than Monte Carlo under certain assumptions.
SLIDE 3
Outline
- DISCREPANCY, REPRODUCING KERNEL HILBERT SPACES AND WORST-CASE ERROR
- QUASI-MONTE CARLO POINT SETS
- RANDOMIZATIONS
- WEIGHTED FUNCTION SPACES AND TRACTABILITY
SLIDE 4 The task
The task is to approximate an integral
I_s(f) = ∫_{[0,1]^s} f(x) dx
for some integrand f by a quadrature rule
Q_{N,s}(f) = (1/N) ∑_{n=0}^{N−1} f(x_n)
at some sample points x_0, . . . , x_{N−1} ∈ [0, 1]^s.
SLIDE 5
∫_{[0,1]^s} f(x) dx ≈ (1/N) ∑_{n=0}^{N−1} f(x_n)
In other words: Area under curve = Volume of cube × average function value.
- If x_0, . . . , x_{N−1} ∈ [0, 1]^s are chosen randomly ⇒ Monte Carlo
- If x_0, . . . , x_{N−1} ∈ [0, 1]^s are chosen deterministically ⇒ Quasi-Monte Carlo
SLIDE 6 Smooth integrands
- Integrand f : [0, 1] → R; say continuously differentiable;
- We want to study the integration error:
∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n).
- Representation:
f(x) = f(1) − ∫_x^1 f′(t) dt = f(1) − ∫_0^1 1_{[0,t]}(x) f′(t) dt,
where 1_{[0,t]}(x) = 1 if x ∈ [0, t] and 0 otherwise.
SLIDE 7 Integration error
∫_0^1 f(x) dx = ∫_0^1 ( f(1) − ∫_0^1 1_{[0,t]}(x) f′(t) dt ) dx
= f(1) − ∫_0^1 ∫_0^1 1_{[0,t]}(x) f′(t) dx dt,
(1/N) ∑_{n=0}^{N−1} f(x_n) = f(1) − (1/N) ∑_{n=0}^{N−1} ∫_0^1 1_{[0,t]}(x_n) f′(t) dt
= f(1) − ∫_0^1 ( (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) ) f′(t) dt.
Subtracting:
∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) = ∫_0^1 ( (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − ∫_0^1 1_{[0,t]}(x) dx ) f′(t) dt.
SLIDE 8 Local discrepancy
Integration error:
∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) = ∫_0^1 ( (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − t ) f′(t) dt.
Let P = {x_0, . . . , x_{N−1}} ⊂ [0, 1]. Define the local discrepancy by
Δ_P(t) = (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − t,  t ∈ [0, 1].
Then
∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) = ∫_0^1 Δ_P(t) f′(t) dt.
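As a quick illustration (not from the original slides), here is a minimal Python sketch of the local discrepancy in one dimension; the left-endpoint grid is an arbitrary example point set:

```python
import numpy as np

def local_discrepancy(points, t):
    """Delta_P(t) = (1/N) * #{n : x_n <= t} - t for a 1D point set."""
    return np.mean(np.asarray(points) <= t) - t

# Example: the left-endpoint grid {0, 1/N, ..., (N-1)/N}.
N = 16
P = np.arange(N) / N
ts = np.linspace(0.0, 1.0, 1001)
print(max(abs(local_discrepancy(P, t)) for t in ts))  # roughly 1/N
```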
SLIDE 9 Koksma-Hlawka inequality
| ∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) | = | ∫_0^1 Δ_P(t) f′(t) dt |
≤ ( ∫_0^1 |Δ_P(t)|^p dt )^{1/p} ( ∫_0^1 |f′(t)|^q dt )^{1/q} = ‖Δ_P‖_{L_p} ‖f′‖_{L_q},
where 1/p + 1/q = 1 and ‖g‖_{L_p} = ( ∫_0^1 |g(t)|^p dt )^{1/p}.
SLIDE 10 Interpretation of discrepancy
Let P = {x_0, . . . , x_{N−1}} ⊂ [0, 1]. Recall the definition of the local discrepancy:
Δ_P(t) = (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − t,  t ∈ [0, 1].
The local discrepancy measures the difference between the uniform distribution and the empirical distribution of the quadrature points P = {x_0, . . . , x_{N−1}}. Its supremum over t is the Kolmogorov-Smirnov statistic for the difference between the empirical distribution of {x_0, . . . , x_{N−1}} and the uniform distribution.
SLIDE 11 Function space
Representation: f(x) = f(1) − ∫_x^1 f′(t) dt.
Define the inner product
⟨f, g⟩ = f(1)g(1) + ∫_0^1 f′(t) g′(t) dt
and the induced norm ‖f‖ = ( f(1)² + ∫_0^1 |f′(t)|² dt )^{1/2}.
Function space: H = {f : [0, 1] → R : f absolutely continuous and ‖f‖ < ∞}.
SLIDE 12 Worst-case error
The worst-case error is defined by
e(H, P) = sup_{f∈H, ‖f‖≤1} | ∫_0^1 f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) |.
SLIDE 13 Reproducing kernel Hilbert space
Recall: f(y) = f(1) · 1 + ∫_0^1 f′(x) (−1_{[0,x]}(y)) dx and ⟨f, g⟩ = f(1)g(1) + ∫_0^1 f′(x) g′(x) dx.
Goal: Find a set of functions g_y ∈ H for each y ∈ [0, 1] such that ⟨f, g_y⟩ = f(y) for all f ∈ H.
Conclusions:
- g_y(1) = 1 for all y ∈ [0, 1];
- g_y′(x) = −1_{[0,x]}(y) = −1 if y ≤ x, and 0 otherwise;
- Make g_y continuous such that g_y ∈ H.
SLIDE 14
Reproducing kernel Hilbert space
It follows that g_y(x) = 1 + min{1 − x, 1 − y}. The function K(x, y) := g_y(x) = 1 + min{1 − x, 1 − y}, x, y ∈ [0, 1], is called the reproducing kernel. The space H is called a reproducing kernel Hilbert space (with reproducing kernel K).
SLIDE 15 Numerical integration in reproducing kernel Hilbert spaces
Using the reproducing property f(y) = ⟨f, K(·, y)⟩:
∫_0^1 f(z) dz = ∫_0^1 ⟨f, K(·, z)⟩ dz = ⟨ f, ∫_0^1 K(·, z) dz ⟩,
(1/N) ∑_{n=0}^{N−1} f(x_n) = (1/N) ∑_{n=0}^{N−1} ⟨f, K(·, x_n)⟩ = ⟨ f, (1/N) ∑_{n=0}^{N−1} K(·, x_n) ⟩.
Hence
∫_0^1 f(z) dz − (1/N) ∑_{n=0}^{N−1} f(x_n) = ⟨f, h⟩,
where h(x) = ∫_0^1 K(x, z) dz − (1/N) ∑_{n=0}^{N−1} K(x, x_n).
SLIDE 16 Worst-case error in reproducing kernel Hilbert spaces
Thus
e(H, P) = sup_{f∈H, ‖f‖≤1} | ∫_0^1 f(z) dz − (1/N) ∑_{n=0}^{N−1} f(x_n) |
= sup_{f∈H, ‖f‖=1} |⟨f, h⟩| = sup_{f∈H, f≠0} |⟨f, h⟩| / ‖f‖ = ‖h‖,
since the supremum is attained when choosing f = h/‖h‖ ∈ H.
SLIDE 17 Worst-case error in reproducing kernel Hilbert spaces
e²(H, P) = ∫_0^1 ∫_0^1 K(x, y) dx dy − (2/N) ∑_{n=0}^{N−1} ∫_0^1 K(x, x_n) dx + (1/N²) ∑_{n,m=0}^{N−1} K(x_n, x_m).
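For the kernel K(x, y) = 1 + min{1 − x, 1 − y} of Slide 14, direct integration gives ∫_0^1 K(x, y) dx = 1 + (1 − y²)/2 and ∫_0^1 ∫_0^1 K(x, y) dx dy = 4/3, so the worst-case error can be evaluated exactly. A minimal Python sketch (the two test point sets are arbitrary examples):

```python
import numpy as np

def wce_squared(P):
    """e^2(H, P) for the kernel K(x, y) = 1 + min(1 - x, 1 - y) on [0, 1]."""
    P = np.asarray(P, dtype=float)
    term1 = 4.0 / 3.0                                      # int int K(x, y) dx dy
    term2 = np.mean(1.0 + (1.0 - P**2) / 2.0)              # mean of int K(x, x_n) dx
    term3 = np.mean(1.0 + np.minimum.outer(1 - P, 1 - P))  # mean of K(x_n, x_m)
    return term1 - 2.0 * term2 + term3

print(wce_squared(np.arange(8) / 8))          # left-endpoint grid
print(wce_squared((np.arange(8) + 0.5) / 8))  # midpoint grid: smaller error
```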
SLIDE 18 Numerical integration in higher dimensions
- Tensor product space H_s = H ⊗ · · · ⊗ H.
- Reproducing kernel
K(x, y) = ∏_{i=1}^{s} [1 + min{1 − x_i, 1 − y_i}],
where x = (x_1, . . . , x_s), y = (y_1, . . . , y_s) ∈ [0, 1]^s.
- Functions f ∈ H_s have mixed partial derivatives up to order 1 in each variable: ∂^{|u|} f/∂x_u ∈ L_2([0, 1]^s), where u ⊆ {1, . . . , s}, x_u = (x_i)_{i∈u} and |u| denotes the cardinality of u.
- Notation: (x_u, 1) denotes the point whose i-th component is x_i for i ∈ u and 1 otherwise; below we also use the boundary conditions (∂^{|u|} f/∂x_u)(x_u, 1) = 0 for all u ⊂ {1, . . . , s}.
SLIDE 19 Worst-case error
Again
e²(H_s, P) = ∫_{[0,1]^s} ∫_{[0,1]^s} K(x, y) dx dy − (2/N) ∑_{n=0}^{N−1} ∫_{[0,1]^s} K(x, x_n) dx + (1/N²) ∑_{n,m=0}^{N−1} K(x_n, x_m),
and, under the boundary conditions of the previous slide,
e²(H_s, P) = ∫_{[0,1]^s} Δ_P(x)² dx, where Δ_P(x) = (1/N) ∑_{n=0}^{N−1} 1_{[0,x]}(x_n) − ∏_{i=1}^{s} x_i.
SLIDE 20 Discrepancy in higher dimensions
Point set P = {x_0, . . . , x_{N−1}} ⊂ [0, 1]^s, t = (t_1, . . . , t_s) ∈ [0, 1]^s.
Δ_P(t) = (1/N) ∑_{n=0}^{N−1} 1_{[0,t]}(x_n) − ∏_{i=1}^{s} t_i,
where [0, t] = ∏_{i=1}^{s} [0, t_i].
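The same quantity is easy to evaluate pointwise in any dimension; a small Python sketch (the random point set is only for illustration):

```python
import numpy as np

def local_discrepancy(P, t):
    """Delta_P(t) = (1/N) * #{n : x_n in [0, t]} - prod_i t_i for P in [0,1]^s."""
    P, t = np.asarray(P), np.asarray(t)
    return np.mean(np.all(P <= t, axis=1)) - np.prod(t)

rng = np.random.default_rng(1)
P = rng.random((64, 2))                       # random points, for comparison
print(local_discrepancy(P, np.array([0.5, 0.5])))
```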
SLIDE 21 Koksma-Hlawka inequality
Let f : [0, 1]^s → R with
‖f‖ = ( ∫_{[0,1]^s} | (∂^s f/∂x_1 · · · ∂x_s)(x) |^q dx )^{1/q},
and where (∂^{|u|} f/∂x_u)(x_u, 1) = 0 for all u ⊂ {1, . . . , s}.
Then
| ∫_{[0,1]^s} f(x) dx − (1/N) ∑_{n=0}^{N−1} f(x_n) | ≤ ‖Δ_P‖_{L_p} ‖f‖,
where 1/p + 1/q = 1.
⇒ Construct points P = {x_0, . . . , x_{N−1}} ⊂ [0, 1]^s with small discrepancy L_p(P) = ‖Δ_P‖_{L_p}. It is often useful to consider different reproducing kernels, yielding different worst-case errors and discrepancies. We will see further examples later.
SLIDE 22 Constructions of low-discrepancy sequences
- Lattice rules (Bilyk, Brauchart, Cools, D., Hellekalek, Hickernell,
Hlawka, Joe, Keller, Korobov, Kritzer, Kuo, Larcher, L'Ecuyer, Lemieux, Leobacher, Niederreiter, Nuyens, Pillichshammer, Sinescu, Sloan, Temlyakov, Wang, Woźniakowski, ...)
- Digital nets and sequences (Baldeaux, Bierbrauer, Brauchart,
Chen, D., Edel, Faure, Hellekalek, Hofer, Keller, Kritzer, Kuo, Larcher, Leobacher, Niederreiter, Owen, Özbudak, Pillichshammer, Pirsic, Schmid, Skriganov, Sobol’, Wang, Xing, Yue, ...)
- Hammersley-Halton sequences (Atanassov, De Clerck, Faure,
Halton, Hammersley, Kritzer, Larcher, Lemieux, Pillichshammer, Pirsic, White, ...)
- Kronecker sequences (Beck, Hellekalek, Larcher, Niederreiter,
Schoissengeier, ...)
SLIDE 23 Lattice rules
Let N ∈ N and let g = (g_1, . . . , g_s) ∈ {1, . . . , N − 1}^s. Choose the quadrature points as
x_n = {n g / N} for n = 0, . . . , N − 1,
where {z} = z − ⌊z⌋ denotes the fractional part, applied componentwise, for z ≥ 0.
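A rank-1 lattice point set is a one-liner; a minimal Python sketch using the Fibonacci parameters of the next slide:

```python
import numpy as np

def lattice_points(N, g):
    """Rank-1 lattice x_n = {n g / N} for n = 0, ..., N - 1."""
    n = np.arange(N).reshape(-1, 1)
    return (n * np.asarray(g) / N) % 1.0      # fractional part, componentwise

pts = lattice_points(55, (1, 34))             # Fibonacci lattice, see Slide 24
print(pts[:3])
```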
SLIDE 24
Fibonacci lattice rules
Lattice rule with N = 55 points and generating vector g = (1, 34).
SLIDE 25
Fibonacci lattice rules
Lattice rule with N = 89 points and generating vector g = (1, 55).
SLIDE 26
In comparison: Random point set
Random set of 64 points generated by Matlab using the Mersenne Twister.
SLIDE 27 Lattice rules
How to find the generating vector g? Reproducing kernel:
K(x, y) = ∏_{i=1}^{s} (1 + 2π B_α({x_i − y_i})),
where {x_i − y_i} = (x_i − y_i) − ⌊x_i − y_i⌋ is the fractional part of x_i − y_i. This is the reproducing kernel of a Hilbert space of Fourier series f(x) = ∑_{h∈Z^s} f̂(h) e^{2πi h·x} with polynomially decaying Fourier coefficients.
SLIDE 28 Worst-case error
e²_α(g) = −1 + (1/N) ∑_{n=0}^{N−1} ∏_{i=1}^{s} (1 + 2π B_α({n g_i / N})),
where B_α is the Bernoulli polynomial of order α. For instance, B_2(x) = x² − x + 1/6.
SLIDE 29 Component-by-component construction (Korobov, Sloan-Reztsov, Sloan-Kuo-Joe, Nuyens-Cools)
- Set g1 = 1.
- For d = 2, . . . , s assume that we have found g_1, . . . , g_{d−1}. Then find g_d ∈ {1, . . . , N − 1} which minimizes e²_α(g_1, . . . , g_{d−1}, g) as a function of g, i.e.
g_d = argmin_{g ∈ {1,...,N−1}} e²_α(g_1, . . . , g_{d−1}, g).
Using the fast Fourier transform (Nuyens, Cools), a good generating vector g ∈ {1, . . . , N − 1}^s can be found in O(sN log N) operations.
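As a sketch of the idea (assuming the worst-case error formula of Slide 28 with α = 2), here is a naive O(sN²) Python implementation of the CBC search; the fast Nuyens-Cools algorithm obtains the same vector in O(sN log N) operations by reordering the candidates and using the FFT:

```python
import numpy as np

def B2(x):
    """Bernoulli polynomial B_2, evaluated on fractional parts."""
    return x * x - x + 1.0 / 6.0

def cbc(N, s):
    """Naive component-by-component search for a generating vector."""
    n = np.arange(N)
    prod = 1.0 + 2.0 * np.pi * B2(n % N / N)  # kernel factor for g_1 = 1
    g = [1]
    for d in range(2, s + 1):
        # try every candidate c for the d-th component of g
        errs = [np.mean(prod * (1.0 + 2.0 * np.pi * B2((n * c) % N / N))) - 1.0
                for c in range(1, N)]
        best = 1 + int(np.argmin(errs))
        g.append(best)
        prod = prod * (1.0 + 2.0 * np.pi * B2((n * best) % N / N))
    return g

print(cbc(127, 4))
```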
SLIDE 30 Hammersley-Halton sequence
Radical inverse function in base b:
- Let n ∈ N_0 have base b expansion n = n_0 + n_1 b + · · · + n_{a−1} b^{a−1}.
- Define φ_b(n) = n_0/b + n_1/b² + · · · + n_{a−1}/b^a ∈ [0, 1).
Hammersley-Halton sequence:
- Let P = {p_1, p_2, . . .} be the set of prime numbers in increasing order, i.e. p_1 = 2, p_2 = 3, p_3 = 5, p_4 = 7, . . ..
- Define quadrature points x_0, x_1, . . . by
x_n = (φ_{p_1}(n), φ_{p_2}(n), . . . , φ_{p_s}(n)) for n = 0, 1, 2, . . . .
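A minimal Python sketch of the radical inverse and the resulting Halton points (the hard-coded prime list is for illustration only, so s ≤ 10 here):

```python
import numpy as np

def radical_inverse(n, b):
    """phi_b(n): reflect the base-b digits of n about the decimal point."""
    x, base = 0.0, 1.0 / b
    while n > 0:
        n, digit = divmod(n, b)
        x += digit * base
        base /= b
    return x

def halton(N, s):
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:s]  # first s primes
    return np.array([[radical_inverse(n, p) for p in primes] for n in range(N)])

print(halton(4, 2))  # [[0, 0], [1/2, 1/3], [1/4, 2/3], [3/4, 1/9]]
```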
SLIDE 31
Hammersley-Halton sequence
Hammersley-Halton point set with 64 points.
SLIDE 32 Digital nets
- Choose a prime number b and the finite field Z_b = {0, 1, . . . , b − 1} of order b.
- Choose generating matrices C_1, . . . , C_s ∈ Z_b^{m×m}.
- Let n = n_0 + n_1 b + · · · + n_{m−1} b^{m−1} and set n = (n_0, . . . , n_{m−1})^⊤ ∈ Z_b^m.
- Compute y_{n,i} = C_i n ∈ Z_b^m for 1 ≤ i ≤ s, 0 ≤ n < b^m.
- For y_{n,i} = (y_{n,i,1}, . . . , y_{n,i,m})^⊤ ∈ Z_b^m let
x_{n,i} = y_{n,i,1}/b + · · · + y_{n,i,m}/b^m.
- Set x_n = (x_{n,1}, . . . , x_{n,s}) for 0 ≤ n < b^m.
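A minimal Python sketch of the digital construction; the choice C_1 = identity and C_2 = reversed identity is an example that yields the two-dimensional Hammersley net, a (0, m, 2)-net in base 2:

```python
import numpy as np

def digital_net(C, m, b=2):
    """Digital net in base b from generating matrices C_1, ..., C_s."""
    s = len(C)
    points = np.empty((b**m, s))
    for n in range(b**m):
        # base-b digits of n: n = n_0 + n_1 b + ... + n_{m-1} b^{m-1}
        digits = np.array([(n // b**k) % b for k in range(m)])
        for i in range(s):
            y = np.asarray(C[i]).dot(digits) % b           # y_{n,i} = C_i n
            points[n, i] = y.dot(b ** -np.arange(1.0, m + 1.0))
    return points

m = 4
I = np.eye(m, dtype=int)   # C_1 = identity: van der Corput sequence in base 2
J = np.fliplr(I)           # C_2 = reversed identity
print(digital_net([I, J], m)[:4])  # 2D Hammersley net
```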
SLIDE 33 (t, m, s)-net property
Let m, s ≥ 1 and b ≥ 2 be integers. A point set P = {x_0, . . . , x_{b^m−1}} is called a (t, m, s)-net in base b if for all integers d_1, . . . , d_s ≥ 0 with d_1 + · · · + d_s = m − t, the number of points in each elementary interval
∏_{i=1}^{s} [ a_i/b^{d_i}, (a_i + 1)/b^{d_i} ),  0 ≤ a_i < b^{d_i},
is b^t.
SLIDE 34 If P = {x_0, . . . , x_{b^m−1}} ⊂ [0, 1]^s is a (t, m, s)-net, then the local discrepancy function satisfies
Δ_P( a_1/b^{d_1}, . . . , a_s/b^{d_s} ) = 0
for all 0 ≤ a_i < b^{d_i}, d_i ≥ 0, d_1 + · · · + d_s = m − t.
SLIDE 47
Sobol point set
The first 64 points of a Sobol sequence which form a (0, 6, 2)-net in base 2.
SLIDE 48
Niederreiter-Xing point set
The first 64 points of a Niederreiter-Xing sequence.
SLIDE 49
Niederreiter-Xing point set
The first 1024 points of a Niederreiter-Xing sequence.
SLIDE 50
Kronecker sequence
Let α_1, . . . , α_s ∈ R be such that 1, α_1, . . . , α_s are linearly independent over Q. Let
x_n = ({n α_1}, . . . , {n α_s}) for n = 0, 1, 2, . . . .
For instance, one can choose α_i = √p_i, where p_i is the i-th prime number.
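A minimal Python sketch with the choice α_i = √p_i (prime list hard-coded for illustration):

```python
import numpy as np

def kronecker(N, s):
    """x_n = ({n sqrt(p_1)}, ..., {n sqrt(p_s)}) for the first s primes."""
    primes = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:s])
    n = np.arange(N).reshape(-1, 1)
    return (n * np.sqrt(primes)) % 1.0

print(kronecker(5, 2))
```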
SLIDE 51
Kronecker sequence
The first 64 points of a Kronecker sequence.
SLIDE 52 Discrepancy bounds
- It is known that for all the constructions above one has
L_p(P) ≤ C_s (log N)^{c(s)} / N for all N ≥ 1, 1 ≤ p ≤ ∞,
where C_s > 0 and c(s) ≈ s.
- Lower bound: For all point sets P consisting of N points we have
L_p(P) ≥ C′_s (log N)^{(s−1)/2} / N for all N ≥ 1, 1 < p ≤ ∞.
- In comparison, for a random point set: E(L_2(P)) = O(N^{−1/2}).
SLIDE 53 For 1 < p < ∞, the exact order of convergence is known to be (log N)^{(s−1)/2} / N. Explicit constructions of such point sets were achieved by Chen & Skriganov for p = 2 and by Skriganov for 1 < p < ∞.
Great open problem: What is the correct order of convergence of
min_{P ⊂ [0,1]^s, |P| = N} L_∞(P)?
SLIDE 54 Randomized quasi-Monte Carlo
Introduce random element to deterministic point set. Has several advantages:
- yields an unbiased estimator;
- there is a statistical error estimate;
- better rate of convergence of the random-case error compared to the
worst-case error;
- performs similarly to Monte Carlo for L_2 functions but has a better
rate of convergence for smooth functions.
SLIDE 55 Randomly shifted lattice rules
Lattice rules can be randomized using a random shift:
- Lattice rule: {n g / N}, n = 0, 1, . . . , N − 1.
- Shifted lattice rule: choose σ ∈ [0, 1]^s uniformly distributed; then the shifted lattice point set is given by {n g / N + σ}, n = 0, 1, . . . , N − 1.
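A minimal Python sketch of the resulting randomized estimator: R independent shifts of one lattice rule give R unbiased estimates, whose sample mean and sample standard deviation provide the statistical error estimate of Slide 54 (the test integrand is an arbitrary example with exact integral 1):

```python
import numpy as np

def shifted_lattice_estimates(f, N, g, R, rng):
    """R i.i.d. randomly shifted versions of one rank-1 lattice rule."""
    n = np.arange(N).reshape(-1, 1)
    base = (n * np.asarray(g) / N) % 1.0
    ests = []
    for _ in range(R):
        shift = rng.random(len(g))
        ests.append(np.mean(f((base + shift) % 1.0)))
    return np.array(ests)

rng = np.random.default_rng(7)
f = lambda x: np.prod(1.0 + (x - 0.5), axis=1)   # exact integral is 1
est = shifted_lattice_estimates(f, 55, (1, 34), 10, rng)
print(est.mean(), est.std(ddof=1) / np.sqrt(10))  # estimate, statistical error
```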
SLIDE 56
Lattice point set
SLIDE 57
Shifted lattice point set
SLIDE 58 Owen’s scrambling
- Let Z_b = {0, 1, . . . , b − 1} and let
x = x_1/b + x_2/b² + · · · , with x_1, x_2, . . . ∈ Z_b.
- Randomly choose permutations σ, σ_{x_1}, σ_{x_1,x_2}, . . . : Z_b → Z_b.
- Let
y_1 = σ(x_1), y_2 = σ_{x_1}(x_2), y_3 = σ_{x_1,x_2}(x_3), . . .
and set y = y_1/b + y_2/b² + · · · .
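A minimal Python sketch of nested uniform scrambling for a single coordinate, truncated to m digits; all points share one randomly drawn permutation tree, and each coordinate of a point set would get its own independent tree:

```python
import numpy as np

def owen_scramble_1d(xs, m, rng, b=2):
    """Nested uniform scrambling (base b, m digits) of one coordinate."""
    perms = {}  # preceding digits -> random permutation of {0, ..., b-1}
    out = np.zeros(len(xs))
    for j, x in enumerate(xs):
        digits = []
        for k in range(1, m + 1):
            d = int(x * b)              # k-th base-b digit of x
            x = x * b - d
            key = tuple(digits)         # permutation depends on digits 1..k-1
            if key not in perms:
                perms[key] = rng.permutation(b)
            out[j] += perms[key][d] / float(b) ** k
            digits.append(d)
    return out

rng = np.random.default_rng(3)
print(owen_scramble_1d(np.arange(8) / 8, 8, rng))
```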
SLIDE 59
Scrambled Sobol point set
The first 1024 points of a Sobol sequence (left) and a scrambled Sobol sequence (right).
SLIDE 60 Scrambled net variance
Let y_0, . . . , y_{b^m−1} be a scrambled (t, m, s)-net and let
e(R) = ∫_{[0,1]^s} f(x) dx − (1/b^m) ∑_{n=0}^{b^m−1} f(y_n).
Then
- E(e(R)) = 0;
- Var(e(R)) = O( (log N)^s / N^{2α+1} ) with N = b^m, for integrands with smoothness 0 ≤ α ≤ 1.
SLIDE 61 Dependence on the dimension
Although the convergence rate is better for quasi-Monte Carlo, there is a stronger dependence on the dimension:
- Monte Carlo: error = O(N−1/2);
- Quasi-Monte Carlo: error = O( (log N)^{s−1} / N ).
Notice that g(N) := N^{−1}(log N)^{s−1} is an increasing function for N ≤ e^{s−1}, since g′(N) = N^{−2}(log N)^{s−2}(s − 1 − log N). So if s is large, say s ≈ 30, then N^{−1}(log N)^{29} increases for N ≤ 10^{12}.
SLIDE 62 Intractable
For the integration error in H with reproducing kernel
K(x, y) = ∏_{i=1}^{s} min{1 − x_i, 1 − y_i}
it is known that e²(H, P) ≥ (1 − N (8/9)^s) e²(H, ∅).
⇒ The error can only decrease once N > constant · (9/8)^s.
SLIDE 63 Weighted function spaces (Sloan and Woźniakowski)
Study weighted function spaces: Introduce weights γ_1, γ_2, . . . > 0 and define
K(x, y) = ∏_{i=1}^{s} (1 + γ_i min{1 − x_i, 1 − y_i}).
Then if ∑_{i=1}^{∞} γ_i < ∞, there are point sets P with
e(H_{s,γ}, P) ≤ C N^{−δ},
where C > 0 is independent of the dimension s and δ < 1.
SLIDE 64 Tractability
- Minimal error over all methods using N points:
e*_N(H) = inf_{P : |P|=N} e(H, P);
- Information complexity:
N*(s, ε) = min{N ∈ N : e*_N(H) ≤ ε e*_0(H)} for ε > 0;
- Strong tractability: N*(s, ε) ≤ C ε^{−β} for some constants C, β > 0 independent of s.
Sloan and Woźniakowski: For the function space considered above, we get strong tractability if ∑_{i=1}^{∞} γ_i < ∞.
SLIDE 65 Tractability of the star-discrepancy
Recall the local discrepancy function
Δ_P(t) = (1/N) ∑_{n=0}^{N−1} 1_{[0,t)}(x_n) − t_1 · · · t_s,
where P = {x_0, . . . , x_{N−1}}. The star-discrepancy is given by
D*_N(P) = sup_{t ∈ [0,1]^s} |Δ_P(t)|.
Result by Heinrich, Novak, Wasilkowski and Woźniakowski: the minimal star-discrepancy satisfies
D*(s, N) = inf_P D*_N(P) ≤ C √(s/N) for all s, N ∈ N.
This implies that N*(s, ε) ≤ C s ε^{−2}.
SLIDE 66 Extensions, open problems and future research
- Quasi-Monte Carlo methods which achieve convergence rates of order
N^{−α}(log N)^{αs}, α > 1, for sufficiently smooth integrands;
- Construction of point sets whose star-discrepancy is tractable;
- Completely uniformly distributed point sets to speed up
convergence in Markov chain Monte Carlo algorithms;
- Infinite dimensional integration: Integrands have infinitely many
variables;
- Connections to codes, orthogonal arrays and experimental
designs;
- Quasi-Monte Carlo for function approximation;
- Choosing the weights in applications;
- Quasi-Monte Carlo on the sphere;
SLIDE 67
Book
SLIDE 68 Thank You!
Special thanks to
- Dirk Nuyens for creating many of the pictures in this talk;
- Friedrich Pillichshammer for letting me use a picture of his
daughters;
- Fred Hickernell, Peter Kritzer, Art Owen, and Friedrich
Pillichshammer for helpful comments on the slides;
- All colleagues who worked on quasi-Monte Carlo methods over
the years.