Continued fractions and number systems: applications to - - PowerPoint PPT Presentation

continued fractions and number systems applications to
SMART_READER_LITE
LIVE PREVIEW

Continued fractions and number systems: applications to - - PowerPoint PPT Presentation

Continued fractions and number systems: applications to correctly-rounded implementations of elementary functions and modular arithmetic. Mourad Gouicem PEQUAN Team, LIP6/UPMC Nancy, France May 28 th 2013 Table of Contents Continued fraction


slide-1
SLIDE 1

Continued fractions and number systems: applications to correctly-rounded implementations

  • f elementary functions and modular arithmetic.

Mourad Gouicem

PEQUAN Team, LIP6/UPMC

Nancy, France May 28th 2013

slide-2
SLIDE 2

Table of Contents

1

Continued fraction expansion reminders

2

Application to correctly-rounded implementations of elementary functions

3

Application to modular arithmetic

  • M. Gouicem

Continued fractions and Number systems applications 1 / 37

slide-3
SLIDE 3

Table of Contents

1

Continued fraction expansion reminders

2

Application to correctly-rounded implementations of elementary functions

3

Application to modular arithmetic

  • M. Gouicem

Continued fractions and Number systems applications 2 / 37

slide-4
SLIDE 4

Notations

Let a real 0 < α < 1. There exists a unique integer sequence (ki)i∈N with ki ∈ N∗ such that α = 1 k1 + 1 k2 + 1 ... := [0; k1, k2, . . . ]. This sequence is finite if α is rational, or infinite otherwise.

  • M. Gouicem

Continued fractions and Number systems applications 3 / 37

slide-5
SLIDE 5

Notations

We denote by : (ri)i∈N the real sequence of the tails of α such that α = [0; k1, k2, . . . , ki + ri] ; (pi/qi)i∈N the rational sequence of the convergents such that pi/qi = [0; k1, k2, . . . , ki] ; (θi)i∈N the real sequence of the such that θi = |qiα − pi| ;

  • M. Gouicem

Continued fractions and Number systems applications 4 / 37

slide-6
SLIDE 6

Notations

We denote by : (ri)i∈N the real sequence of the tails of α such that α = [0; k1, k2, . . . , ki + ri] ; (pi/qi)i∈N the rational sequence of the convergents such that pi/qi = [0; k1, k2, . . . , ki] ; (θi)i∈N the real sequence of the such that θi = |qiα − pi| ; Leitmotif of the talk Use the fact that ri = θi/θi−1 to do modular arithmetic !

  • M. Gouicem

Continued fractions and Number systems applications 4 / 37

slide-7
SLIDE 7

Recurrence relations

All sequences can be computed recursively : p−1 = 1 p0 = 0 pi = pi−2 + kipi−1, q−1 = 0 q0 = 1 qi = qi−2 + kiqi−1, θ−1 = 1 θ0 = α θi = θi−2 − kiθi−1. with ki = ⌊θi−2/θi−1⌋. ki can be computed by subtraction (subtraction-based Euclidean algorithm) or by division (division-based Euclidean algorithm).

  • M. Gouicem

Continued fractions and Number systems applications 5 / 37

slide-8
SLIDE 8

Table of Contents

1

Continued fraction expansion reminders

2

Application to correctly-rounded implementations of elementary functions

3

Application to modular arithmetic

  • M. Gouicem

Continued fractions and Number systems applications 6 / 37

slide-9
SLIDE 9

The IEEE 754-2008 standard

Aim Ensure predictable and portable numerical software. Basic Formats single-precision (binary32) double-precision (binary64) quadruple-precision (binary128) Rounding Modes Rounding to nearest Directed rounding (towards 0, −∞ and +∞) Correctly rounded operations +, −, ×, /, √

  • M. Gouicem

Continued fractions and Number systems applications 7 / 37

slide-10
SLIDE 10

The IEEE 754-2008 standard

And for elementary mathematical functions ? exp, log, sin, cos, tan, · · · ⇒ IEEE-754-2008 only recommends correct rounding because of the Table Maker’s Dilemma

  • M. Gouicem

Continued fractions and Number systems applications 8 / 37

slide-11
SLIDE 11

The Table Maker’s Dilemma

Correct rounding

  • p(f (x) ± ε) = ◦p(f (x))

Hard-to-round case

[f (x) − ε, f (x) + ε]

Midpoints Floating-points

  • M. Gouicem

Continued fractions and Number systems applications 9 / 37

slide-12
SLIDE 12

HR-case search by isolation

Function Isolate(Exists ?, P, D, depth, k) Input: Exists? (P, D) test the existence of (p, ǫ′) HR-cases of P in D, P an approximation of f in D, depth and k two integers if Exists? (P, D) then if depth = 0 then retourner ExhaustiveSearch (P, D); else (D1, . . . , Dk) := Subdivide (D, k); (P1, . . . , Pk) := Refine (P, D, k); return

i Isolate (Exists?, Pi, Di, depth - 1, k);

else return ∅;

  • M. Gouicem

Continued fractions and Number systems applications 10 / 37

slide-13
SLIDE 13

HR-case search by isolation

Function Isolate(Exists ?, P, D, depth, k) Input: Exists? (P, D) test the existence of (p, ǫ′) HR-cases of P in D, P an approximation of f in D, depth and k two integers if Exists? (P, D) then if depth = 0 then retourner ExhaustiveSearch (P, D); else (D1, . . . , Dk) := Subdivide (D, k); (P1, . . . , Pk) := Refine (P, D, k); return

i Isolate (Exists?, Pi, Di, depth - 1, k);

else return ∅;

  • M. Gouicem

Continued fractions and Number systems applications 10 / 37

slide-14
SLIDE 14

HR-case existence test

Problem Given P ∈ R[x], is there any x ∈ 0, #pD such that P(x) mod 1 < ǫ. Solutions (with p the floating-point precision) Exhaustive search : O(2p) ; Lef` evre when deg P = 1 : O(22p/3) intervals in O(p2) ; SLZ when deg P > 1 : O(2p/2) intervals in O(poly(p, deg P, α)). Example of computation times ex in full domain and p = 53 with Lef` evre : 5 years of CPU time ; 2x in [1/2, 1[ and p = 64 with SLZ : 8 years of CPU time.

  • M. Gouicem

Continued fractions and Number systems applications 11 / 37

slide-15
SLIDE 15

Challenges Binary128 is actually out of reach ; compute the hardest-to-round case for each of the 32 functions recommended by the IEEE std 754-2008 in binary64 ; tackle any function in reasonable time in binary64 ; and certify the results. Lef` evre HR-case search Efficient for binary64 in practice : all known hardest-to-round in binary64 have been generated by Lef` evre ; Massively parallel ; Fine-grain parallelism.

  • Perfect for GPU !
  • M. Gouicem

Continued fractions and Number systems applications 12 / 37

slide-16
SLIDE 16

Lef` evre HR-case existence test

Problem Given b − ax ∈ R[x], is there any x ∈ 0, #pD such that b − ax mod 1 < ǫ. Geometrically Is there any multiple of a in {ax mod 1 | x ≤ #pD} at a distance less than ǫ to the left of b ?

1 b a · 0 a · 1 a · 2 a · 3 a · 4 a · 5 a · 6 a · 7

  • M. Gouicem

Continued fractions and Number systems applications 13 / 37

slide-17
SLIDE 17

The three distance theorem

Three distance theorem (Slater) The points {a · x mod 1 | x < N} split the segment [0, 1[ into N

  • segments. Their lengths take at most three different values, one

being the sum of the two others. Link with continued fraction expansion Given a = [0; k1, k2, k3, . . . ] and pi

qi the ith convergent.

The lengths are the θi−1,t = θi−1 − t · θi, with 0 ≤ t < ki+1. Their number are the qi−1,t = qi−1 + t · qi, with 0 ≤ t < ki+1. There are O(log N) two-length configurations and they verify qiθi−1,t + qi−1,tθi = 1. Interpretation : if we place N = qi + qi−1,t multiples of a,

there are qi intervals of length θi−1,t ; there are qi−1,t intervals of length θi.

  • M. Gouicem

Continued fractions and Number systems applications 14 / 37

slide-18
SLIDE 18

Example : a = 14/45

45

(0, 0) 1

14 31

(0, 1) 1 2

14 14 17

(0, 2) 1 2 3

14 14 14 3

(1, 0) 1 2 3 4 5 6

11 3 11 3 11 3 3

(1, 1) 1 2 3 4 5 6 7 8 9

8 3 3 8 3 3 8 3 3 3

(1, 2) 1 2 3 4 5 6 7 8 9 10 11 12

5 3 3 3 5 3 3 3 5 3 3 3 3

(1, 3) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

2 3 3 3 3 2 3 3 3 3 2 3 3 3 3 3

(2, 0)

  • M. Gouicem

Continued fractions and Number systems applications 15 / 37

slide-19
SLIDE 19

Lef` evre HR-case existence test

Idea Write b in the basis (θi,t)i∈N to get best approximations. If i is even

b (i, 1) (i, 2) (i, 3) (i, 4) (i, 5) (i, 6) (i + 1, 0)

θi θi θi θi θi θi θi θi+1

(b − ˜ bi,t+1) = (b − ˜ bi,t) − θi

  • r

(b − ˜ bi+1,0) = (b − ˜ bi,t)

If i is odd

b (i, 1) (i, 2) (i, 3) (i, 4) (i, 5) (i, 6) (i + 1, 0)

θi θi θi θi θi θi θi θi+1

(b − ˜ bi+1,0) = (b − ˜ bi,t) − θi−1,t+1

  • r

(b − ˜ bi,t+1) = (b − ˜ bi,t)

  • M. Gouicem

Continued fractions and Number systems applications 16 / 37

slide-20
SLIDE 20

An irregular control flow : bad for SIMD

Algorithm 1: Lef` evre HR-case existence test.

input : b − a · x, ǫ′′, N initialisation : p ← {a} ; q ← 1 − {a} ; d ← {b} ; u ← 1 ; v ← 1 ; if d < ǫ′′ then return Failure; while True do if d < p then k = ⌊q/p⌋; q ← q − k ∗ p ; u ← u + k ∗ v; if u + v ≥ N then return Success; p ← p − q ; v ← v + u; else d ← d − p; if d < ǫ′′ then return Failure; k = ⌊p/q⌋; p ← p − k ∗ q ; v ← v + k ∗ u; if u + v ≥ N then return Success; q ← q − p ; u ← u + v;

  • M. Gouicem

Continued fractions and Number systems applications 17 / 37

slide-21
SLIDE 21

An irregular control flow : the SPMD on SIMD model

SPMD on SIMD Develop a scalar program : a kernel Launch multiple threads running the same kernel Group their execution on SIMD units by warps of 32 threads Control flow regularity i=1 i=1 i=1 i=1 switch (i) { i = 0 : . . . i = 1 : . . . i = 2 : . . . i = 3 : . . . } i=0 i=3 i=2 i=1

  • M. Gouicem

Continued fractions and Number systems applications 18 / 37

slide-22
SLIDE 22

An irregular control flow : loop divergence

Normalized mean deviation to the maximum (NMDM) 1 − Mean({ni, 0 ≤ i < w}) Max({ni, 0 ≤ i < w}) Lef` evre existence test (ex, [1, 1 + 2−13], ǫ = 2−32) No implementation trick works ! Re-organize data ⇒ no a priori information Compute several sub-domains per thread without exiting the loop ⇒ too few instructions to issue in the loop to offset the extra cost.

  • M. Gouicem

Continued fractions and Number systems applications 19 / 37

slide-23
SLIDE 23

An irregular control flow

Why Lef` evre HR-case existence test is irregular ? It goes from subtraction-based to division-based Euclidean algorithm depending on the position of b. ⇒ The number of loop iterations is hence conditioned by : the position of b on the unit segment, the number of quotient to compute and the value of the quotients. Goal Break this dependency by considering only (i, 0) configurations. ⇒ Write b in the basis (θi)i∈N to obtain the same sequence of best approximations.

  • M. Gouicem

Continued fractions and Number systems applications 20 / 37

slide-24
SLIDE 24

New reduction rules

If i is even

b (i, 1) (i, 2) (i, 3) (i, 4) (i, 5) (i, 6) (i + 1, 0)

θi θi θi θi θi θi θi θi+1

(b − ˜ bi+1) = (b − ˜ bi) mod θi If i is odd

b (i, 1) (i, 2) (i, 3) (i, 4) (i, 5) (i, 6) (i + 1, 0)

θi θi θi θi θi θi θi θi+1

(b − ˜ bi+1) = (b − ˜ bi) − θi+1 mod θi

  • M. Gouicem

Continued fractions and Number systems applications 21 / 37

slide-25
SLIDE 25

Divergence in the regular algorithm

Algorithm 2: Regular HR-case existence test.

input : b − a · x, ǫ′′, N initialisation : p ← {a} ; q ← 1 ; d ← {b} ; u ← 1 ; v ← 0 ; if d < ǫ′′ then return Failure; while True do if p < q then k = ⌊q/p⌋; q = q − k ∗ p ; u = u + k ∗ v; d = d mod p; else k = ⌊p/q⌋; p = p − k ∗ q ; v = v + k ∗ u; if d ≥ p then d = (d − p) mod q; if u + v ≥ N then return d > ǫ′′;

  • M. Gouicem

Continued fractions and Number systems applications 22 / 37

slide-26
SLIDE 26

Divergence on the main conditional branch

A deterministic test i is alternatively odd and even. ⇒ We can avoid divergence by unrolling 2 loop iterations. Algorithm 3: Regular HR-case existence test unrolled.

input : b − ax, ǫ′′, N initialisation : p ← {a} ; q ← 1 ; d ← {b} ; u ← 1 ; v ← 0 ; while True do k = ⌊q/p⌋; q = q − k ∗ p ; u = u + k ∗ v; d = d mod p; if u + v ≥ N then return d > ǫ′′; k = ⌊p/q⌋; p = p − k ∗ q ; v = v + k ∗ u; if d ≥ p then d = d − p mod q; if u + v ≥ N then return d > ǫ′′;

  • M. Gouicem

Continued fractions and Number systems applications 23 / 37

slide-27
SLIDE 27

Divergence on the main loop (ex, subdomain [1, 1 + 2−13[)

Normalized mean deviation to the maximum (NMDM) 1 − Mean({ni, 0 ≤ i < w}) Max({ni, 0 ≤ i < w}) Lef` evre Algorithm New Algorithm

  • M. Gouicem

Continued fractions and Number systems applications 24 / 37

slide-28
SLIDE 28

A regular control flow

Why the regular HR-case existence test is regular ? It uses only division-based Euclidean algorithm. ⇒ The number of loop iterations only depend on the number of quotient to compute, which is very stable from one interval to the next. Seq. MPI CPU-GPU

Seq. MPI MPI CPU-GPU

Pol. 43300.81 5251.53 788.84 8.25 6.66 approx. Lef` evre 36816.10 5292.67 2446.27 6.96 2.16 Regular 34039.94 4716.97 711.92 7.22 6.63

  • Lef. /Reg.

1.08 1.12 3.44 – –

Table : Performance result on ex in [1, 2[ for binary64 (Intel Xeon X5650 hexa-core , Nvidia C2070). Lef` evre MPI/New GPU : 7.4 .

  • M. Gouicem

Continued fractions and Number systems applications 25 / 37

slide-29
SLIDE 29

Perspectives

Remaining development Argument reduction of periodic functions for large binades Implicit vectorization (OpenCL, ispc, . . . ) Lef` evre HR-case existence test Consider minimax approximations (libsollya) rather than Taylor to widen domains ? Generalize Lef` evre test to higher degree polynomial (change of variable + Hensel lifting) ? SLZ Efficient parallel implementation of LLL Use structure of Coppersmith lattices Investigate structure in lattices involved in Coppersmith method over translated polynomials

  • M. Gouicem

Continued fractions and Number systems applications 26 / 37

slide-30
SLIDE 30

Table of Contents

1

Continued fraction expansion reminders

2

Application to correctly-rounded implementations of elementary functions

3

Application to modular arithmetic

  • M. Gouicem

Continued fractions and Number systems applications 27 / 37

slide-31
SLIDE 31

Modular arithmetic and CF

Notations (θi)i∈N sequence of the |qiα − pi| in CF, (θ′

i)i∈N sequence of the |qia − pid| in extgcd(a, d).

Remark CF is arithmetic modulo 1, CF over a rational a/d and gcd(a, d) is identical, only difference is θ′

i = d · θi.

Goal Use (θi)i∈N number scale to perform modular operations.

  • M. Gouicem

Continued fractions and Number systems applications 28 / 37

slide-32
SLIDE 32

Number systems based on CF

Ostrowski integer number system Given (qi)i∈N the denominators of the convergents of any irrational 0 < α < 1, every positive integer b can be uniquely written as b = 1 +

m

  • i=1

biqi−1 where 0 ≤ b1 ≤ k1 − 1, 0 ≤ bi ≤ ki, for i ≥ 2, bi = 0 if bi+1 = ki+1. Associated number scale over real numbers : ((−1)iθi)i∈N. Decomposition algorithm : greedy algorithm by default.

  • M. Gouicem

Continued fractions and Number systems applications 29 / 37

slide-33
SLIDE 33

Number systems based on CF

Signed Ostrowski integer number system Given (qi)i∈N the denominators of the convergents of any irrational 0 < α < 1, every positive integer b can be uniquely written as b = 1 +

m

  • i=1

(−1)ibiqi−1 where 0 ≤ b1 ≤ k1 − 1, 0 ≤ bi ≤ ki, for i ≥ 2, bi+1 = 0 if bi = ki. Associated number scale over real numbers : (θi)i∈N. Decomposition algorithm : greedy algorithm by excess.

  • M. Gouicem

Continued fractions and Number systems applications 30 / 37

slide-34
SLIDE 34

Modular multiplication

Compute c = a · b mod d. Algorithm

1 Compute the sequences (θ′

i)i∈N and (qi)i∈N from extgcd(a, d)

2 Compute the sequence (bi)i∈N such that b = 1 + m

i=1 biqi−1

3 Return a + m

i=1 bi(−1)iθ′ i−1

Proof : Let α = a/d and b = 1 + m

i=1 biqi−1.

α · b = α + m

i=1 biqi−1α

As (−1)iθi = qiα − pi, α · b = α + m

i=1 bi(−1)iθi−1 + m i=1 bipi−1

By the uniqueness of the decomposition, 0 ≤ α + m

i=1 bi(−1)iθi−1 < 1.

And finally, by multiplying by d, we get 0 ≤ a + m

i=1 bi(−1)iθ′ i−1 < d

  • M. Gouicem

Continued fractions and Number systems applications 31 / 37

slide-35
SLIDE 35

Modular division

Compute c = a−1 · b mod d. Algorithm

1 Compute the sequences (θ′

i)i∈N and (qi)i∈N from extgcd(a, d)

2 Compute the sequence (bi)i∈N such that

b = a + m

i=1 bi(−1)i−1θ′ i−1

3 Return 1 + m

i=1 biqi−1

Proof : Similar to modular multiplication.

  • M. Gouicem

Continued fractions and Number systems applications 32 / 37

slide-36
SLIDE 36

Complexity considerations

Compute the sequences (θ′

i)i∈N and (qi)i∈N from extgcd(a, d)

O(log(d)2) Compute the sequence (bi)i∈N from (qi)i∈N (or (θ′

i)i∈N)

O(log(d)2) Evaluate the sequence (bi)i∈N in (θ′

i)i∈N (or (qi)i∈N)

O(log(d)2)

  • M. Gouicem

Continued fractions and Number systems applications 33 / 37

slide-37
SLIDE 37

Implementation considerations

Both algorithm Integrate multiplication and reduction ⇒ we manipulate only words of size less than log d ; Quotients ki and bi can be computed only with subtractions as they are likely very small ⇒ mean value is Khinchin constant ≈ 2.69. Modular Multiplication (qi)i∈N is an increasing sequence ⇒ every qi ≤ b need to be stored to decompose b ⇒ the needed part of the sequence is of size O(log(b)2). Modular Division (θi)i∈N is an decreasing sequence ⇒ we can decompose b on-the-fly ⇒ no extra storage is needed !

  • M. Gouicem

Continued fractions and Number systems applications 34 / 37

slide-38
SLIDE 38

Perspectives

Algorithm enhancement Use other decompositions from Euclidean algorithm ?

compute remainders with centered division use decomposition from three distance theorem (irregular control flow ? optimal ?)

Find mean optimal decomposition (minimizing |bi| + |ki|) Use binary GCD algorithm to build the sequences (θi)i∈N and (qi)i∈N and avoid divisions ?

  • M. Gouicem

Continued fractions and Number systems applications 35 / 37

slide-39
SLIDE 39

Perspectives

Implementation We have a 64 bits proof of concept C implementation (speedup of 1.5 – 2.5x over GMP). Now provide multiprecision. Searching application... Compact hardware implementation of modular arithmetic ? (J´ er´ emie ?) When multiple modular mult/div by the same value a are needed (e.g. Gauss elimination) ?

  • M. Gouicem

Continued fractions and Number systems applications 36 / 37

slide-40
SLIDE 40

Thank you for your attention ! ! Any question, remark, recommendation ?

  • M. Gouicem

Continued fractions and Number systems applications 37 / 37

slide-41
SLIDE 41

References I

Val´ erie Berth´ e, Autour du syst` eme de num´ eration d’Ostrowski, Bulletin of the Belgian Mathematical Society 8 (2001), 209–238. Pierre Fortin, Mourad Gouicem, and Stef Graillat, Correctly rounding elementary functions on gpu, http ://arxiv.org/abs/1211.3056. Pierre Fortin, Mourad Gouicem, and Stef Graillat, Towards solving the table maker’s dilemma on GPU, Proceedings of the 20th International Euromicro Conference on Parallel, Distributed and Network-based Processing, PDP’2012, IEEE Computer Society, February 2012, pp. 407 – 415. Mourad Gouicem, New modular multiplication and division algorithms based on continued fraction expansion, http ://arxiv.org/abs/1303.3445.

  • M. Gouicem

Continued fractions and Number systems applications 38 / 37

slide-42
SLIDE 42

References II

Jean-Michel Muller, Nicolas Brisebarre, Florent de Dinechin, Claude-Pierre Jeannerod, Vincent Lef` evre, Guillaume Melquiond, Nathalie Revol, Damien Stehl´ e, and Serge Torres, Handbook of floating-point arithmetic, Birkhauser, 2009. Noel Bryan Slater, Gaps and steps for the sequence nθ mod 1, Mathematical Proceedings of the Cambridge Philosophical Society (1967), 1115–1123.

  • M. Gouicem

Continued fractions and Number systems applications 39 / 37