On the Multiplicative Complexity of Boolean Functions and Bitsliced Higher-Order Masking


slide-1
SLIDE 1

On the Multiplicative Complexity of Boolean Functions and Bitsliced Higher-Order Masking

Dahmun Goudarzi and Matthieu Rivain

CHES 2016, Santa Barbara

slide-5
SLIDE 5

Higher-Order Masking

$x = x_1 + x_2 + \cdots + x_d$

Linear operations: $O(d)$

Non-linear operations: $O(d^2)$

→ Challenge for blockciphers: S-boxes

2/28
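The sharing on this slide can be sketched in a few lines. This is a minimal illustration, assuming Boolean masking where "+" is a bitwise XOR; the helper names `share`, `unshare` and `masked_xor` are hypothetical and not taken from the talk.

```python
import secrets
from functools import reduce
from operator import xor

def share(x, d, bits=8):
    """Split x into d shares whose XOR equals x (d - 1 shares are uniformly random)."""
    shares = [secrets.randbits(bits) for _ in range(d - 1)]
    shares.append(reduce(xor, shares, x))
    return shares

def unshare(shares):
    """Recombine the shares by XORing them together."""
    return reduce(xor, shares, 0)

def masked_xor(a, b):
    """Linear operation: one XOR per share, hence O(d) work."""
    return [ai ^ bi for ai, bi in zip(a, b)]
```

A linear operation touches each share once, which is where the $O(d)$ cost comes from; non-linear operations need cross terms between shares.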

slide-6
SLIDE 6

Ishai-Sahai-Wagner Multiplication

$\left(\sum_i a_i\right) \cdot \left(\sum_i b_i\right) = \sum_{i,j} a_i \cdot b_j$ + fresh randomness

Variant: CPRR evaluation for quadratic functions (Coron et al., FSE 2013)

3/28
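As a rough sketch of the ISW multiplication, the following computes a shared AND of two $d$-share encodings. It assumes bitwise AND as the product and XOR as the sum; the function name `isw_and` and its `bits` parameter are illustrative choices, not the authors' code.

```python
import secrets
from functools import reduce
from operator import xor

def isw_and(a, b, bits=8):
    """ISW-style multiplication of two d-share encodings (AND as product, XOR as sum)."""
    d = len(a)
    c = [a[i] & b[i] for i in range(d)]              # diagonal terms a_i * b_i
    for i in range(d):
        for j in range(i + 1, d):
            r = secrets.randbits(bits)               # fresh random value for the pair (i, j)
            c[i] ^= r
            c[j] ^= (r ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return c
```

The double loop visits every pair of shares, which is the source of the $O(d^2)$ cost of non-linear operations.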

slide-7
SLIDE 7

The Polynomial Method

S-box seen as a (univariate) polynomial over $GF(2^n)$

Specific S-boxes, e.g. AES:

$S(x) = \mathrm{Aff}(x^{254})$

Generic methods:

◮ CRV decomposition (CHES 2014):

$S(x) = \sum_{i=0}^{t-1} g_i(x) \cdot h_i(x) + h_t(x)$

◮ Algebraic decomposition (CRYPTO 2015):

$S(x) = \sum_{i=0}^{t-1} h_i(g_i(x)) + h_t(x)$

4/28
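To make the AES case concrete, here is a small sketch that computes $x^{254}$ in $GF(2^8)$ with four non-linear multiplications, using an addition chain that is common in the masking literature. The field polynomial is the AES one; the function names are hypothetical. Squarings are linear over $GF(2)$, so in a masked implementation they would be applied share by share (here they simply reuse `gf_mul` for brevity).

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_pow2k(a, k):
    """Raise a to the power 2^k with k squarings (each squaring is GF(2)-linear)."""
    for _ in range(k):
        a = gf_mul(a, a)
    return a

def inv254(x):
    """Compute x^254 with 4 non-linear multiplications."""
    x2 = gf_pow2k(x, 1)
    x3 = gf_mul(x2, x)            # multiplication 1
    x12 = gf_pow2k(x3, 2)
    x15 = gf_mul(x12, x3)         # multiplication 2
    x240 = gf_pow2k(x15, 4)
    x252 = gf_mul(x240, x12)      # multiplication 3
    return gf_mul(x252, x2)       # multiplication 4
```

Only the four commented multiplications would need an ISW-style gadget; everything else is linear.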

slide-9
SLIDE 9

The Bitslice Method

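S-box seen as a Boolean circuit

[Figure: a Boolean circuit on inputs x1, x2, ..., xn built from + and · gates; in bitslice, the inputs are held in registers X1, X2, ..., Xn and each gate becomes a CPU XOR or CPU AND instruction.]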

5/28

slide-10
SLIDE 10

Bitslice for S-boxes

Find a compact Boolean circuit for the S-box

16 S-boxes computed with one bitsliced computation

Higher-Order Masking:

◮ XOR → d XORs

◮ AND → ISW-AND

Minimizing the $O(d^2)$ cost → minimizing the number of ISW-ANDs

6/28
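The bitslice idea can be illustrated as follows: bit $i$ of every S-box input is packed into word $i$, and one word-level XOR or AND then evaluates the same gate for all inputs at once. The circuit below is a made-up 4-bit example (not a real S-box), and all function names are hypothetical.

```python
def to_bitslice(values, n):
    """Pack bit i of every input value into word i (lane k belongs to values[k])."""
    return [sum(((v >> i) & 1) << k for k, v in enumerate(values)) for i in range(n)]

def from_bitslice(words, count):
    """Inverse of to_bitslice: rebuild the value held in each of the count lanes."""
    return [sum(((w >> k) & 1) << i for i, w in enumerate(words)) for k in range(count)]

def toy_circuit(x):
    """Hypothetical 4-bit circuit (not a real S-box): 2 ANDs and 3 XORs."""
    x0, x1, x2, x3 = x
    t = x0 & x1                 # AND gate 1
    y0 = t ^ x2
    y1 = (y0 & x3) ^ x0         # AND gate 2
    return [y0, y1, x1 ^ x3, x3]

# 16 inputs evaluated with one pass over the circuit.
inputs = list(range(16))
outputs = from_bitslice(toy_circuit(to_bitslice(inputs, 4)), 16)
```

With masking, each XOR in `toy_circuit` would become $d$ XORs and each AND one ISW-AND, so the AND count drives the cost.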

slide-11
SLIDE 11

Polynomial vs Bitslice approach

How Fast Can Higher-Order Masking Be in Software?, eprint 2016

[Figure: two plots of clock cycles versus masking order d (2 to 10), comparing the bitslice implementation with the best polynomial method, for AES (scale ·10^5) and for PRESENT (scale ·10^6).]

Motivation: bitslice for generic S-box evaluations

7/28

slide-12
SLIDE 12

Multiplicative Complexity of Boolean Functions

8/28

slide-13
SLIDE 13

Boolean functions

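Span: $\langle f_1, f_2, \ldots, f_m \rangle = \left\{ \sum_{i=1}^{m} a_i f_i \;\middle|\; a_i \in \mathbb{F}_2 \right\}$

$\mathcal{M}_n = \left\{ x \mapsto x^u = x_1^{u_1} \cdot x_2^{u_2} \cdots x_n^{u_n} \;\middle|\; u \in \{0,1\}^n \right\}$ is the set of monomials

Algebraic Normal Form (ANF):

$f(x) = \sum_{u \in \{0,1\}^n} a_u x^u$, i.e. $f \in \langle \mathcal{M}_n \rangle$

S-box: $S(x) = (f_1(x), f_2(x), \ldots, f_n(x))$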

9/28
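The ANF coefficients $a_u$ can be computed from a truth table with the binary Möbius transform. The sketch below assumes the truth table is a list of $2^n$ bits indexed by the integer whose bit $k$ is $x_{k+1}$; the function name `anf` is an illustrative choice.

```python
def anf(truth_table):
    """Return the ANF coefficients a_u (indexed by u) via the binary Moebius transform."""
    a = list(truth_table)
    n = len(a).bit_length() - 1          # len(a) == 2**n
    for i in range(n):
        step = 1 << i
        for x in range(len(a)):
            if x & step:
                a[x] ^= a[x ^ step]      # fold in the entry with bit i cleared
    return a

# OR of two variables: x1 + x2 + x1*x2 over F2.
coeffs_or = anf([0, 1, 1, 1])
```

The transform is its own inverse, so applying `anf` to a coefficient vector gives back the truth table.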

slide-18
SLIDE 18

Multiplicative Complexity

C(f): minimum number of multiplications to compute f

$C(f_1, f_2, \ldots, f_n) \leq C(\mathcal{M}_n) = 2^n - (n + 1)$

$\exists f \in \langle \mathcal{M}_n \rangle,\ C(f) > 2^{n/2} - n$

Method to find optimal solution for n ≤ 5: SAT-Solver

Constructive method [BPP00]:

$C(f) \approx 2^{n/2 + 1} - \frac{n}{2} - 2$

10/28
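The value $C(\mathcal{M}_n) = 2^n - (n+1)$ can be checked by building every monomial from a smaller one with a single multiplication. This is a counting sketch only; functions are stored as truth-table integers (bit $x$ holds the value at input $x$), and the name `all_monomials` is hypothetical.

```python
def all_monomials(n):
    """Build every monomial x^u as a truth-table integer and count the multiplications used."""
    size = 1 << n
    table = {0: (1 << size) - 1}             # x^0 is the constant 1
    mults = 0
    for u in range(1, size):
        low = u & -u                          # lowest variable appearing in x^u
        var = sum(1 << x for x in range(size) if x & low)
        rest = u ^ low
        if rest == 0:
            table[u] = var                    # degree 1: an input, no multiplication
        else:
            table[u] = table[rest] & var      # one multiplication per monomial of degree >= 2
            mults += 1
    return table, mults
```

Every monomial of degree at least 2 costs exactly one AND here, and there are $2^n - n - 1$ of them.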

slide-19
SLIDE 19

Our results

Generalization of BPP for S-boxes:

$C(S) \approx \sqrt{n}\, 2^{n/2 + 1} - \frac{3}{2} n - \frac{1}{2} \log n$

New method: generalization of CRV

$C(S) \approx \sqrt{n}\, 2^{n/2 + 1} - 2n - 1$

n                                   4    5    6    7    8    9    10
BPP extended                        8    16   29   47   87   120  190
Our generic method ($C_{n,n}$)      8    17   31   50   77   122  190
Our improved method ($C^*_{n,n}$)   7    13   23   38   61   96   145

Table: Multiplicative complexities of n-bit S-boxes.

11/28

slide-20
SLIDE 20

New Generic Decomposition Method

12/28

slide-23
SLIDE 23

Decomposition of a Single Boolean Function

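$f(x) = \sum_{i=0}^{t} g_i(x) \cdot h_i(x)$

$g_i$: random linear combinations from $B = \{\varphi_j\}_j$

$a_{i,j} \leftarrow_\$ \{0, 1\}$, $g_i \leftarrow \sum_j a_{i,j} \varphi_j$

find $c_{i,j}$ s.t. $h_i = \sum_j c_{i,j} \varphi_j$ by solving a linear system:

$f(x) = \sum_i \Big( \sum_j a_{i,j} \varphi_j(x) \Big) \Big( \sum_j c_{i,j} \varphi_j(x) \Big), \quad \forall x$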

13/28

slide-24
SLIDE 24

Decomposition of a Single Boolean Function

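$f(x) = \sum_i \Big( \sum_j a_{i,j} \varphi_j(x) \Big) \Big( \sum_j c_{i,j} \varphi_j(x) \Big), \quad \forall x$

$\{e_i\}_{i=1}^{2^n} = \mathbb{F}_2^n$

$A_0 c_0 + A_1 c_1 + \cdots + A_t c_t = (f(e_1), f(e_2), \ldots, f(e_{2^n}))$

$A_i = \begin{pmatrix} \varphi_1(e_1) \cdot g_i(e_1) & \varphi_2(e_1) \cdot g_i(e_1) & \cdots & \varphi_{|B|}(e_1) \cdot g_i(e_1) \\ \varphi_1(e_2) \cdot g_i(e_2) & \varphi_2(e_2) \cdot g_i(e_2) & \cdots & \varphi_{|B|}(e_2) \cdot g_i(e_2) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_1(e_{2^n}) \cdot g_i(e_{2^n}) & \varphi_2(e_{2^n}) \cdot g_i(e_{2^n}) & \cdots & \varphi_{|B|}(e_{2^n}) \cdot g_i(e_{2^n}) \end{pmatrix}$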

14/28
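The linear system above can be sketched for a toy case. Everything specific here is an assumption made for illustration: $n = 4$, a basis made of all monomials of degree at most 2, the last $g_i$ fixed to the constant 1 (so that $t + 1$ terms cost $t$ multiplications), and a retry loop that redraws the $g_i$ whenever the system has no solution. Functions are truth-table integers, so a pointwise product is a bitwise AND and a sum is an XOR.

```python
import random

N = 4                                   # toy size chosen for this sketch
SIZE = 1 << N

def monomial(u):
    """Truth table (bit x holds the value at input x) of the monomial x^u."""
    return sum(1 << x for x in range(SIZE) if x & u == u)

# Illustrative basis (an assumption): all monomials of degree <= 2.
BASIS = [monomial(u) for u in range(SIZE) if bin(u).count("1") <= 2]

def combine(coeffs):
    """XOR of the basis functions selected by the low bits of coeffs."""
    out = 0
    for j, phi in enumerate(BASIS):
        if (coeffs >> j) & 1:
            out ^= phi
    return out

def solve_xor(columns, target):
    """Return a bitmask of columns whose XOR equals target, or None if there is none."""
    pivots = {}                          # leading bit -> (vector, mask of columns used)
    for k, v in enumerate(columns):
        m = 1 << k
        while v:
            p = v.bit_length() - 1
            if p not in pivots:
                pivots[p] = (v, m)
                break
            v, m = v ^ pivots[p][0], m ^ pivots[p][1]
    sol = 0
    while target:
        p = target.bit_length() - 1
        if p not in pivots:
            return None
        target, sol = target ^ pivots[p][0], sol ^ pivots[p][1]
    return sol

def decompose(f, t, rng):
    """Find g_i, h_i in span(BASIS), i = 0..t, with f = XOR_i (g_i AND h_i)."""
    k = len(BASIS)
    while True:
        gs = [combine(rng.getrandbits(k)) for _ in range(t)]
        gs.append(monomial(0))           # assumption: last g is the constant 1
        columns = [g & phi for g in gs for phi in BASIS]   # column i*k + j is g_i * phi_j
        sol = solve_xor(columns, f)
        if sol is not None:
            return gs, [combine(sol >> (i * k)) for i in range(t + 1)]

gs, hs = decompose(0x6A93, 2, random.Random(1))   # 0x6A93: arbitrary example function
```

Each column of the system is the truth table of one product $\varphi_j \cdot g_i$, which matches the columns of the matrices $A_i$ on the slide.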

slide-25
SLIDE 25

Conditions

$(t + 1)|B|$ unknowns, $2^n$ equations:

$(t + 1)|B| \geq 2^n$

Condition on the sum: $t \geq \lceil 2^n / |B| \rceil - 1$

Condition on the basis: $B \times B$ has to span all Boolean functions

15/28

slide-26
SLIDE 26

How to Construct the Basis B

Start from $B_0$ such that $B_0 \times B_0 = \mathcal{M}_n$

From $B_0$ to $B$:

◮ $\varphi, \psi \leftarrow_\$ B$

◮ $B \leftarrow B \cup \{\varphi \cdot \psi\}$

16/28
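The basis construction can be sketched as below. The slide does not say which $B_0$ is used; the choice here (monomials over the low half of the variables together with monomials over the high half) is an assumption that satisfies $B_0 \times B_0 = \mathcal{M}_n$. All function names are hypothetical, and functions are again truth-table integers.

```python
import random

def monomial(u, n):
    """Truth table (bit x holds the value at input x) of the monomial x^u."""
    return sum(1 << x for x in range(1 << n) if x & u == u)

def rank(vectors):
    """Rank over GF(2) of truth tables stored as integers."""
    pivots = {}
    for v in vectors:
        while v:
            p = v.bit_length() - 1
            if p not in pivots:
                pivots[p] = v
                break
            v ^= pivots[p]
    return len(pivots)

def initial_basis(n):
    """Assumed B0: monomials over the low half of the variables plus those over the high half."""
    half = n // 2
    low = list(range(1 << half))
    high = [u << half for u in range(1 << (n - half))]
    return [monomial(u, n) for u in sorted(set(low + high))]

def extend_basis(basis, extra, rng):
    """Add `extra` new products of two random basis elements (one multiplication each)."""
    basis = list(basis)
    while extra > 0:
        prod = rng.choice(basis) & rng.choice(basis)
        if prod not in basis:
            basis.append(prod)
            extra -= 1
    return basis

B0 = initial_basis(4)
B = extend_basis(B0, 2, random.Random(0))
```

For $n = 4$ this $B_0$ has 7 elements, of which 2 have degree 2, so building it costs $|B_0| - n - 1 = 2$ multiplications.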

slide-27
SLIDE 27

Costs

r multiplications for B

$r = |B| - n - 1$, $|B| \geq |B_0|$

t multiplications for decomposition products

$t \geq \lceil 2^n / |B| \rceil - 1$

Cost: r + t

n           4      5      6      7       8       9        10
(r, t)      (2,3)  (5,3)  (9,5)  (16,6)  (25,9)  (41,11)  (59,17)
$C_{n,n}$   5      8      14     22      34      52       78

17/28

slide-28
SLIDE 28

Decomposition of the S-box

S-box: $x \mapsto (f_1(x), f_2(x), \ldots, f_n(x))$

Apply n Boolean decompositions on the $f_i$'s

Costs: $r + t \cdot n$ multiplications

n           4      5      6       7       8       9       10
(r, t)      (4,1)  (7,2)  (13,3)  (22,4)  (37,5)  (59,7)  (90,10)
$C_{n,n}$   8      17     31      50      77      122     190

Works for any S-box

18/28
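The table on this slide follows from the two cost formulas, taking $|B| = r + n + 1$ and the smallest $t$ allowed by the condition $(t + 1)|B| \geq 2^n$. The short sketch below recomputes it; the function name `generic_cost` is hypothetical and the $r$ values are simply copied from the slide.

```python
def generic_cost(n, r):
    """Return (t, total cost) for the generic S-box decomposition with r basis multiplications."""
    size_b = r + n + 1                    # |B| = r + n + 1
    t = -(-(2 ** n) // size_b) - 1        # ceil(2^n / |B|) - 1
    return t, r + n * t                   # cost: r + t * n

# r values taken from the slide's table.
SLIDE_R = {4: 4, 5: 7, 6: 13, 7: 22, 8: 37, 9: 59, 10: 90}
table = {n: generic_cost(n, r) for n, r in SLIDE_R.items()}
```

For example, $n = 8$ with $r = 37$ gives $|B| = 46$, $t = 5$ and a cost of $37 + 8 \cdot 5 = 77$.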

slide-29
SLIDE 29

S-box Dependent Improvements

19/28

slide-34
SLIDE 34

Basis Update Improvements

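Start with $B_1 \supseteq B_0$

Decompose $f_1 = \sum_i g_{1,i} \cdot h_{1,i}$ with $B_1$, where $t_1 = \lceil 2^n / |B_1| \rceil - 1$

Set $B_2 = B_1 \cup \{g_{1,i} \cdot h_{1,i}\}_i$

Decompose $f_2 = \sum_i g_{2,i} \cdot h_{2,i}$ with $B_2$, where $t_2 = \lceil 2^n / |B_2| \rceil - 1$

Set $B_3 = B_2 \cup \{g_{2,i} \cdot h_{2,i}\}_i$

Decompose $f_3 = \sum_i g_{3,i} \cdot h_{3,i}$ with $B_3$, where $t_3 = \lceil 2^n / |B_3| \rceil - 1$

. . .

Set $B_n = B_{n-1} \cup \{g_{n-1,i} \cdot h_{n-1,i}\}_i$

Decompose $f_n = \sum_i g_{n,i} \cdot h_{n,i}$ with $B_n$, where $t_n = \lceil 2^n / |B_n| \rceil - 1$

Costs: $r + t_1 + t_2 + \cdots + t_n$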

20/28
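The effect of the basis update on the cost can be estimated with a small counting sketch. It assumes that every $t_i$ meets its bound $\lceil 2^n / |B_i| \rceil - 1$ exactly, that the $t_i$ products of step $i$ all join the basis, and it ignores the spanning condition on the basis; the function name and the example parameters are illustrative.

```python
def improved_cost(n, r):
    """Estimate r + t_1 + ... + t_n when the basis grows by t_i products after each step."""
    size_b = r + n + 1                       # |B_1|
    total = r
    for _ in range(n):
        t_i = -(-(2 ** n) // size_b) - 1     # ceil(2^n / |B_i|) - 1
        total += t_i
        size_b += t_i                        # |B_{i+1}| = |B_i| + t_i
    return total

cost_4 = improved_cost(4, 2)
cost_5 = improved_cost(5, 5)
cost_6 = improved_cost(6, 9)
```

Because the basis keeps growing, later coordinate functions need fewer products than the first one, which is where the gain over $r + t \cdot n$ comes from.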

slide-35
SLIDE 35

Rank Drop

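$A_0 c_0 + A_1 c_1 + \cdots + A_t c_t = (f(e_1), f(e_2), \ldots, f(e_{2^n}))$

System $A \cdot c = b$ with $\mathrm{rank}(A) = 2^n - \delta$ works for a fraction $\frac{1}{2^\delta}$ of Boolean functions

Try $O(2^\delta)$ systems

Reduced parameter: $(t + 1)|B| \geq 2^n - \delta$

→ $t \geq \left\lceil \frac{2^n - \delta}{|B|} \right\rceil - 1$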

21/28

slide-36
SLIDE 36

Results

S-box                 Serpent  SC2000 S5  SC2000 S6  CLEFIA
n                     4        5          6          8
Our generic method    7        17         31         77
Our improved method   6        11         21         62
Gain                  1        6          10         15

22/28

slide-37
SLIDE 37

Implementation

23/28

slide-38
SLIDE 38

Parallelization

16 S-boxes → 16-bit bitsliced registers

But 32-bit architecture

Two 16-bit ISW-ANDs ⇒ one 32-bit ISW-AND

At the circuit level: grouping AND gates in pairs

24/28
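The pairing of AND gates can be illustrated with plain (unmasked) words: two independent 16-bit ANDs are packed into the two halves of a 32-bit word and computed with one instruction. The helper names are hypothetical; in the masked setting the single 32-bit AND would be replaced by one 32-bit ISW-AND.

```python
MASK16 = 0xFFFF

def pack(lo, hi):
    """Place two 16-bit bitsliced words in the halves of one 32-bit word."""
    return (lo & MASK16) | ((hi & MASK16) << 16)

def unpack(word):
    """Split a 32-bit word back into its low and high 16-bit halves."""
    return word & MASK16, (word >> 16) & MASK16

def paired_and(a1, b1, a2, b2):
    """Compute a1 & b1 and a2 & b2 with a single 32-bit AND."""
    return unpack(pack(a1, a2) & pack(b1, b2))
```

This only works for AND gates whose inputs are available at the same time, which is why the circuit has to be arranged so that AND gates come in independent pairs.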

slide-39
SLIDE 39

A circuit for AES with parallelizable AND gates

t2 = y12 ∧ y15     t23 = t19 ⊕ y21    t34 = t23 ⊕ t33    z2 = t33 ∧ x7
t3 = y3 ∧ y6       t15 = y8 ∧ y10     t35 = t27 ⊕ t33    z3 = t43 ∧ y16
t5 = y4 ∧ x7       t26 = t21 ∧ t23    t42 = t29 ⊕ t33    z4 = t40 ∧ y1
t7 = y13 ∧ y16     t16 = t15 ⊕ t12    z14 = t29 ∧ y2     z6 = t42 ∧ y11
t8 = y5 ∧ y1       t18 = t6 ⊕ t16     t36 = t24 ∧ t35    z7 = t45 ∧ y17
t10 = y2 ∧ y7      t20 = t11 ⊕ t16    t37 = t36 ⊕ t34    z8 = t41 ∧ y10
t12 = y9 ∧ y11     t24 = t20 ⊕ y18    t38 = t27 ⊕ t36    z9 = t44 ∧ y12
t13 = y14 ∧ y17    t30 = t23 ⊕ t24    t39 = t29 ∧ t38    z10 = t37 ∧ y3
t4 = t3 ⊕ t2       t22 = t18 ⊕ y19    z5 = t29 ∧ y7      z11 = t33 ∧ y4
t6 = t5 ⊕ t2       t25 = t21 ⊕ t22    t44 = t33 ⊕ t37    z12 = t43 ∧ y13
t9 = t8 ⊕ t7       t27 = t24 ⊕ t26    t40 = t25 ⊕ t39    z13 = t40 ∧ y5
t11 = t10 ⊕ t7     t31 = t22 ⊕ t26    t41 = t40 ⊕ t37    z15 = t42 ∧ y9
t14 = t13 ⊕ t12    t28 = t25 ∧ t27    t43 = t29 ⊕ t40    z16 = t45 ∧ y14
t17 = t4 ⊕ t14     t32 = t31 ∧ t30    t45 = t42 ⊕ t41    z17 = t41 ∧ y8
t19 = t9 ⊕ t14     t29 = t28 ⊕ t22    z0 = t44 ∧ y15
t21 = t17 ⊕ y20    t33 = t32 ⊕ t24    z1 = t37 ∧ y6

25/28

slide-40
SLIDE 40

Parallelization

Parallelization level: $k = \dfrac{\text{architecture size}}{\text{nb of S-boxes}}$

Generic method: $MC = \left\lceil \frac{r}{k} \right\rceil + \left\lceil \frac{n \cdot t}{k} \right\rceil$

Improved method: results for specific S-boxes

26/28
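As a quick worked instance of the formula for the generic method, the sketch below evaluates it for a 32-bit architecture and 16 S-boxes, so $k = 2$. The function name and default arguments are illustrative; the $(r, t)$ pair in the example is the one listed earlier for $n = 8$.

```python
def parallel_mc(n, r, t, word_bits=32, n_sboxes=16):
    """Number of word-level ISW-ANDs for the generic method at parallelization level k."""
    k = word_bits // n_sboxes                     # k = architecture size / nb of S-boxes
    return -(-r // k) + -(-(n * t) // k)          # ceil(r / k) + ceil(n * t / k)

mc_n8 = parallel_mc(8, 37, 5)                     # 19 + 20
```

With $k = 2$ the count is roughly halved compared with $r + n \cdot t = 77$.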

slide-41
SLIDE 41

Performance Comparison in ARM

[Figure: clock cycles (scale ·10^6) versus masking order d (5 to 20) for our implementation, CRV and AD.]

Figure: 16 S-boxes (n = 8), k = 2 → 31 × 2 multiplications.

[Figure: clock cycles (scale ·10^5) versus masking order d (5 to 20) for our implementation, CRV and AD.]

Figure: 16 S-boxes (n = 4), k = 2 → 3 × 2 multiplications.

27/28

slide-42
SLIDE 42

Questions?

28/28