Fourℚ on Embedded Devices with Strong Countermeasures Against Side-Channel Attacks


Fourℚ on Embedded Devices with Strong Countermeasures Against Side-Channel Attacks. CHES 2017, September 26-28, Taipei, Taiwan. Zhe Liu, Patrick Longa, Geovandro C. C. F. Pereira, Oscar Reparaz, Hwajeong Seo


slide-1
SLIDE 1

Fourℚ on Embedded Devices with Strong Countermeasures Against Side-Channel Attacks

Zhe Liu, Patrick Longa, Geovandro C. C. F. Pereira, Oscar Reparaz, Hwajeong Seo

CHES 2017, September 26-28, Taipei, Taiwan

slide-6
SLIDE 6

Context on modern elliptic curves

  • 1996: P. Kocher initiates Simple Power Analysis (SPA) attacks (timing).
  • 1999: SPA evolves into Differential Power Analysis (DPA) and template attacks.
  • 1999: FIPS 186-2 is published.
      • NIST publishes its 15 popular (Weierstrass) curves along with ECDSA.
  • New requirements imposed on ECC:
      • Constant-time algorithms
      • Complete formulas (achieved by models such as (twisted) Edwards curves)
      • Provenance
  • 2015: NIST holds a workshop on new ECC standardization.

1/18

slide-7
SLIDE 7

Next-generation elliptic curves

Farrell-Moriarty-Melnikov-Paterson [NIST ECC Workshop 2015]: “… the real motivation for work in CFRG is the better performance and side-channel resistance of new curves developed by academic cryptographers over the last decade.”

2/18

slide-8
SLIDE 8

State-of-the-art ECC: Fourℚ [Costello-Longa, ASIACRYPT 2015]

Speed (in thousands of cycles) to compute variable-base scalar multiplication on different computer classes:

Platform                                  Fourℚ   Curve25519    Speedup ratio
Intel Haswell processor, desktop class    56      162           2.9x
ARM Cortex-A15, smartphone class          132     315           2.4x
ARM Cortex-M4, microcontroller class      470     907 / 1,424   1.9x / 3.0x

3/18
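The speedup column is the ratio of the Curve25519 cycle count to the Fourℚ cycle count. A minimal Python sketch that recomputes it from the table; the dictionary layout and the function name are illustrative and not part of the slides:

```python
# Cycle counts (in thousands) copied from the table on this slide.
# Each entry maps a platform to (FourQ cycles, list of Curve25519 cycles).
CYCLES = {
    "Intel Haswell": (56, [162]),
    "ARM Cortex-A15": (132, [315]),
    "ARM Cortex-M4": (470, [907, 1424]),
}


def speedups(table):
    """Return platform -> list of Curve25519/FourQ ratios, rounded to one decimal."""
    return {
        name: [round(c25519 / fourq, 1) for c25519 in others]
        for name, (fourq, others) in table.items()
    }


if __name__ == "__main__":
    for name, ratios in speedups(CYCLES).items():
        print(name, " / ".join(f"{r}x" for r in ratios))
```

The two Cortex-M4 ratios correspond to the two Curve25519 figures listed for that platform.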

slide-11
SLIDE 11

State-of-the-art ECC: Fourℚ [Costello-Longa, ASIACRYPT 2015]

$E/\mathbb{F}_{p^2}: -x^2 + y^2 = 1 + d x^2 y^2$, with $p = 2^{127} - 1$, $i^2 = -1$ and
$d = 125317048443780598345676279555970305165\,i + 4205857648805777768770$.
$\#E = 392 \cdot N$, where $N$ is a 246-bit prime.

  • Fastest (large characteristic) ECC addition laws are complete on $E$.
  • $E$ is equipped with two endomorphisms:
      • $E$ is a degree-2 ℚ-curve: endomorphism $\psi$
      • $E$ has CM by the order of discriminant $D = -40$: endomorphism $\phi$
  • $\psi(P) = [\lambda_\psi]P$ and $\phi(P) = [\lambda_\phi]P$ for all $P \in E[N]$.
  • For $m \in [0, 2^{256})$, the decomposition $m \mapsto (a_1, a_2, a_3, a_4)$ gives
    $[m]P = [a_1]P + [a_2]\phi(P) + [a_3]\psi(P) + [a_4]\psi(\phi(P))$.

4/18
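To make the curve definition concrete, here is a minimal Python sketch of arithmetic in $\mathbb{F}_{p^2}$ and of the curve equation. It is an illustration only: the helper names are hypothetical, the addition law is the standard affine twisted Edwards formula for $a = -1$ (assumed here, not copied from the slides), and the test points are the small-order points $(0, 1)$, $(0, -1)$ and $(i, 0)$ that exist on any such curve, not the Fourℚ generator.

```python
P = 2**127 - 1  # field characteristic p from the slide

# Elements of F_{p^2} = F_p(i), i^2 = -1, are pairs (a, b) meaning a + b*i.
ZERO, ONE, I = (0, 0), (1, 0), (0, 1)
# Curve constant d = (real part, imaginary part), as given on the slide.
D = (4205857648805777768770, 125317048443780598345676279555970305165)


def f2_add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)


def f2_sub(x, y):
    return ((x[0] - y[0]) % P, (x[1] - y[1]) % P)


def f2_mul(x, y):
    # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    return ((x[0] * y[0] - x[1] * y[1]) % P, (x[0] * y[1] + x[1] * y[0]) % P)


def f2_inv(x):
    # 1/(a + bi) = (a - bi)/(a^2 + b^2); the norm is inverted in F_p via Fermat.
    norm_inv = pow((x[0] * x[0] + x[1] * x[1]) % P, P - 2, P)
    return (x[0] * norm_inv % P, (-x[1]) * norm_inv % P)


def on_curve(pt):
    """Check -x^2 + y^2 = 1 + d*x^2*y^2 for an affine point (x, y)."""
    x, y = pt
    x2, y2 = f2_mul(x, x), f2_mul(y, y)
    return f2_sub(y2, x2) == f2_add(ONE, f2_mul(D, f2_mul(x2, y2)))


def ed_add(p1, p2):
    """Affine twisted Edwards addition for a = -1 (assumed standard formula)."""
    (x1, y1), (x2, y2) = p1, p2
    t = f2_mul(D, f2_mul(f2_mul(x1, x2), f2_mul(y1, y2)))
    x3 = f2_mul(f2_add(f2_mul(x1, y2), f2_mul(y1, x2)), f2_inv(f2_add(ONE, t)))
    y3 = f2_mul(f2_add(f2_mul(y1, y2), f2_mul(x1, x2)), f2_inv(f2_sub(ONE, t)))
    return (x3, y3)


NEUTRAL = (ZERO, ONE)           # (0, 1)
ORDER2 = (ZERO, (P - 1, 0))     # (0, -1)
ORDER4 = (I, ZERO)              # (i, 0)

if __name__ == "__main__":
    print(all(on_curve(pt) for pt in (NEUTRAL, ORDER2, ORDER4)))
    print(ed_add(ORDER4, ORDER4) == ORDER2)
```

Because the same formula handles doubling, the neutral element and small-order inputs without special cases, the sketch also hints at why complete addition laws simplify constant-time code.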

slide-12
SLIDE 12

Optimal 4-Way Scalar Decompositions

$m \mapsto (a_1, a_2, a_3, a_4)$

Proposition: for all $m \in [0, 2^{256})$, the decomposition yields four $a_i \in [0, 2^{64})$ with $a_1$ odd.

Example:

$m$ = 42453556751700041597675664513313229052985088397396902723728803518727612539248
$a_1$ = 13045455764875651153 (multiplies $P$)
$a_2$ = 9751504369311420685 (multiplies $\phi(P)$)
$a_3$ = 5603607414148260372 (multiplies $\psi(P)$)
$a_4$ = 8360175734463666813 (multiplies $\psi(\phi(P))$)

5/18
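The properties stated in the proposition can be checked on the example values with a few lines of Python. This sketch only verifies the ranges and the parity of $a_1$; it does not reproduce the decomposition itself, which needs lattice constants that are not given on the slides. The function name is illustrative.

```python
# Example values copied from the slide.
M = 42453556751700041597675664513313229052985088397396902723728803518727612539248
A = [
    13045455764875651153,  # a_1, multiplies P
    9751504369311420685,   # a_2, multiplies phi(P)
    5603607414148260372,   # a_3, multiplies psi(P)
    8360175734463666813,   # a_4, multiplies psi(phi(P))
]


def satisfies_proposition(mini_scalars, bits=64):
    """True if every mini-scalar lies in [0, 2^bits) and the first one is odd."""
    in_range = all(0 <= a < 2**bits for a in mini_scalars)
    return in_range and mini_scalars[0] % 2 == 1


if __name__ == "__main__":
    print(M < 2**256, satisfies_proposition(A))
```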

slide-15
SLIDE 15

Multi-Scalar Recoding

Step 1: recode $a_1$ to a signed non-zero representation.
Step 2: recode $a_2$, $a_3$ and $a_4$ by “sign-aligning” columns.

Binary representations (most significant bit first):

$a_1$ = 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1
$a_2$ = 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1
$a_3$ = 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0
$a_4$ = 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1

Recoded representations (digits in {-1, 0, 1}):

$a_1$ = 1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, -1, -1, 1, -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, 1, -1, 1, -1, -1, -1, -1, 1, -1, 1, -1, -1, -1
$a_2$ = 1, -1, 0, 0, 0, 1, 0, 0, -1, 1, 0, -1, -1, 0, 1, 0, 0, 0, 1, 1, -1, 0, -1, 1, 0, -1, 0, 0, 1, 0, -1, 1, 1, 0, -1, 1, 0, 0, 1, 1, 1, -1, -1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, -1, -1, 0, 0, 1, -1, 0, 0, -1, -1
$a_3$ = 0, 0, 1, 0, 1, 0, -1, 1, 0, 0, -1, 0, 0, 0, 1, 0, 0, 0, 0, 1, -1, -1, -1, 0, -1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, -1, 0, -1, 0, 0, 1, -1, 0, 0, 0, 1, -1, 1, -1, 0, 0
$a_4$ = 1, -1, 0, -1, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, -1, 0, 0, 0, 0, -1, 0, 0, 1, -1, 0, 1, 0, -1, -1, 0, 1, 0, 0, 0, 1, -1, 0, 0, 0, 1, 1, 1, -1, -1, -1, -1, 0, -1, 1, 0, -1, -1, 0, 0, 0, 0, 0, -1, -1

Column signs $s_i$:

+ − + − + + − + − + − − − − + − + − + + − − − + − − + + + − − + + − − + + + + + + − − + + + + + − − − − + − + − − − − + − + − − −

Digits $d_i$:

6, 6, 3, 5, 7, 6, 7, 3, 2, 2, 3, 2, 2, 1, 8, 1, 5, 1, 6, 8, 8, 3, 4, 2, 3, 6, 3, 1, 6, 5, 2, 6, 4, 5, 6, 2, 5, 1, 4, 2, 8, 6, 2, 2, 2, 8, 7, 8, 5, 7, 5, 7, 2, 5, 8, 4, 6, 5, 1, 4, 4, 3, 3, 6, 6

6/18
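The two recoding steps can be sketched in Python as follows. This is a minimal illustration, assuming the usual rule for this kind of recoding: the signed digits of the first scalar are $2 a_1[i+1] - 1$, and every other scalar takes its low bit with the sign of that column. The function name and return layout are hypothetical, and the code is not written to run in constant time.

```python
# Mini-scalars from the decomposition example (a_1 must be odd).
EXAMPLE = [
    13045455764875651153,
    9751504369311420685,
    5603607414148260372,
    8360175734463666813,
]


def recode(scalars, bits=64):
    """Recode mini-scalars into bits+1 sign-aligned columns.

    Returns (signs, digits, b): signs and digits are listed most significant
    column first; b[j][i] is the signed digit of scalar j at position i.
    """
    a = list(scalars)
    if a[0] % 2 != 1:
        raise ValueError("the first mini-scalar must be odd")
    n = len(a)
    b = [[0] * (bits + 1) for _ in a]
    b[0][bits] = 1
    for i in range(bits):
        # Step 1: signed non-zero digit of a_1, taken from its next bit.
        b[0][i] = 2 * ((a[0] >> (i + 1)) & 1) - 1
        for j in range(1, n):
            # Step 2: the low bit of a_j inherits the sign of the column.
            b[j][i] = b[0][i] * (a[j] & 1)
            # floor(a_j / 2) - floor(b_j[i] / 2); note that -1 >> 1 == -1.
            a[j] = (a[j] >> 1) - (b[j][i] >> 1)
    for j in range(1, n):
        b[j][bits] = a[j]
    positions = range(bits, -1, -1)
    signs = [b[0][i] for i in positions]
    # Digit = 1 + |b_2| + 2|b_3| + 4|b_4|, an index into a table of 8 points.
    digits = [1 + sum(abs(b[j][i]) << (j - 1) for j in range(1, n)) for i in positions]
    return signs, digits, b


if __name__ == "__main__":
    signs, digits, _ = recode(EXAMPLE)
    print("".join("+" if s > 0 else "-" for s in signs))
    print(digits)
```

Each recoded row still represents its scalar, that is, the sum of `b[j][i] * 2**i` over all positions equals the original value, which gives a simple self-check. The printed rows can be compared against the sign and digit rows on the slide.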

slide-16
SLIDE 16

Regular Multi-Scalar Multiplication

Precomputed table (8 points):

T[1] = $P$
T[2] = $P + \phi(P)$
T[3] = $P + \psi(P)$
T[4] = $P + \phi(P) + \psi(P)$
T[5] = $P + \psi(\phi(P))$
T[6] = $P + \phi(P) + \psi(\phi(P))$
T[7] = $P + \psi(P) + \psi(\phi(P))$
T[8] = $P + \phi(P) + \psi(P) + \psi(\phi(P))$

Column signs $s_i$:

+ − + − + + − + − + − − − − + − + − + + − − − + − − + + + − − + + − − + + + + + + − − + + + + + − − − − + − + − − − − + − + − − −

Digits $d_i$:

6, 6, 3, 5, 7, 6, 7, 3, 2, 2, 3, 2, 2, 1, 8, 1, 5, 1, 6, 8, 8, 3, 4, 2, 3, 6, 3, 1, 6, 5, 2, 6, 4, 5, 6, 2, 5, 1, 4, 2, 8, 6, 2, 2, 2, 8, 7, 8, 5, 7, 5, 7, 2, 5, 8, 4, 6, 5, 1, 4, 4, 3, 3, 6, 6

Execution (the double-and-add step is repeated 64 times):

  → Load $Q = T[6] = P + \phi(P) + \psi(\phi(P))$
  → $Q = 2Q - T[6] = P + \phi(P) + \psi(\phi(P))$
  → $Q = 2Q + T[3] = 3P + 2\phi(P) + \psi(P) + 2\psi(\phi(P))$
  → $Q = 2Q - T[5] = 5P + 4\phi(P) + 2\psi(P) + 3\psi(\phi(P))$

  • Regular execution (exactly 64 DBLs and 64 ADDs) facilitates protection against timing/SSCA attacks.
  • Reduced number of precomputations (only 8 points).

7/18
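The execution steps can be traced with a small Python sketch. To stay self-contained it does not use curve arithmetic: a point is modelled as the vector of integer coefficients of $(P, \phi(P), \psi(P), \psi(\phi(P)))$, so doubling and addition become vector operations. This modelling choice and the function names are assumptions made for illustration.

```python
def table_entry(u):
    """T[u] for u in 1..8 as coefficients of (P, phi(P), psi(P), psi(phi(P)))."""
    k = u - 1
    return (1, k & 1, (k >> 1) & 1, (k >> 2) & 1)


def multi_scalar_mul(signs, digits):
    """Yield the accumulator Q after the load and after every double-and-add step.

    Every iteration performs exactly one doubling and one signed table
    addition, regardless of the digit values.
    """
    q = tuple(signs[0] * c for c in table_entry(digits[0]))
    yield q
    for s, d in zip(signs[1:], digits[1:]):
        t = table_entry(d)
        q = tuple(2 * qc + s * tc for qc, tc in zip(q, t))
        yield q


if __name__ == "__main__":
    # First four columns of the sign and digit rows on the slide.
    for q in multi_scalar_mul([1, -1, 1, -1], [6, 6, 3, 5]):
        print(q)
```

With the first four columns, the last accumulator is `(5, 4, 2, 3)`, matching $5P + 4\phi(P) + 2\psi(P) + 3\psi(\phi(P))$ in the execution trace. Feeding all 65 columns gives 64 doublings and 64 additions.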

slide-23
SLIDE 23

SPA/DPA-protected scalar multiplication

SPA countermeasures

  • Constant-time, constant-flow implementations
      ✓ Complete formulas
      ✓ Ladder-based or regular double-and-add based algorithms

The previous protections do not prevent:

  • Differential Power Analysis (DPA): many traces with the same key and varying plaintext
  • Other variants, such as template attacks, which assume a very powerful attacker

8/18

slide-26
SLIDE 26

DPA countermeasures for scalar multiplication

  • 1999: J.S. Coron suggested randomizing the computation by using:

1. Scalar randomization

$[m]P \equiv [m + r \cdot \#E]P$

2. Base point blinding (inspired by Chaum’s blind signatures)

Blind: for random $R$, $\hat{P} \leftarrow P + R$
Scalar multiplication: $Q \leftarrow [m]\hat{P}$
Unblind: $[m]P = Q - [m]R$

Moreover, update $R$ for the next scalar multiplication: $R \leftarrow (-1)^b\, 2R$, for a random bit $b$.

3. Projective coordinates randomization

$P = (X : Y : Z) \equiv (\lambda X : \lambda Y : \lambda Z)$, for random $\lambda \neq 0$

9/18
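The three countermeasures can be exercised on a toy group to see why each leaves the result unchanged. The sketch below is an assumption-laden illustration, not curve code: the "curve" is the additive group of integers modulo a prime `N_TOY`, so a point is just an integer and scalar multiplication is modular multiplication. The projective part uses a separate small prime field `Q_FIELD`. All names are hypothetical. Storing $[m]R$ next to $R$ follows Coron's description of point blinding rather than the slide.

```python
import secrets

N_TOY = 2**61 - 1    # prime; order of the toy group (Z_N, +)
Q_FIELD = 2**31 - 1  # prime; toy coordinate field for the projective demo


def smul(m, point):
    """Toy scalar multiplication [m]P in (Z_N, +)."""
    return (m * point) % N_TOY


def scalar_randomization(m, point, r_bits=20):
    """Compute [m + r * #E]P for a fresh random r; equals [m]P."""
    r = secrets.randbits(r_bits)
    return smul(m + r * N_TOY, point)


def point_blinding(m, point, blind, m_blind):
    """Compute [m](P + R), then remove [m]R, which is stored alongside R."""
    q = smul(m, (point + blind) % N_TOY)
    return (q - m_blind) % N_TOY


def update_blinding(blind, m_blind):
    """R <- (-1)^b 2R and [m]R <- (-1)^b 2[m]R for a random bit b."""
    sign = -1 if secrets.randbits(1) else 1
    return (sign * 2 * blind) % N_TOY, (sign * 2 * m_blind) % N_TOY


def randomize_projective(x, y, z):
    """Scale (X : Y : Z) by a random non-zero lambda in the coordinate field."""
    lam = 1 + secrets.randbelow(Q_FIELD - 1)
    return (lam * x % Q_FIELD, lam * y % Q_FIELD, lam * z % Q_FIELD)


def to_affine(x, y, z):
    z_inv = pow(z, Q_FIELD - 2, Q_FIELD)
    return (x * z_inv % Q_FIELD, y * z_inv % Q_FIELD)


if __name__ == "__main__":
    m, p = 123456789, 987654321
    print(scalar_randomization(m, p) == smul(m, p))
    print(to_affine(*randomize_projective(3, 4, 5)) == to_affine(3, 4, 5))
```

In each case the intermediate values change from run to run while the final result does not, which is the property that defeats averaging over many traces.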

slide-28
SLIDE 28

Scalar randomization

  • 1999: J.S. Coron suggested scalar randomization

$[m]P \equiv [m + r \cdot \#E]P$

where $r$ is small (e.g., 20 bits).

  • Problem: prime-order curves over pseudo-Mersenne primes

$p = 2^{k_1} \pm 2^{k_2} \cdots + c$,

present undesired repeated 1/0 patterns in $\#E$.

  • Unsafe example: curve P-256 (the runs highlighted on the slide are shown in brackets):

#E         = 0x[FFFFFFFF][00000000][FFFFFFFFFFFFFFFF]BCE6FAADA7179E84F3B9CAC2FC632551
r · #E     = 0xC457[FFFF]3BA8[0000]C457[FFFFFFFFFFFF]CC89C7531F9F857C484E071AFC42AAA6D7D80
m          = 0xD312804[752]9A2D4[BD63D96EB1D9]C4AC781588CFEFCFC153398F5ED03506AA58B
m + r · #E = 0xC4580D3063AC[752]A672C[BD63D96EB1D9]91363F68A86F754C09A140AA5B12DFAD8230B

10/18
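The leak in the P-256 example can be reproduced with integer arithmetic alone. In this sketch `N_P256` is the standard P-256 group order and `M` is the scalar from the slide; `R` is a 20-bit value chosen to be consistent with the $r \cdot \#E$ product shown there, which is an inference since the slide does not state $r$. The helper name is illustrative.

```python
N_P256 = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
M = 0xD3128047529A2D4BD63D96EB1D9C4AC781588CFEFCFC153398F5ED03506AA58B
R = 0xC4580  # assumed 20-bit randomizer (not stated on the slide)


def longest_shared_run(scalar, blinded, digits=64):
    """Longest run of hex digits that the blinded scalar shares, position by
    position, with the original scalar (comparing the low `digits` digits)."""
    a = format(scalar, "X").rjust(digits, "0")
    b = format(blinded, "X")[-digits:].rjust(digits, "0")
    best = cur = ""
    for x, y in zip(a, b):
        cur = cur + x if x == y else ""
        if len(cur) > len(best):
            best = cur
    return best


if __name__ == "__main__":
    blinded = M + R * N_P256
    print(format(R * N_P256, "X"))
    print(format(blinded, "X"))
    print("digits of m left exposed:", longest_shared_run(M, blinded))
```

Where $r \cdot \#E$ consists of runs of F digits, adding it to $m$ only propagates a carry, so whole words of $m$ reappear unchanged in the "randomized" scalar. The leading digits of $r \cdot \#E$ also expose $r - 1$.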

slide-31
SLIDE 31

DPA countermeasures for scalar multiplication

  • Safe example: curve Fourℚ: non-prime order, $\#E = 392 \cdot N$

#E = 0x3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE9968776D07B232910910B6A16D87C1B8

But we usually work in the prime-order subgroup, where the order of $P$ is $N$; therefore $[m]P \equiv [m + r \cdot N]P$. Notice that

N = 0x29CBC14E5E0A72F05397829CBC14E5DFBD004DFE0F79992FB2540EC7768CE7

does not present the undesired patterns, so Coron’s technique could be used.

  • Can we do better in Fourℚ? Answer: yes.

11/18
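A few properties of the subgroup order can be checked directly in Python. The sketch assumes the value of $N$ given on this slide. The run-length helper is a hypothetical and rough stand-in for "undesired patterns", counting only the longest run of one repeated hex digit.

```python
import secrets

N = 0x29CBC14E5E0A72F05397829CBC14E5DFBD004DFE0F79992FB2540EC7768CE7
COFACTOR = 392


def randomize_scalar(m, r_bits=20):
    """Coron-style randomization in the prime-order subgroup: m + r * N."""
    r = secrets.randbits(r_bits)
    return m + r * N


def longest_digit_run(value):
    """Length of the longest run of one repeated hex digit in `value`."""
    s = format(value, "X")
    best = cur = 1
    for prev, nxt in zip(s, s[1:]):
        cur = cur + 1 if prev == nxt else 1
        best = max(best, cur)
    return best


if __name__ == "__main__":
    print("bits in N:", N.bit_length())
    print("longest run in N:", longest_digit_run(N))
    print("longest run in 392 * N:", longest_digit_run(COFACTOR * N))
```

The full group order $392 \cdot N$ starts with a long run of F digits, while $N$ itself has no run longer than three digits, which is why the randomizer is added as a multiple of $N$ rather than of $\#E$.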

slide-32
SLIDE 32

Scalar randomization

  • Remark: Coron’s method is inefficient for curves with endomorphisms.
  • In Fourℚ, we extend Ciet et al.’s GLV scalar randomization:
      • Extend every mini-scalar by 16 bits (64 bits in total)
      • No problem with pattern repetitions
      • Overhead is only 25% (compared with at least 50% overhead in Curve25519)

12/18
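The 25% figure is consistent with a simple count of loop iterations, assuming the cost of the main loop grows linearly with the mini-scalar length and ignoring table construction. That cost model is a simplification introduced here, not a claim from the slides.

```python
def loop_overhead(base_bits=64, extra_bits=16):
    """Relative increase in main-loop length when each mini-scalar grows by
    `extra_bits` bits (cost assumed proportional to the number of bits)."""
    return (base_bits + extra_bits) / base_bits - 1


def total_randomness(extra_bits=16, mini_scalars=4):
    """Total number of random bits spread over all mini-scalars."""
    return extra_bits * mini_scalars


if __name__ == "__main__":
    print(f"{loop_overhead():.0%} longer loop, {total_randomness()} random bits")
```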

slide-33
SLIDE 33

Side-channel protected Fourℚ

[Algorithm listing not recovered from the slide; the build steps annotate it as follows.]

  • Always update the blinding point R.
  • The blinding point R plays a role in the table T. ‘Sign-alignment’ cannot be used here, so the new table has 16 points.
  • Projective coordinate randomization.
  • Multi-scalar randomization adds 16 bits, giving a slightly larger loop length (64 -> 80).

13/18

slide-39
SLIDE 39

Side-channel evaluation

  • Carried out a practical side-channel evaluation on an ARM Cortex-M4 with no dedicated security features.
  • EM traces. Low noise: DPA with a dozen measurements works.
  • Performed leakage detection and key-recovery attacks for vertical DPA attacks.
  • Tested the effectiveness of each countermeasure, first in isolation and then combined.
  • No leakage detected with up to 10 million measurements with all countermeasures activated.

14/18

slide-40
SLIDE 40

Side-channel evaluation: point blinding correlation

[Figure: two correlation plots]

  • All countermeasures disabled: target correlation detected.
  • Only point blinding enabled: no first-order leakage detected.

15/18
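The qualitative effect in the two plots can be imitated with a toy simulation. The sketch below is not the authors' measurement setup: it assumes a Hamming-weight leakage model on a single byte with Gaussian noise, models blinding as adding a fresh random mask to the input, and uses hypothetical names throughout.

```python
import random


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(key, blinded, traces=2000, noise=1.0, seed=1):
    """Correlation between the attacker's prediction and the simulated leakage."""
    rng = random.Random(seed)
    predictions, leakages = [], []
    for _ in range(traces):
        x = rng.randrange(256)                       # known input byte
        mask = rng.randrange(256) if blinded else 0  # fresh secret mask
        processed = (x + mask) & 0xFF                # value the device handles
        leakages.append(hamming_weight(processed ^ key) + rng.gauss(0, noise))
        predictions.append(hamming_weight(x ^ key))  # attacker ignores the mask
    return pearson(predictions, leakages)


if __name__ == "__main__":
    print("unprotected:", round(simulate(0x3C, blinded=False), 3))
    print("blinded:    ", round(simulate(0x3C, blinded=True), 3))
```

Without the mask the prediction tracks the leakage closely; with it, the processed value is independent of the known input and the first-order correlation drops to the noise floor.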

slide-41
SLIDE 41

Fourℚ software for embedded systems

  • Open-source (MIT license).
  • C language + Assembly (optional)
  • ARM Cortex-M4 (32-bit), MSP430(X) (16-bit), AVR ATxmega (8-bit)
  • Highly customizable:
      • w/ or w/o endomorphisms, table sizes, w/ or w/o assembly
  • Crypto primitives:
      • Key agreement (w/ and w/o compression)
      • [Update] Schnorrℚ signature recently included (extended version)
  • Speed records set for ECDH and signatures.

16/18

slide-42
SLIDE 42

Speed-record results (speed prioritized)

[Figure: speed comparison chart. Speedup annotations recovered from the build steps: 1.9x (Renes’16); 2.8x (Düll’15); 2.5x; 3.9x.]

17/18

slide-45
SLIDE 45

Remarks and future work

  • Fast and secure state-of-the-art implementation of Fourℚ on embedded devices
  • Proof of concept: open-source library + side-channel evaluation
  • Focused on speed
  • It would be interesting to analyze memory tradeoffs.
  • It would also be interesting to extend to other languages (JavaScript, Rust) and different platforms.

18/18

slide-46
SLIDE 46

Fourℚ on Embedded Devices with Strong Countermeasures Against Side-Channel Attacks

Geovandro C. C. F. Pereira

geovandro.pereira@uwaterloo.ca

slide-47
SLIDE 47

Performance analysis on AVR microcontroller

[Figure: bar chart of estimated energy consumption in millijoules (axis from 50 to 300) on an 8-bit AVR ATmega128L @7.37MHz (MICAz wireless sensor node), for static ECDH and ephemeral ECDH. Series: NIST P-256, Curve25519, FourQ (C), FourQ (U).]

  • (1) ECDH-Curve25519 implementation by Düll et al. [DCC 2015].
  • (2) ECDH-NIST-P256 implementation by Wenger et al. [Indocrypt 2013].

(1) and (2) do not exploit fixed-base scalar multiplication.

21/23

slide-48
SLIDE 48

Additional information

  • Fourℚ paper: http://eprint.iacr.org/2015/565.pdf
  • Fourℚlib: https://www.microsoft.com/en-us/research/project/fourqlib/
  • RFC draft: https://datatracker.ietf.org/doc/draft-ladd-cfrg-4q/
  • Reference implementation in python: https://github.com/bifurcation/fourq
  • Schnorrℚ: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/SchnorrQ.pdf

  • Fourℚ on ARM+NEON: http://eprint.iacr.org/2016/645.pdf
  • Fourℚ on FPGA: http://eprint.iacr.org/2016/569.pdf
  • Fourℚ on microcontrollers… preprint coming soon!
  • Fourℚlib version 3.0… release coming soon!
  • Fourℚ on OpenSSL… release coming soon!

21/23

slide-49
SLIDE 49

References

[BBJ+08] D.J. Bernstein, P. Birkner, M. Joye, T. Lange and C. Peters. Twisted Edwards curves. AFRICACRYPT 2008.
[BDL+11] D.J. Bernstein, N. Duif, T. Lange, P. Schwabe and B.-Y. Yang. High-speed high-security signatures. CHES 2011.
[eBACS] D.J. Bernstein and T. Lange. eBACS: ECRYPT Benchmarking of Cryptographic Systems. http://bench.cr.yp.to/results-dh.html
[Edw07] H. Edwards. A normal form for elliptic curves. Bulletin of the AMS, 2007.
[GLS09] S.D. Galbraith, X. Lin and M. Scott. Endomorphisms for faster elliptic curve cryptography on a large class of curves. EUROCRYPT 2009.
[GLV01] R.P. Gallant, R.J. Lambert and S.A. Vanstone. Faster point multiplication on elliptic curves with efficient endomorphisms. CRYPTO 2001.
[GI13] A. Guillevic and S. Ionica. Four-dimensional GLV via the Weil restriction. ASIACRYPT 2013.
[HCW+08] H. Hisil, G. Carter, K.K. Wong and E. Dawson. Twisted Edwards curves revisited. ASIACRYPT 2008.
[Smi13] B. Smith. The Q-curve construction for endomorphism-accelerated elliptic curves. J. Cryptology, 2015.

23/23