Fourℚ on Embedded Devices with Strong Countermeasures Against Side-Channel Attacks
CHES 2017, September 26-28, Taipei, Taiwan
Zhe Liu, Patrick Longa, Geovandro C. C. F. Pereira, Oscar Reparaz, Hwajeong Seo
Context on modern elliptic curves
- 1996, P. Kocher initiates Simple Power Analysis (SPA) attacks (timing).
- 1999, SPA evolves to Differential Power Analysis (DPA) and Template attacks.
- 1999, FIPS 186-2 is published
  - NIST publishes the 15 popular NIST (Weierstrass) curves along with ECDSA.
- New requirements imposed on ECC
  - Constant-time algorithms
  - Complete formulas (achieved by models such as (Twisted) Edwards curves).
  - Provenance
- 2015, NIST holds a workshop for new ECC standardization.
Next-generation elliptic curves
Farrell-Moriarty-Melnikov-Paterson [NIST ECC Workshop 2015]: “… the real motivation for work in CFRG is the better performance and side-channel resistance of new curves developed by academic cryptographers over the last decade.”
State-of-the-art ECC: Fourℚ
[Costello-Longa, ASIACRYPT 2015]
Speed (in thousands of cycles) to compute variable-base scalar multiplication on different computer classes:
| Platform | Fourℚ | Curve25519 | Speedup ratio |
|---|---|---|---|
| Intel Haswell processor, desktop class | 56 | 162 | 2.9x |
| ARM Cortex-A15, smartphone class | 132 | 315 | 2.4x |
| ARM Cortex-M4, microcontroller class | 470 | 907 / 1,424 | 1.9x / 3.0x |
$E/\mathbb{F}_{p^2}: -x^2 + y^2 = 1 + d x^2 y^2$, with $p = 2^{127} - 1$, $i^2 = -1$,
$d = 125317048443780598345676279555970305165\,i + 4205857648805777768770$,
and $\#E = 392 \cdot N$, where $N$ is a 246-bit prime.
- The fastest (large-characteristic) ECC addition laws are complete on $E$.
- $E$ is equipped with two endomorphisms:
  - $E$ is a degree-2 ℚ-curve: endomorphism $\psi$
  - $E$ has CM by the order of discriminant $D = -40$: endomorphism $\phi$
- $\psi(P) = [\lambda_\psi]P$ and $\phi(P) = [\lambda_\phi]P$ for all $P \in E[N]$. For $m \in [0, 2^{256})$:
$m \mapsto (a_1, a_2, a_3, a_4)$
$[m]P = [a_1]P + [a_2]\phi(P) + [a_3]\psi(P) + [a_4]\psi(\phi(P))$
Optimal 4-Way Scalar Decompositions
$m \mapsto (a_1, a_2, a_3, a_4)$
Proposition: for all $m \in [0, 2^{256})$, the decomposition yields four $a_i \in [0, 2^{64})$ with $a_1$ odd.
Example:
$m$ = 42453556751700041597675664513313229052985088397396902723728803518727612539248
$a_1$ = 13045455764875651153 (multiplies $P$)
$a_2$ = 9751504369311420685 (multiplies $\phi(P)$)
$a_3$ = 5603607414148260372 (multiplies $\psi(P)$)
$a_4$ = 8360175734463666813 (multiplies $\psi(\phi(P))$)
Multi-Scalar Recoding
Step 1: recode $a_1$ to a signed non-zero representation.
Step 2: recode $a_2$, $a_3$ and $a_4$ by “sign-aligning” columns.
Binary representations (most significant bit first):
$a_1$ = 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1
$a_2$ = 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1
$a_3$ = 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0
$a_4$ = 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1
Recoded representations (signed digits in {−1, 0, 1}, most significant first; every digit of $a_1$ is non-zero, and each non-zero digit of $a_2$, $a_3$, $a_4$ takes the sign of its column):
$a_1$ = 1, -1, 1, -1, 1, 1, -1, 1, -1, 1, -1, -1, -1, -1, 1, -1, 1, -1, 1, 1, -1, -1, -1, 1, -1, -1, 1, 1, 1, -1, -1, 1, 1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1, 1, 1, 1, 1, 1, -1, -1, -1, -1, 1, -1, 1, -1, -1, -1, -1, 1, -1, 1, -1, -1, -1
$a_2$ = 1, -1, 0, 0, 0, 1, 0, 0, -1, 1, 0, -1, -1, 0, 1, 0, 0, 0, 1, 1, -1, 0, -1, 1, 0, -1, 0, 0, 1, 0, -1, 1, 1, 0, -1, 1, 0, 0, 1, 1, 1, -1, -1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, -1, -1, 0, 0, 1, -1, 0, 0, -1, -1
$a_3$ = 0, 0, 1, 0, 1, 0, -1, 1, 0, 0, -1, 0, 0, 0, 1, 0, 0, 0, 0, 1, -1, -1, -1, 0, -1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, -1, 0, -1, 0, 0, 1, -1, 0, 0, 0, 1, -1, 1, -1, 0, 0
$a_4$ = 1, -1, 0, -1, 1, 1, -1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, -1, 0, 0, 0, 0, -1, 0, 0, 1, -1, 0, 1, 0, -1, -1, 0, 1, 0, 0, 0, 1, -1, 0, 0, 0, 1, 1, 1, -1, -1, -1, -1, 0, -1, 1, 0, -1, -1, 0, 0, 0, 0, 0, -1, -1
Column signs $s_i$:
+ − + − + + − + − + − − − − + − + − + + − − − + − − + + + − − + + − − + + + + + + − − + + + + + − − − − + − + − − − − + − + − − −
Digits $d_i$ (table indices):
6, 6, 3, 5, 7, 6, 7, 3, 2, 2, 3, 2, 2, 1, 8, 1, 5, 1, 6, 8, 8, 3, 4, 2, 3, 6, 3, 1, 6, 5, 2, 6, 4, 5, 6, 2, 5, 1, 4, 2, 8, 6, 2, 2, 2, 8, 7, 8, 5, 7, 5, 7, 2, 5, 8, 4, 6, 5, 1, 4, 4, 3, 3, 6, 6
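The two recoding steps can be sketched in Python as follows. This is a minimal, non-constant-time illustration and not the library's code: the function names `recode` and `rebuild` are hypothetical, and the digits it returns are 0-based (adding 1 should give the 1-based table indices used on the slide). Each column sign is read from the next-higher bit of the odd scalar $a_1$, and a carry is pushed into $a_2$, $a_3$, $a_4$ whenever a set bit has to be represented as −1.

```python
def recode(a1, a2, a3, a4, bits=64):
    """Sign-aligned recoding sketch; index 0 is the least significant column.

    Returns (signs, digits), each with bits + 1 entries. signs[i] is +1 or -1.
    digits[i] is in 0..7 and packs the absolute values of the recoded digits
    of a2, a3, a4 as bit 0, bit 1, bit 2.
    """
    if a1 % 2 == 0:
        raise ValueError("a1 must be odd")
    rest = [a2, a3, a4]
    signs, digits = [], []
    for i in range(bits):
        # Step 1: the sign of column i is decided by bit i+1 of a1.
        next_bit = (a1 >> (i + 1)) & 1
        signs.append(1 if next_bit else -1)
        # Step 2: align a2, a3, a4 to that sign.
        digit = 0
        for j in range(3):
            low = rest[j] & 1
            digit |= low << j
            # A set bit in a negative column counts as -1, so add a carry.
            carry = low & (1 - next_bit)
            rest[j] = (rest[j] >> 1) + carry
        digits.append(digit)
    # The top column is always positive; what is left of a2, a3, a4 is 0 or 1.
    signs.append(1)
    digits.append(rest[0] + 2 * rest[1] + 4 * rest[2])
    return signs, digits


def rebuild(signs, digits):
    """Inverse of recode: recover (a1, a2, a3, a4) from signs and digits."""
    a1 = sum(s * (1 << i) for i, s in enumerate(signs))
    others = [
        sum(s * ((d >> j) & 1) * (1 << i)
            for i, (s, d) in enumerate(zip(signs, digits)))
        for j in range(3)
    ]
    return (a1, *others)


# Example scalars from the decomposition slide.
A = (13045455764875651153, 9751504369311420685,
     5603607414148260372, 8360175734463666813)
signs, digits = recode(*A)
print(len(signs), rebuild(signs, digits) == A)
```

The round trip through `rebuild` is the property that matters: every column carries one sign, so a single table lookup plus a conditional negation per column is enough during the scalar multiplication.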
Regular Multi-Scalar Multiplication
Precomputed table (8 points):
T[1] = $P$
T[2] = $P + \phi(P)$
T[3] = $P + \psi(P)$
T[4] = $P + \phi(P) + \psi(P)$
T[5] = $P + \psi(\phi(P))$
T[6] = $P + \phi(P) + \psi(\phi(P))$
T[7] = $P + \psi(P) + \psi(\phi(P))$
T[8] = $P + \phi(P) + \psi(P) + \psi(\phi(P))$
Execution, driven by the column signs $s_i$ and digits $d_i$ from the recoding (first four columns: signs +, −, +, −; digits 6, 6, 3, 5):
→ Load $Q = T[6] = P + \phi(P) + \psi(\phi(P))$
→ $Q = 2Q - T[6] = P + \phi(P) + \psi(\phi(P))$
→ $Q = 2Q + T[3] = 3P + 2\phi(P) + \psi(P) + 2\psi(\phi(P))$
→ $Q = 2Q - T[5] = 5P + 4\phi(P) + 2\psi(P) + 3\psi(\phi(P))$
⋮ (the double-and-add step runs 64 times)
- Regular execution (exactly 64 DBLs and 64 ADDs) facilitates protection against timing/SSCA attacks.
- Reduced number of precomputations (only 8 points).
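The execution trace above can be checked with a small Python model. As an assumption made purely for illustration, a point is represented by its four integer coefficients over $(P, \phi(P), \psi(P), \psi(\phi(P)))$ instead of real curve arithmetic, and the helper names `build_table` and `multi_scalar` are hypothetical. A real implementation would also select the table entry and apply the sign in constant time.

```python
def build_table():
    """T[u] for u = 1..8 as coefficient vectors over (P, phi(P), psi(P), psi(phi(P)))."""
    table = {}
    for u in range(8):
        # Bit 0 selects phi(P), bit 1 selects psi(P), bit 2 selects psi(phi(P)).
        table[u + 1] = (1, u & 1, (u >> 1) & 1, (u >> 2) & 1)
    return table


def multi_scalar(signs, digits):
    """Regular double-and-add: one doubling and one signed addition per column.

    signs and digits are given most significant column first, with 1-based
    digits as on the slide. The first column is a plain table load.
    """
    table = build_table()
    acc = table[digits[0]]
    for sign, digit in zip(signs[1:], digits[1:]):
        entry = table[digit]
        acc = tuple(2 * q + sign * t for q, t in zip(acc, entry))
    return acc


# First four columns of the slide's example.
print(multi_scalar([1, -1, 1, -1], [6, 6, 3, 5]))  # (5, 4, 2, 3)
```

The printed vector matches the last line of the trace, $5P + 4\phi(P) + 2\psi(P) + 3\psi(\phi(P))$. Every iteration performs the same operations whatever the scalar, which is what makes the loop regular.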
SPA/DPA-protected scalar multiplication
SPA countermeasures
- Constant-time, constant-flow implementations
  - Complete formulas
  - Ladder-based or regular double-and-add based algorithms
These protections do not prevent
- Differential Power Analysis (DPA): many traces with the same key and varying plaintext
- Other variants such as template attacks, which assume a very powerful attacker
DPA countermeasures for scalar multiplication
- 1999: J.-S. Coron suggested randomizing the computation by using:
1. Scalar randomization
$[m]P = [m + r \cdot \#E]P$
2. Base point blinding (inspired by Chaum’s blind signatures)
Blind: for a random point $R$, $P' \leftarrow P + R$
Scalar multiplication: $Q \leftarrow [m]P'$
Unblind: $[m]P = Q - [m]R$
Moreover, update $R$ for the next scalar multiplication: $R \leftarrow (-1)^b\,2R$, for a random bit $b$.
3. Projective coordinates randomization
$P = (X : Y : Z) \equiv (\lambda X : \lambda Y : \lambda Z)$, for random $\lambda \neq 0$
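Countermeasures 2 and 3 can be illustrated with a toy Python model. Everything here is an assumption made for the sketch: a "point" is just an integer modulo a small prime `TOY_ORDER`, so point addition is modular addition and $[m]P$ is a modular product, and the projective example works over $\mathbb{F}_p$ with $p = 2^{127} - 1$ rather than over $\mathbb{F}_{p^2}$. The class and function names are hypothetical.

```python
import secrets

TOY_ORDER = 1009  # hypothetical small prime group order, for illustration only


def scalar_mul(m, point):
    """[m]P in the toy group of integers modulo TOY_ORDER."""
    return (m * point) % TOY_ORDER


class BlindedMultiplier:
    """Base point blinding: keeps a random R and S = [m]R between calls."""

    def __init__(self, m):
        self.m = m
        self.R = secrets.randbelow(TOY_ORDER - 1) + 1
        self.S = scalar_mul(m, self.R)

    def multiply(self, point):
        blinded = (point + self.R) % TOY_ORDER      # P' = P + R
        q = scalar_mul(self.m, blinded)             # Q = [m]P'
        result = (q - self.S) % TOY_ORDER           # [m]P = Q - [m]R
        # Refresh the mask: R <- (-1)^b 2R and S <- (-1)^b 2S for a random bit b.
        sign = -1 if secrets.randbits(1) else 1
        self.R = (sign * 2 * self.R) % TOY_ORDER
        self.S = (sign * 2 * self.S) % TOY_ORDER
        return result


FIELD_P = 2**127 - 1


def randomize_projective(x, y, z):
    """Multiply all coordinates by the same random non-zero field element."""
    lam = secrets.randbelow(FIELD_P - 1) + 1
    return (lam * x % FIELD_P, lam * y % FIELD_P, lam * z % FIELD_P)


def to_affine(x, y, z):
    """Map (X : Y : Z) to (X/Z, Y/Z) using a Fermat inversion."""
    z_inv = pow(z, FIELD_P - 2, FIELD_P)
    return (x * z_inv % FIELD_P, y * z_inv % FIELD_P)


multiplier = BlindedMultiplier(123)
print(multiplier.multiply(77) == scalar_mul(123, 77))
print(to_affine(*randomize_projective(3, 4, 5)) == to_affine(3, 4, 5))
```

In both cases the intermediate values change from one run to the next while the final result stays the same, which is the property a DPA countermeasure relies on.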
Scalar randomization
- 1999: J.-S. Coron suggested scalar randomization
$[m]P = [m + r \cdot \#E]P$
where $r$ is small (e.g., 20 bits).
- Problem: prime-order curves over pseudo-Mersenne primes
$p = 2^{e_1} \pm 2^{e_2} \pm \cdots + c$
present undesired repeated 1/0 patterns in $\#E$.
- Unsafe example: curve P-256:
#E = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
r · #E = 0xC457FFFF3BA80000C457FFFFFFFFFFFFCC89C7531F9F857C484E071AFC42AAA6D7D80
m = 0xD3128047529A2D4BD63D96EB1D9C4AC781588CFEFCFC153398F5ED03506AA58B
m + r · #E = 0xC4580D3063AC752A672CBD63D96EB1D991363F68A86F754C09A140AA5B12DFAD8230B
The key digits 752 and BD63D96EB1D9 (highlighted on the slide) survive unchanged in the randomized scalar.
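The P-256 example can be reproduced with a few lines of Python. The group order is the standard P-256 value. The key `m` is the one decoded from the slide, and `r = 0xC4580` is an assumption inferred from the leading digits of the product shown there. The function name `randomize_scalar` is hypothetical.

```python
P256_ORDER = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
m = 0xD3128047529A2D4BD63D96EB1D9C4AC781588CFEFCFC153398F5ED03506AA58B
r = 0xC4580  # assumed 20-bit random value, inferred from the slide's product


def randomize_scalar(scalar, rand, order):
    """Coron's scalar randomization: add a random multiple of the group order."""
    return scalar + rand * order


blinded = randomize_scalar(m, r, P256_ORDER)

# The randomized scalar is congruent to the key modulo the group order...
print(blinded % P256_ORDER == m)

# ...but the long runs of F and 0 in the order leave key digits untouched.
key_hex = format(m, "x")
blinded_hex = format(blinded, "x")
leaked = "bd63d96eb1d9"
print(leaked in key_hex, leaked in blinded_hex)
```

Where $r \cdot \#E$ is all F digits, adding the carry from below turns the run into zeros and the key digits pass straight through. That is why a fresh $r$ does not hide those positions.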
DPA countermeasures for scalar multiplication
- Safe example: curve Fourℚ has non-prime order, $\#E = 392 \cdot N$:
#E = 0x3FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE9968776D07B232910910B6A16D87C1B8
- But we usually work in the prime-order subgroup, where the order of $P$ is $N$, so $[m]P = [m + r \cdot N]P$, and
N = 0x29CBC14E5E0A72F05397829CBC14E5DFBD004DFE0F79992FB2540EC7768CE7
does not present the undesired patterns, so Coron’s technique could be used.
- Can we do better in Fourℚ? Yes.
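One rough way to see the difference between the two moduli is to measure the longest run of a repeated hex digit. The sketch below uses the subgroup order $N$ from the slide and the cofactor 392. The `longest_run` helper is a hypothetical name, and a long run is only a heuristic indicator of the problem, not a security proof.

```python
from itertools import groupby

N = 0x29CBC14E5E0A72F05397829CBC14E5DFBD004DFE0F79992FB2540EC7768CE7
COFACTOR = 392


def longest_run(value):
    """Length of the longest run of one repeated hex digit in value."""
    return max(len(list(group)) for _, group in groupby(format(value, "x")))


curve_order = COFACTOR * N
print(N.bit_length())            # size of the subgroup order in bits
print(longest_run(curve_order))  # long run of F digits in #E
print(longest_run(N))            # only short runs in N
```

Randomizing with multiples of $N$ instead of $\#E$ avoids adding a value whose upper half is almost entirely F digits.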
Scalar randomization
- Remark: Coron’s method is inefficient for curves with endomorphisms.
- For Fourℚ, we extended Ciet et al.’s GLV scalar randomization:
  - Extend every mini-scalar by 16 bits (64 bits in total)
  - No problem with pattern repetitions
  - Overhead is only 25% (compared with at least 50% overhead for Curve25519)
Side-channel protected Fourℚ
Annotations on the protected scalar multiplication algorithm:
- Always update the blinding point R.
- The blinding point R plays a role in T. “Sign-alignment” cannot be used here, so the new table has 16 points.
- Projective coordinate randomization.
- Multi-scalar randomization adds 16 bits, giving a slightly larger loop length (64 -> 80).
Side-channel evaluation
- Carried out a practical side-channel evaluation on an ARM Cortex-M4 with no dedicated security features.
- EM traces. Low noise: DPA with a dozen measurements works.
- Performed leakage detection and key-recovery attacks for vertical DPA attacks.
- Tested the effectiveness of each countermeasure first in isolation and then combined.
- No leakage detected with up to 10 million measurements with all countermeasures activated.
Side-channel evaluation: point blinding correlation
- All countermeasures disabled: target correlation detected.
- Only point blinding enabled: no first-order leakage detected.
Fourℚ software for embedded systems
- Open-source (MIT license).
- C language + assembly (optional)
- ARM Cortex-M4 (32-bit), MSP430(X) (16-bit), AVR ATxmega (8-bit)
- Highly customizable:
  - with or without endomorphisms, table sizes, with or without assembly
- Crypto primitives:
  - Key agreement (with and without compression)
  - [Update] Schnorrℚ signature recently included (extended version)
- Speed records set for ECDH and signatures.
Speed-record results (speed prioritized)
[Charts not recoverable. Speedup annotations: 1.9x (Renes’16); 2.8x (Düll’15), 2.5x, 3.9x]
Remarks and future work
- Fast and secure state-of-the-art implementation of Fourℚ on embedded devices
- Proof of concept: open-source library + side-channel evaluation
- Focused on speed
- It would be interesting to analyze memory trade-offs
- It would also be interesting to extend to other languages (JavaScript, Rust) and different platforms.
Geovandro C. C. F. Pereira
geovandro.pereira@uwaterloo.ca
Performance analysis on AVR microcontroller
1. ECDH-Curve25519 implementation by Düll et al. [DCC 2015].
2. ECDH-NIST-P256 implementation by Wenger et al. [Indocrypt 2013].
(1) and (2) do not exploit fixed-base scalar multiplication.
[Figure: estimated energy consumption in millijoules on an 8-bit AVR ATmega128L @ 7.37 MHz (MICAz wireless sensor node), for static and ephemeral ECDH, comparing NIST P-256, Curve25519, FourQ (C) and FourQ (U); axis from 50 to 300.]
Additional information
- Fourℚ paper: http://eprint.iacr.org/2015/565.pdf
- Fourℚlib: https://www.microsoft.com/en-us/research/project/fourqlib/
- RFC draft: https://datatracker.ietf.org/doc/draft-ladd-cfrg-4q/
- Reference implementation in Python: https://github.com/bifurcation/fourq
- Schnorrℚ: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/07/SchnorrQ.pdf
- Fourℚ on ARM+NEON: http://eprint.iacr.org/2016/645.pdf
- Fourℚ on FPGA: http://eprint.iacr.org/2016/569.pdf
- Fourℚ on microcontrollers… preprint coming soon!
- Fourℚlib version 3.0… release coming soon!
- Fourℚ on OpenSSL… release coming soon!