CHES 2020: Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT. Neng Zhang, Bohan Yang, Chen Chen, Shouyi Yin, Shaojun Wei and Leibo Liu. Institute of Microelectronics, Tsinghua University, Beijing, China.


SLIDE 1

Institute of Microelectronics, Tsinghua University.

Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT

Neng Zhang, Bohan Yang, Chen Chen, Shouyi Yin, Shaojun Wei and Leibo Liu
Institute of Microelectronics, Tsinghua University, Beijing, China

CHES 2020

SLIDE 2

Outline

1. Introduction
2. Low-Complexity NTT/INTT
3. Hardware Architecture
4. Implementation Results

SLIDE 3

1 Introduction

• NewHope: a post-quantum cryptography (PQC) algorithm for key encapsulation mechanisms (KEMs)
➢ Versions: NewHope-USENIX, NewHope-Simple, NewHope-NIST
• A candidate in the 2nd round of the NIST PQC standardization process, but not in the 3rd round
• Low-complexity NTT/INTT can be utilized by other algorithms
➢ PQC: CRYSTALS-Dilithium, qTESLA, Falcon
➢ FHE: LTV, BFV

SLIDE 4

1 Introduction

• Main mathematical objects of NewHope: polynomials over the ring $R_q = \mathbb{Z}_q[x]/(x^N + 1)$
➢ $q$: 12289
➢ $N$: 1024 or 512
➢ $\omega_N$: primitive $N$-th root of unity over $\mathbb{Z}_q$
➢ $\gamma_{2N}$: square root of $\omega_N$
• Encryption-based KEM
➢ Key generation: 2 NTTs
➢ Encryption: 2 NTTs, 1 INTT
➢ Decryption: 1 INTT
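The parameter relations above can be checked numerically. The following is a minimal Python sketch, not taken from the slides: the helper name `find_gamma` is hypothetical, and it simply searches for some primitive 2N-th root of unity modulo q = 12289 and derives $\omega_N$ as its square.

```python
Q = 12289  # NewHope modulus from the slide

def find_gamma(n: int, q: int = Q) -> int:
    """Return a primitive 2n-th root of unity modulo q (n must be a power of two)."""
    assert (q - 1) % (2 * n) == 0, "need q = 1 (mod 2n)"
    for g in range(2, q):
        cand = pow(g, (q - 1) // (2 * n), q)  # cand**(2n) == 1 by Fermat
        # For n a power of two, cand has order exactly 2n iff cand**n == -1.
        if pow(cand, n, q) == q - 1:
            return cand
    raise ValueError("no primitive 2n-th root found")

for n in (512, 1024):
    gamma = find_gamma(n)          # plays the role of gamma_2N
    omega = gamma * gamma % Q      # plays the role of omega_N
    assert pow(omega, n, Q) == 1           # omega is an N-th root of unity
    assert pow(omega, n // 2, Q) == Q - 1  # and it is primitive
    print(n, gamma, omega)
```

The search returns one valid root; any other primitive 2N-th root would serve equally well for the transforms discussed later.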

SLIDE 5

1 Introduction

Multiplication over the ring $\mathbb{Z}_q[x]/f(x)$

• $f(x)$ is arbitrary
➢ Convolution theorem
➢ $q \equiv 1 \pmod{N}$
• $f(x) = x^N + 1$
➢ Negative wrapped convolution (NWC)
➢ $q \equiv 1 \pmod{2N}$
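To make the negative wrapped convolution concrete, here is a small Python sketch under assumed toy parameters (q = 17, N = 8, with 3 as a primitive 16th root of unity modulo 17; none of these come from the slides). It uses a direct O(N^2) transform for readability and compares the NWC result with a schoolbook product in $\mathbb{Z}_q[x]/(x^N + 1)$.

```python
Q, N, GAMMA = 17, 8, 3       # toy parameters (assumed): 17 = 1 (mod 16), 3 has order 16
OMEGA = GAMMA * GAMMA % Q    # primitive N-th root of unity

def ntt(a, w):
    """Direct O(N^2) transform: A_i = sum_j a_j * w^(i*j) mod Q."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, Q) for j in range(n)) % Q for i in range(n)]

def nwc_multiply(a, b):
    """Product in Z_Q[x]/(x^N + 1): pre-process, NTT, pointwise, INTT, post-process."""
    n = len(a)
    a_pre = [a[j] * pow(GAMMA, j, Q) % Q for j in range(n)]  # pre-processing by gamma^j
    b_pre = [b[j] * pow(GAMMA, j, Q) % Q for j in range(n)]
    c_hat = [x * y % Q for x, y in zip(ntt(a_pre, OMEGA), ntt(b_pre, OMEGA))]
    c_pre = ntt(c_hat, pow(OMEGA, -1, Q))                    # unscaled inverse transform
    n_inv = pow(n, -1, Q)
    # Post-processing: scale by N^-1 and gamma^-i.
    return [c_pre[i] * n_inv * pow(GAMMA, -i, Q) % Q for i in range(n)]

def schoolbook(a, b):
    """Reference product that applies x^N = -1 directly."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1
            c[(i + j) % n] = (c[(i + j) % n] + sign * a[i] * b[j]) % Q
    return c

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
assert nwc_multiply(a, b) == schoolbook(a, b)
```

The modular inverses via `pow(x, -1, Q)` require Python 3.8 or later. The pre-processing and post-processing loops are the extra multiplications that the rest of the talk sets out to remove.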

SLIDE 6

1 Introduction

• Why do we need low complexity?
➢ Low complexity → low area, high speed

[Figure: area versus speed]

SLIDE 7

2.1 Low-Complexity NTT

• Cost of the pre-processing is considerable
➢ Number of modular multiplications of NTT: (N/2) log N (FFT) + N (pre-processing)
• Low-Complexity NTT
➢ A low-complexity NTT with twiddle factors computed on the fly [1].
➢ Merge the pre-processing into the DIT FFT with pre-computed twiddle factors.

[1] S. Roy, et al., Compact Ring-LWE cryptoprocessor. CHES 2014

SLIDE 8

2.1 Low-Complexity NTT

• Derivation of the low-complexity NTT
➢ Inspired by the strategy of the Cooley-Tukey FFT
➢ Follows the divide-and-conquer method of the FFT that divides in the time domain (DIT)
➢ First, the pre-processing and the FFT are written together as a summation of N items
➢ Second, the summation is split into two groups according to the parity of the index of a

SLIDE 9

2.1 Low-Complexity NTT

• Derivation of the low-complexity NTT
➢ Third, the equation is grouped into two parts according to the size of the index i.
➢ In this way, an N-point NTT can be resolved into two N/2-point NTTs.

$\hat{a}_i^{(0)}$ and $\hat{a}_i^{(1)}$ are N/2-point NTTs of $a_{2j}$ and $a_{2j+1}$.

[Figure: recursive decomposition of an N-point NTT into two N/2-point NTTs, four N/4-point NTTs, and so on down to 2-point NTTs]
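The three derivation steps can be sketched in Python. This is an illustrative reconstruction under assumptions, not the authors' code: it takes the combined sum to be $\hat{a}_i = \sum_j a_j \gamma^{(2i+1)j}$ (pre-processing by $\gamma^j$ followed by the FFT with $\omega = \gamma^2$), uses toy parameters q = 17, N = 8, γ = 3, and the function names are hypothetical.

```python
Q, N, GAMMA = 17, 8, 3  # toy parameters (assumed); GAMMA is a primitive 2N-th root mod Q

def merged_ntt_direct(a, psi):
    """Pre-processing and FFT written as one sum: A_i = sum_j a_j * psi^((2i+1)*j)."""
    n = len(a)
    return [sum(a[j] * pow(psi, (2 * i + 1) * j, Q) for j in range(n)) % Q
            for i in range(n)]

def merged_ntt(a, psi):
    """Recursive DIT form; psi must be a primitive 2*len(a)-th root of unity mod Q."""
    n = len(a)
    if n == 1:
        return list(a)
    psi_sq = psi * psi % Q                  # primitive 2*(n/2)-th root for the halves
    even = merged_ntt(a[0::2], psi_sq)      # N/2-point NTT of a_{2j}
    odd = merged_ntt(a[1::2], psi_sq)       # N/2-point NTT of a_{2j+1}
    out = [0] * n
    for i in range(n // 2):
        # One multiplication per butterfly; the factor could be pre-computed.
        t = odd[i] * pow(psi, 2 * i + 1, Q) % Q
        out[i] = (even[i] + t) % Q           # first half of the outputs
        out[i + n // 2] = (even[i] - t) % Q  # second half, since psi^n = -1
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8]
assert merged_ntt(a, GAMMA) == merged_ntt_direct(a, GAMMA)
```

Each recursion level performs N/2 butterfly multiplications and there are log N levels, so no separate pre-processing pass is needed in this form.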

SLIDE 10

2.1 Low-Complexity NTT

[Figure: dataflow of an 8-point low-complexity NTT]
[Figure: butterfly of the low-complexity NTT]

SLIDE 11

2.1 Low-Complexity NTT

In classic FFT: $\omega = \omega_N^{jN/m}$

No additional timing cost; no additional hardware resource cost

Computational complexity: (N/2) log N + N → (N/2) log N

SLIDE 12

2.2 Low-Complexity INTT

• Cost of the post-processing is greater than that of the pre-processing
➢ Number of modular multiplications of INTT: (N/2) log N (FFT) + 2N (post-processing)
• Low-Complexity INTT
➢ [1] merges the scaling of $\gamma_{2N}^{-i}$ into the FFT.
➢ Further merge the scaling of $N^{-1}$ into the FFT.

[1] T. Pöppelmann, et al., High-performance ideal lattice-based cryptography on 8-bit ATxmega microcontrollers. LATINCRYPT 2015

SLIDE 13

2.2 Low-Complexity INTT

• Derivation of the low-complexity INTT
➢ Inspired by the strategy of the Gentleman-Sande FFT
➢ Follows the divide-and-conquer method of the FFT that divides in the frequency domain (DIF)
➢ First, the post-processing and the FFT are written together as a summation of N items
➢ Second, the summation is split into two groups according to the size of the index of $\hat{a}$

SLIDE 14

2.2 Low-Complexity INTT

• Derivation of the low-complexity INTT
➢ Third, the equation is grouped into two parts according to the parity of i.
➢ In this way, an N-point INTT can be resolved into two N/2-point INTTs.

$a_{2i}$ and $a_{2i+1}$ correspond to the N/2-point INTTs of $\hat{b}_i^{(0)}$ and $\hat{b}_i^{(1)}$.

[Figure: recursive decomposition of an N-point INTT into two N/2-point INTTs, four N/4-point INTTs, and so on down to 2-point INTTs]
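A matching Python sketch for the inverse direction follows. It is a reconstruction under assumptions rather than the authors' design: the combined sum is taken to be $a_j = N^{-1} \sum_i \hat{a}_i \gamma^{-(2i+1)j}$, the $N^{-1}$ scaling is absorbed by halving at each of the log N stages (one plausible way to merge it), and the toy parameters and function names are hypothetical.

```python
Q, N, GAMMA = 17, 8, 3  # toy parameters (assumed); GAMMA is a primitive 2N-th root mod Q

def half(x):
    """x / 2 mod Q without a multiplier: shift if even, otherwise add Q first."""
    return x >> 1 if x % 2 == 0 else (x + Q) >> 1

def merged_intt_direct(a_hat, psi):
    """Post-processing and FFT as one sum: a_j = N^-1 * sum_i A_i * psi^(-(2i+1)*j)."""
    n = len(a_hat)
    n_inv, psi_inv = pow(n, -1, Q), pow(psi, -1, Q)
    return [n_inv * sum(a_hat[i] * pow(psi_inv, (2 * i + 1) * j, Q) for i in range(n)) % Q
            for j in range(n)]

def merged_intt(a_hat, psi):
    """Recursive DIF form; psi must be a primitive 2*len(a_hat)-th root of unity mod Q."""
    n = len(a_hat)
    if n == 1:
        return list(a_hat)
    h, psi_inv = n // 2, pow(psi, -1, Q)
    # Split by the size of the input index: pair element i with element i + N/2.
    b0 = [half((a_hat[i] + a_hat[i + h]) % Q) for i in range(h)]
    b1 = [half((a_hat[i] - a_hat[i + h]) % Q) * pow(psi_inv, 2 * i + 1, Q) % Q
          for i in range(h)]
    out = [0] * n
    out[0::2] = merged_intt(b0, psi * psi % Q)  # even-indexed outputs
    out[1::2] = merged_intt(b1, psi * psi % Q)  # odd-indexed outputs
    return out

a = [1, 2, 3, 4, 5, 6, 7, 8]
fwd = [sum(a[j] * pow(GAMMA, (2 * i + 1) * j, Q) for j in range(N)) % Q for i in range(N)]
assert merged_intt(fwd, GAMMA) == a
```

In this sketch the halving is a shift with a conditional add, so the only multiplication left in each butterfly is the twiddle factor.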

SLIDE 15

2.2 Low-Complexity INTT

[Figure: dataflow of an 8-point low-complexity INTT]
[Figure: butterfly of the low-complexity INTT]

SLIDE 16

2.2 Low-Complexity INTT

In classic FFT: $\omega = \omega_N^{-jN/m}$

No additional timing cost; slightly modify the butterfly unit: $u + t$, $u - t$

Computational complexity: (N/2) log N + 2N → (N/2) log N
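The multiplication counts quoted on the slides can be evaluated for the two NewHope sizes. The short Python sketch below only plugs N into the formulas from the talk; the function name `mult_counts` is hypothetical.

```python
def mult_counts(n: int) -> dict:
    """Modular multiplication counts from the slide formulas (n a power of two)."""
    log_n = n.bit_length() - 1      # exact log2 for powers of two
    fft = (n // 2) * log_n          # butterflies only
    return {
        "classic NTT": fft + n,           # FFT + pre-processing
        "classic INTT": fft + 2 * n,      # FFT + post-processing
        "low-complexity NTT/INTT": fft,   # pre/post-processing merged away
    }

for n in (512, 1024):
    print(n, mult_counts(n))
```

For N = 1024 the formulas give 6144 and 7168 multiplications for the classic NTT and INTT, against 5120 for the low-complexity versions.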

SLIDE 17

3 The Hardware Architecture

• The architecture of NTT/INTT
• Multi-bank memory
➢ Address generator [1]
➢ log N: even ✓, odd ✗
➢ The execution order of the last s-loop is rearranged as: [figure]

[1] W. Wang, et al., VLSI design of a large number multiplier for fully homomorphic encryption. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22(9):1879–1887, Sept 2014.

SLIDE 18

3 The Hardware Architecture

• Compact Butterfly Unit

SLIDE 19

3 The Hardware Architecture

• Low-Complexity Modular Multiplication
➢ No additional multiplication; constant-time
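The reduction circuit itself is only shown as a figure on the slide. As a loosely related illustration, and explicitly not the authors' design, the Python sketch below shows one well-known multiplier-free reduction step for q = 12289 based on the identity $3 \cdot 2^{12} \equiv -1 \pmod{q}$. The function name `reduce_times3` is hypothetical, and the step returns a value congruent to 3c rather than c, so a real design would have to compensate for the factor 3 elsewhere (for example in pre-computed constants).

```python
Q = 12289  # = 3 * 2**12 + 1, so 3 * 2**12 = -1 (mod Q)

def reduce_times3(c: int) -> int:
    """Return a small value congruent to 3*c (mod Q) using shifts, a mask and add/sub."""
    c0 = c & 0xFFF              # low 12 bits
    c1 = c >> 12                # remaining high bits
    # c = c0 + 2**12 * c1, hence 3*c = 3*c0 + (3 * 2**12) * c1 = 3*c0 - c1 (mod Q).
    return (c0 << 1) + c0 - c1  # 3*c0 formed as a shift plus an add

product = 12288 * 12288         # largest product of two reduced operands
assert (reduce_times3(product) - 3 * product) % Q == 0
```

The step has no data-dependent branches, which is in the spirit of the constant-time claim on the slide.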

SLIDE 20

3 The Hardware Architecture

• The architecture of NewHope-NIST
➢ Supports key generation, encryption and decryption
➢ Doubled bandwidth matching
➢ RAM (R0, R1): two data items per address

SLIDE 21

3 The Hardware Architecture

• Timing hiding
➢ Resource conflict
➢ Data dependency

A RAM may be read and written by operations in the same line.

SLIDE 22

4 Implementation Results

• Implementation platform
➢ Xilinx Artix-7 FPGA
➢ Vivado 2019.1.1
• Implementation results of NTT/INTT

[Figure: bar charts of Time (us), ATP (LUT x ms), ATP (FF x ms), ATP (DSP x us) and ATP (BRAM x us) for Ours, [FS19], [KLC+7], [JGCS19], [FSM+19] and [BUC19b]]

SLIDE 23

4 Implementation Results

• Implementation results of NewHope-NIST

[Figure: bar charts of Time (us) for KeyGen+Decrypt and Encrypt; ATP (LUT x ms), ATP (FF x ms), ATP (DSP x us) and ATP (BRAM x us); and LUTs, FFs, DSPs and BRAMs, for Ours, [JGCS19-1], [JGCS19-2], [BUC19b] and [FSM+19]]

SLIDE 24

Conclusion

• Low-complexity NTT/INTT
➢ NTT: no pre-processing
➢ INTT: no post-processing
• A highly efficient architecture of NewHope-NIST
➢ A clear advantage in both speed and ATP
• Low-complexity NTT/INTT can benefit other NTT-based algorithms

SLIDE 25

Thanks!