Amir Ali Kouzeh Geran and Arash Reyhani-Masoleh Presented by: Arash - - PowerPoint PPT Presentation


Amir Ali Kouzeh Geran and Arash Reyhani-Masoleh Presented by: Arash Reyhani-Masoleh Department of Electrical and Computer Engineering Western University, London, Ontario, Canada 23rd IEEE Symposium on Computer Arithmetic (ARITH 23) June 11,


slide-1
SLIDE 1

Amir Ali Kouzeh Geran and Arash Reyhani-Masoleh

Presented by: Arash Reyhani-Masoleh
Department of Electrical and Computer Engineering
Western University, London, Ontario, Canada
23rd IEEE Symposium on Computer Arithmetic (ARITH 23)
June 11, 2016

slide-2
SLIDE 2

Outline

Motivation Preliminaries Single-bit Fault Detection Scheme CRC-based Fault Detection Scheme Fault Simulation Results FPGA Implementations and Overheads Conclusion

2

slide-3
SLIDE 3

Motivations: GCM

  • Galois/Counter Mode (GCM) is a recently adopted mode of operation for symmetric-key cryptography (such as AES).
  • It was proposed by McGrew and Viega in 2005 and defined by NIST (SP 800-38D) in 2007.
  • AES-GCM is included in “NSA Suite B Cryptography”.
  • It is used in a number of protocols and standards:
      • IEEE 802.1AE and IEEE 802.11ad
      • ANSI (INCITS) Fibre Channel Security Protocols (FC-SP)
      • IEEE P1619.1 tape storage, IETF IPsec standards, SSH and TLS 1.2
  • It provides authentication assurance for additional data that is not encrypted.
  • It detects accidental modifications and unauthorized alterations of data, and it protects confidentiality.

slide-4
SLIDE 4

Motivations: Reliable GCM

  • Sources of faults in cryptographic systems:
      • Natural faults
      • Fault attacks: an attacker injects faults and looks for leakage of information.
  • The need for a fault detection method:
      • Protect the integrity and authenticity of data.
      • Prevent the attack sequence in case of a fault attack.
  • In this paper, we propose a reliable GCM scheme to detect both permanent and transient faults, with:
      • Low overhead in terms of area and delay.
      • Acceptable fault coverage.

slide-5
SLIDE 5

Preliminaries

  • The GCM has two operations: authenticated encryption and authenticated decryption.
  • There are four inputs for authenticated encryption:
      1. A secret key (K) whose length is determined by the block cipher.
      2. An initialization vector (IV) with a length between 1 and $2^{64}$ bits.
      3. A plaintext (P) with any number of bits between 0 and $2^{39} - 256$.
      4. Additional authenticated data (A), which is authenticated but not encrypted, with any number of bits between 0 and $2^{64}$.
  • There are two outputs for authenticated encryption:
      1. A ciphertext (C) whose length is exactly that of the plaintext.
      2. An authentication tag (T), whose length can be any value between 0 and 128 bits.

slide-6
SLIDE 6

AES-GCM Block Diagram

  • The “Hash Key” H is generated by encrypting 128 bits of zero with the symmetric key (K): $H = E(K, 0^{128}) = E_K(0)$.
  • The plaintext P is divided into n blocks of 128 bits each: $P_1, P_2, \ldots, P_n$.
  • An up-counter with output $U_i$ is used to generate the blocks of ciphertext: $C_i = P_i \oplus E_K(U_i)$ for $i = 1, 2, \ldots, n$.
  • The additional authenticated data A is represented as m blocks of 128 bits: $A_1, A_2, \ldots, A_m$.

slide-7
SLIDE 7

AES-GCM Block Diagram (cont.)

  • Using the inputs H, A and C, the output of the GCM is defined by $X_{m+n+1} = \mathrm{GHASH}(H, A, C)$.
  • The 128-bit register Y:
      • is cleared initially;
      • after the $(m+n+1)$-th clock cycle, contains $X_{m+n+1} = \mathrm{GHASH}(H, A, C)$.
  • In this paper, we consider the GCM loop.
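As a rough illustration of the GCM loop, the Python sketch below runs a register Y through a list of 128-bit blocks, computing $X_i = H \times (Y + B_i) \bmod F(\alpha)$ at each step with $F(x) = x^{128} + x^7 + x^2 + x + 1$. It is not the authors' hardware design. The function names `gf_mul` and `gcm_loop` are hypothetical, and the sketch assumes that bit j of a Python integer holds the coefficient of $\alpha^j$, whereas the GCM standard uses a reflected bit order.

```python
# Sketch of the GCM loop: X_i = H * (Y + B_i) in GF(2^128).
# Assumption: bit j of an int is the coefficient of alpha^j
# (plain polynomial basis, not the reflected order used by real GCM).

F_LOW = 0x87            # x^7 + x^2 + x + 1, the low part of F(x)
MASK = (1 << 128) - 1   # keeps values to 128 bits


def gf_mul(h, d):
    """Return h * d mod F(alpha) with F(x) = x^128 + x^7 + x^2 + x + 1."""
    x = 0
    z = h  # z holds h * alpha^j mod F(alpha) at step j
    for j in range(128):
        if (d >> j) & 1:
            x ^= z
        carry = z >> 127
        z = (z << 1) & MASK
        if carry:
            z ^= F_LOW  # replace alpha^128 by alpha^7 + alpha^2 + alpha + 1
    return x


def gcm_loop(h, blocks):
    """Feed every block through the loop and return the final register value."""
    y = 0  # register Y is cleared initially
    for b in blocks:
        y = gf_mul(h, y ^ b)  # addition in GF(2^128) is XOR
    return y
```

Here `blocks` would hold the m blocks of A, the n blocks of C and one final length block, so the loop runs for m + n + 1 iterations.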

slide-8
SLIDE 8

Single-bit Fault Detection Scheme

  • The parity of the multiplier output ($X_i$) is computed using two different functions:
      1. The actual parity ($p_{X_i}$) is obtained by XORing the coordinates of $X_i$.
      2. The predicted parity is a complex function of H, $C_i$ and Y: $\hat{p}_{X_i} = f(H, C_i, Y)$.
  • Then, they are compared to find an error: if $p \neq \hat{p}$ ⇒ $e_{out} = 1$.

slide-9
SLIDE 9

Single-bit Parity Prediction Formulations

We write the multiplier output as follows:

  • $X_i = H \times D_i \bmod F(\alpha)$, where $\alpha$ is a root of the irreducible polynomial $F(x) = x^{128} + x^7 + x^2 + x + 1$ and $0 \le i \le m + n + 1$.
  • The hash key $H \in GF(2^{128})$ is fixed in every iteration $i$.
  • The field element $D_i = \sum_{j=0}^{127} d_j \alpha^j$ (drop $i$ for simplicity).
  • $X_i = \sum_{j=0}^{127} d_j Z^{(j)}$, where $Z^{(j)} = (H \alpha^j) \bmod F(\alpha)$ and $Z^{(0)} = H$.

Then, the parity prediction of the multiplier output is:

$$\hat{p}_{X_i} = \sum_{j=0}^{127} d_j \, \hat{p}_{Z^{(j)}}.$$

slide-10
SLIDE 10

Single-bit Parity Prediction Formulations (Cont.)

Since $D = Y + C$ ⇒ $d_j = y_j + c_j$,

$$\hat{p}_{X_i} = \sum_{j=0}^{127} d_j \, \hat{p}_{Z^{(j)}} \;\Rightarrow\; \hat{p}_{X_i} = \sum_{j=0}^{127} y_j \, \hat{p}_{Z^{(j)}} + \sum_{j=0}^{127} c_j \, \hat{p}_{Z^{(j)}}.$$

  • $\hat{p}_{Z^{(j)}}$, $0 \le j \le 127$, is a binary function and depends on the coordinates of $H \in GF(2^{128})$:
      • $Z^{(0)} = H$ ⇒ $\hat{p}_{Z^{(0)}} = p_H$.
      • $Z^{(1)} = Z^{(0)} \alpha \bmod F(\alpha)$ ⇒ $\hat{p}_{Z^{(1)}} = \hat{p}_{Z^{(0)}} + h_{127}$.
      • In general: $\hat{p}_{Z^{(j)}} = \hat{p}_{Z^{(j-1)}} + z^{(j-1)}_{127}$ for $1 \le j \le 127$.
  • These values are stored in a register (PH) at the initialization phase.
  • They remain constant for the entire $m + n + 1$ cycles of the GCM computation.
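The parity recurrence can be checked numerically. The Python sketch below builds a 128-bit register whose bit j is the predicted parity of $Z^{(j)} = H\alpha^j \bmod F(\alpha)$, using the rule that multiplying by $\alpha$ flips the parity exactly when the top coordinate $z_{127}$ is 1, and then predicts the parity of the multiplier output from Y and C. The helper names (`parity_register`, `predicted_parity`) are hypothetical, and the bit ordering (bit j of an integer is the coefficient of $\alpha^j$) is an assumption of this sketch rather than something stated on the slides.

```python
# Sketch: single-bit parity prediction for X = H * (Y + C) mod F(alpha).
# Assumption: bit j of an int is the coefficient of alpha^j.

F_LOW = 0x87            # x^7 + x^2 + x + 1
MASK = (1 << 128) - 1


def parity(v):
    """XOR of all bits of v."""
    return bin(v).count("1") & 1


def mul_alpha(z):
    """Return z * alpha mod F(alpha)."""
    carry = z >> 127
    z = (z << 1) & MASK
    return z ^ F_LOW if carry else z


def gf_mul(h, d):
    """Reference multiplier: sum of d_j * Z^(j)."""
    x, z = 0, h
    for j in range(128):
        if (d >> j) & 1:
            x ^= z
        z = mul_alpha(z)
    return x


def parity_register(h):
    """Bit j of the result is the predicted parity of Z^(j)."""
    ph, z, p = 0, h, parity(h)  # p starts as p_H
    for j in range(128):
        ph |= p << j
        p ^= z >> 127           # parity flips when z_127 of Z^(j) is 1
        z = mul_alpha(z)
    return ph


def predicted_parity(ph, y, c):
    """Sum of y_j * p_hat(Z^(j)) plus sum of c_j * p_hat(Z^(j)), over GF(2)."""
    return parity(ph & y) ^ parity(ph & c)
```

With a fault-free multiplier, `predicted_parity(parity_register(h), y, c)` should equal `parity(gf_mul(h, y ^ c))`; a mismatch corresponds to $e_{out} = 1$.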

slide-11
SLIDE 11

Single Parity Fault Detection Architecture

$$\hat{p}_{X_i} = \sum_{j=0}^{127} y_j \, \hat{p}_{Z^{(j)}} + \sum_{j=0}^{127} c_j \, \hat{p}_{Z^{(j)}}.$$

  • The actual and predicted parities are computed and compared in each clock cycle to generate the output error signal.

slide-12
SLIDE 12

CRC-Based Fault Detection Scheme

  • We extend the idea from a single parity bit to multiple bits.
  • The Cyclic Redundancy Check (CRC) code has been adopted to detect errors in the GCM loop.
  • For $k$ parity bits, the CRC generator polynomial must be of degree $k$: $g_k(x) = x^k + \ldots + g_1 x + 1$.
  • Let us denote the output of the multiplier in the GCM loop as the message: $m(x) = X_i(x)$.
      1. Compute the actual k-bit parity: $p(x) = m(x) \bmod g_k(x)$.
      2. Compute the k-bit predicted parity: $\hat{p}(x) = f(C, H, Y)$.
      3. Compare them to detect an error: if $p(x) \neq \hat{p}(x)$ ⇒ $e_{out} = 1$.

slide-13
SLIDE 13

Matrix-Based CRC Formulations

  1. The k parity bits of the multiplier output are computed as $p_{\text{CRC-}k} = [p_0\, p_1 \ldots p_{k-1}] = [m_0\, m_1 \ldots m_{127}]\, G_{\text{CRC-}k}$.
      • $m_j \in \{0, 1\}$ is the j-th coordinate of the multiplier output $X_i$.
      • $G_{\text{CRC-}k}$ is the $128 \times k$ CRC generator matrix.
      • The j-th row, $0 \le j \le 127$, of $G_{\text{CRC-}k}$ contains the coefficients of $x^j \bmod g_k(x)$.
      • For $k = 1$ (single-bit parity), $g_1(x) = x + 1$ and then $G_{\text{CRC-}1} = [1\ 1 \ldots 1]^T$ ⇒ $p = m_0 + m_1 + \ldots + m_{127}$.
      • For $2 \le k \le 4$ ⇒ [figure: $G_{\text{CRC-}k}$ for these values of $k$]
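The construction of the generator matrix can be sketched in Python as follows. Each row j is stored as a k-bit integer holding the coefficients of $x^j \bmod g_k(x)$, and the actual parity is the GF(2) product of the message bits with these rows. The function names are hypothetical, and the polynomials used in the usage note for $k \ge 2$ are illustrative choices of this sketch, since the slides do not list the generator polynomials the authors selected.

```python
# Sketch: CRC generator matrix G_CRC-k and the actual k-bit parity.
# A polynomial is an int; bit t is the coefficient of x^t.


def crc_rows(g, n=128):
    """Return n rows; row j is x^j mod g(x), packed into an int."""
    k = g.bit_length() - 1      # degree of g(x)
    rows, r = [], 1             # r starts as x^0
    for _ in range(n):
        rows.append(r)
        r <<= 1                 # multiply by x
        if (r >> k) & 1:
            r ^= g              # reduce modulo g(x)
    return rows


def crc_parity(m, rows):
    """Compute [m_0 ... m_127] * G over GF(2)."""
    p = 0
    for j, row in enumerate(rows):
        if (m >> j) & 1:
            p ^= row
    return p


def poly_mod(m, g):
    """Reference: remainder of m(x) divided by g(x) over GF(2)."""
    k = g.bit_length() - 1
    while m.bit_length() > k:
        m ^= g << (m.bit_length() - 1 - k)
    return m
```

For example, `crc_rows(0b11)` gives 128 rows that are all 1, which is the single-bit parity case, while `crc_rows(0b1011, 8)` for the assumed polynomial $x^3 + x + 1$ gives the rows 1, 2, 4, 3, 6, 7, 5, 1.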
slide-14
SLIDE 14

Matrix-Based CRC Formulations (cont.)

  2. To calculate the k predicted parity bits, we use the Mastrovito formulation for the multiplier output: $m = [m_0\, m_1 \ldots m_{127}]^T = E\,d$.
      • The entries of $E$ contain coordinates of $H$ only.
      • $d = y + c$ is a vector with the coordinates of $D_i = Y_i + C_i$.
      • Substituting $m^T = d^T E^T$ into $p_{\text{CRC-}k} = m^T G_{\text{CRC-}k}$, we obtain
        $$\hat{p}_{\text{CRC-}k} = [\hat{p}_0\, \hat{p}_1 \ldots \hat{p}_{k-1}] = d^T E^T G_{\text{CRC-}k} = y^T O_{\text{CRC-}k} + c^T O_{\text{CRC-}k}.$$
      • The entries of $O_{\text{CRC-}k} = E^T G_{\text{CRC-}k}$ are functions of $H$ only.
      • They are stored in k 128-bit registers at the initialization phase.
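A small Python sketch can make the prediction step concrete. It assumes that column j of $E$ is $Z^{(j)} = H\alpha^j \bmod F(\alpha)$, which follows from $m = E\,d$ and $X = \sum_j d_j Z^{(j)}$, so row j of $O_{\text{CRC-}k}$ is the CRC of $Z^{(j)}$. The k registers are then ANDed with y and c and reduced by XOR. All function names are hypothetical, the bit ordering (bit j is the coefficient of $\alpha^j$) is an assumption, and the registers are filled by repeated polynomial division for clarity rather than by whatever method the hardware uses.

```python
# Sketch: predicted CRC parity via O_CRC-k = E^T * G_CRC-k.
# Assumption: bit j of an int is the coefficient of alpha^j (or x^j).

F_LOW = 0x87
MASK = (1 << 128) - 1


def parity(v):
    return bin(v).count("1") & 1


def mul_alpha(z):
    carry = z >> 127
    z = (z << 1) & MASK
    return z ^ F_LOW if carry else z


def gf_mul(h, d):
    x, z = 0, h
    for j in range(128):
        if (d >> j) & 1:
            x ^= z
        z = mul_alpha(z)
    return x


def poly_mod(m, g):
    k = g.bit_length() - 1
    while m.bit_length() > k:
        m ^= g << (m.bit_length() - 1 - k)
    return m


def crc_registers(h, g):
    """Return k 128-bit registers; bit j of register t is entry (j, t) of O."""
    k = g.bit_length() - 1
    cols, z = [0] * k, h
    for j in range(128):
        r = poly_mod(z, g)               # CRC of Z^(j) is row j of O
        for t in range(k):
            cols[t] |= ((r >> t) & 1) << j
        z = mul_alpha(z)
    return cols


def predicted_crc(cols, y, c):
    """Compute y^T O + c^T O over GF(2), one bit per register."""
    return [parity(col & y) ^ parity(col & c) for col in cols]


def actual_crc(x, g):
    k = g.bit_length() - 1
    r = poly_mod(x, g)
    return [(r >> t) & 1 for t in range(k)]


def error_signal(p, p_hat):
    """OR of the k bitwise XOR comparisons."""
    e = 0
    for a, b in zip(p, p_hat):
        e |= a ^ b
    return e
```

For a fault-free multiplier output `x = gf_mul(h, y ^ c)`, the lists returned by `predicted_crc` and `actual_crc` should agree, so `error_signal` returns 0.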

slide-15
SLIDE 15

Matrix-Based CRC Formulations (cont.)

  3. After calculating $[p_0\, p_1 \ldots p_{k-1}]$ and $[\hat{p}_0\, \hat{p}_1 \ldots \hat{p}_{k-1}]$, we compare all $k$ actual parities with the corresponding predicted parities to generate the output error signal:
     $$e_{out} = (p_0 + \hat{p}_0) \vee (p_1 + \hat{p}_1) \vee \ldots \vee (p_{k-1} + \hat{p}_{k-1}).$$
      • This requires $k$ 2-input XOR gates and a k-input OR gate.

slide-16
SLIDE 16

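CRC-Based Fault Detection Architecture

$$\hat{p}_{\text{CRC-}k} = [\hat{p}_0\, \hat{p}_1 \ldots \hat{p}_{k-1}] = y^T O_{\text{CRC-}k} + c^T O_{\text{CRC-}k}$$

$$p_{\text{CRC-}k} = [p_0\, p_1 \ldots p_{k-1}] = [m_0\, m_1 \ldots m_{127}]\, G_{\text{CRC-}k}$$

$$e_{out} = (p_0 + \hat{p}_0) \vee (p_1 + \hat{p}_1) \vee \ldots \vee (p_{k-1} + \hat{p}_{k-1})$$

  • The actual and predicted parities are computed and compared in each clock cycle to generate the output error signal.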

slide-17
SLIDE 17

Fault Simulation Results

  • We have written VHDL code to simulate the entire fault detection scheme for the GCM using ModelSim.
  • We have considered CRC generator polynomials of degree up to six.
  • Different cases of single- and multiple-bit faults (300,000 in total) are injected into different modules of the proposed fault detection architecture.
  • As the number of parity bits increases, the fault coverage increases and can reach 100% with an acceptable false-alarm rate.
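The idea of measuring fault coverage by injection can be illustrated with a much simpler software experiment than the VHDL/ModelSim setup described above. The Python sketch below flips a chosen number of random bits in a random 128-bit multiplier output and counts how often the k-bit CRC parity changes. It assumes the predicted parity is fault-free, injects faults only into the multiplier output, and uses a hypothetical `coverage` function with illustrative generator polynomials, so its numbers are not comparable to the results reported in the talk.

```python
# Sketch: software fault injection on the multiplier output only.
# Assumptions: the predicted parity is fault-free, and the generator
# polynomials passed in are illustrative, not the authors' choices.
import random


def poly_mod(m, g):
    """Remainder of m(x) divided by g(x) over GF(2)."""
    k = g.bit_length() - 1
    while m.bit_length() > k:
        m ^= g << (m.bit_length() - 1 - k)
    return m


def coverage(g, flipped_bits, trials, rng):
    """Fraction of injected faults that change the CRC parity."""
    detected = 0
    for _ in range(trials):
        x = rng.getrandbits(128)            # fault-free multiplier output
        fault = 0
        for pos in rng.sample(range(128), flipped_bits):
            fault |= 1 << pos               # flip distinct random bits
        if poly_mod(x ^ fault, g) != poly_mod(x, g):
            detected += 1
    return detected / trials
```

For instance, `coverage(0b11, 2, 1000, random.Random(0))` returns 0.0, because a single parity bit cannot see an even number of flipped bits, whereas a higher-degree polynomial catches part of those faults.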

slide-18
SLIDE 18

FPGA Implementations and Overheads

  • We have implemented the original GCM and six fault detection architectures on Altera’s 28 nm FPGA.
  • Their areas, in terms of the number of ALMs (Adaptive Logic Modules), and their longest delays are recorded.
  • The area and time overheads of the fault detection schemes are presented relative to the original one.
  • For a fault coverage of 98% (k=6), we have an area overhead of 10.9% and a delay overhead of 23%.
slide-19
SLIDE 19

Conclusion

  • We proposed a reliable GCM scheme capable of detecting permanent and transient faults.
  • The proposed fault detection scheme checks the validity of the GCM computation in every clock cycle.
  • Based on the available overhead budget and/or the required fault coverage, the number of parity bits (and hence the CRC generator polynomial) can be selected.
  • We performed fault simulations and FPGA implementations.
  • We considered single and multiple faults in all locations of the GCM, parity generation and parity prediction modules.
  • The proposed fault detection scheme has high fault coverage with low overheads and a negligible false-alarm rate.

slide-20
SLIDE 20

Thank You & Questions?
