
SLIDE 1

Amir Ali Kouzeh Geran and Arash Reyhani-Masoleh

Presented by: Arash Reyhani-Masoleh
Department of Electrical and Computer Engineering
Western University, London, Ontario, Canada
23rd IEEE Symposium on Computer Arithmetic (ARITH 23)
June 11, 2016

SLIDE 2

Outline

Motivation Preliminaries Single-bit Fault Detection Scheme CRC-based Fault Detection Scheme Fault Simulation Results FPGA Implementations and Overheads Conclusion

2

SLIDE 3

Motivations: GCM

• Galois/Counter Mode (GCM) is a recently adopted mode of operation for symmetric key cryptography (like AES).
• It was proposed by McGrew and Viega in 2005 and defined by NIST (SP 800-38D) in 2007.
• AES-GCM is included in "NSA Suite B Cryptography".
• It is being used in a number of protocols and standards:
  - IEEE 802.1AE, IEEE 802.11ad
  - ANSI (INCITS) Fibre Channel Security Protocols (FC-SP)
  - IEEE P1619.1 tape storage, IETF IPsec standards, SSH and TLS 1.2
• It provides authentication assurance for additional data that is not encrypted.
• It detects accidental modifications and unauthorized alterations of data, and protects confidentiality.

SLIDE 4

Motivations: Reliable GCM

• Sources of faults in cryptographic systems:
  - Natural faults
  - Fault attacks: inject faults and look for leakage of information.
• The need for a fault detection method:
  - Protect the integrity and authenticity of data.
  - Prevent the attack sequence in case of a fault attack.
• In this paper, we propose a reliable GCM scheme to detect both permanent and transient faults:
  - Low overhead in terms of area and delay.
  - Acceptable fault coverage.

SLIDE 5

Preliminaries

• The GCM has two operations: authenticated encryption and authenticated decryption.
• There are four inputs for authenticated encryption:
  1. A secret key (K), whose length depends on the block cipher.
  2. An initialization vector (IV) with any number of bits between 1 and $2^{64}$.
  3. A plaintext (P) with any number of bits between 0 and $2^{39} - 256$.
  4. Additional authenticated data (A), which is authenticated but not encrypted, with any number of bits between 0 and $2^{64}$.
• There are two outputs for authenticated encryption:
  1. A ciphertext (C) whose length is exactly that of the plaintext.
  2. An authentication tag (T), whose length can be any value between 0 and 128 bits.

SLIDE 6

AES-GCM Block Diagram

The "Hash Key" H is generated by encrypting 128 zero bits using the symmetric key (K): $H = E(K, 0^{128}) = E_K(0)$.

The plaintext P is divided into n blocks, each 128 bits long: $P_1, P_2, \ldots, P_n$.

An up-counter with output $U_i$ is used to generate the blocks of ciphertext: $C_i = P_i \oplus E_K(U_i)$ for $i = 1, 2, \ldots, n$.

The additional authenticated data A is represented as m blocks of 128 bits: $A_1, A_2, \ldots, A_m$.

SLIDE 7

AES-GCM Block Diagram (cont.)

Using the inputs H, A and C, the output of the GCM is defined by $X_{m+n+1} = \mathrm{GHASH}(H, A, C)$.

The 128-bit register Y is cleared initially. After the $(m+n+1)$-th clock cycle, it contains $X_{m+n+1} = \mathrm{GHASH}(H, A, C)$.

In this paper, we consider the GCM loop.
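
As a rough illustration of the loop around register Y, the following Python sketch multiplies in $GF(2^{128})$ modulo $F(x) = x^{128} + x^7 + x^2 + x + 1$ and feeds each 128-bit block through "XOR with Y, then multiply by H". The names `gf128_mul` and `gcm_loop` are hypothetical, and the sketch assumes that bit j of an integer is the coefficient of $\alpha^j$; it ignores the bit-reflected convention of real GCM implementations, so it is not a drop-in GHASH.

```python
# Illustrative sketch only. Assumption: bit j of each integer is the
# coefficient of alpha^j (real GCM uses a bit-reflected convention).
MASK128 = (1 << 128) - 1
F_LOW = 0x87  # x^7 + x^2 + x + 1, the low part of F(x)


def gf128_mul(h, d):
    """Return (h * d) mod F(alpha) by shift-and-add."""
    x = 0
    z = h  # z holds h * alpha^j mod F(alpha)
    for j in range(128):
        if (d >> j) & 1:
            x ^= z
        carry = z >> 127
        z = (z << 1) & MASK128
        if carry:
            z ^= F_LOW  # replace alpha^128 by alpha^7 + alpha^2 + alpha + 1
    return x


def gcm_loop(h, blocks):
    """Run the multiply-accumulate loop over a list of 128-bit blocks."""
    y = 0  # register Y is cleared initially
    for block in blocks:
        y = gf128_mul(h, y ^ block)
    return y
```

Here `blocks` stands for the $m + n + 1$ blocks fed to the loop, one per clock cycle.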

SLIDE 8

Single-bit Fault Detection Scheme

• The parity of the multiplier output ($X_i$) is computed using two different functions:
  1. The actual parity ($p_{X_i}$) is obtained by XORing the coordinates of $X_i$.
  2. The predicted parity is a complex function of H, $C_i$ and Y: $\hat{p}_{X_i} = f(H, C_i, Y)$.

Then, they are compared to find an error: if $p_{X_i} \neq \hat{p}_{X_i}$ ⇒ $e_{out} = 1$.

SLIDE 9

Single-bit Parity Prediction Formulations

We write the multiplier output as follows:

• $X_i = H \times D_i \bmod F(\alpha)$, where $\alpha$ is the root of the irreducible polynomial $F(x) = x^{128} + x^7 + x^2 + x + 1$ and $0 \le i \le m + n + 1$.
• The hash key $H \in GF(2^{128})$ is fixed in each iteration $i$.
• The field element $D_i = \sum_{j=0}^{127} d_j \alpha^j$ (drop $i$ for simplicity).
• $X_i = \sum_{j=0}^{127} d_j Z^{(j)}$, where $Z^{(j)} = (H \alpha^j) \bmod F(\alpha)$ and $Z^{(0)} = H$.

Then, the parity prediction of the multiplier output is:

$\hat{p}_{X_i} = \sum_{j=0}^{127} d_j \, \hat{p}_{Z^{(j)}}$.

SLIDE 10

Single-bit Parity Prediction Formulations (Cont.)

$\hat{p}_{X_i} = \sum_{j=0}^{127} d_j \, \hat{p}_{Z^{(j)}}$

Since $D = Y + C$ ⇒ $d_j = y_j + c_j$ ⇒

$\hat{p}_{X_i} = \sum_{j=0}^{127} y_j \, \hat{p}_{Z^{(j)}} + \sum_{j=0}^{127} c_j \, \hat{p}_{Z^{(j)}}$

$\hat{p}_{Z^{(j)}}$, $0 \le j \le 127$, is a binary function and depends on the coordinates of $H \in GF(2^{128})$:

$Z^{(0)} = H$ ⇒ $\hat{p}_{Z^{(0)}} = p_H$.

$Z^{(1)} = Z^{(0)} \alpha \bmod F(\alpha)$ ⇒ $\hat{p}_{Z^{(1)}} = \hat{p}_{Z^{(0)}} + h_{127}$.

In general: $\hat{p}_{Z^{(j)}} = \hat{p}_{Z^{(j-1)}} + z^{(j-1)}_{127}$ for $1 \le j \le 127$.

These values are stored in a register (PH) at the initialization phase.

They remain constant for the entire $m + n + 1$ cycles of the GCM computation.
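
The parity recursion can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' hardware: integers hold field elements with bit j as the coefficient of $\alpha^j$, and the names `parity_register` and `predicted_parity` are hypothetical. It builds the 128-bit register PH from H using the rule "new parity = old parity + top coefficient", then predicts the parity of the product from Y and C alone.

```python
MASK128 = (1 << 128) - 1
F_LOW = 0x87  # x^7 + x^2 + x + 1


def parity(v):
    """XOR of all bits of v."""
    return bin(v).count("1") & 1


def times_alpha(z):
    """Return (z * alpha) mod F(alpha)."""
    carry = z >> 127
    z = (z << 1) & MASK128
    return z ^ F_LOW if carry else z


def gf128_mul(h, d):
    """Reference multiplier: sum of d_j * Z^(j) with Z^(j) = h * alpha^j."""
    x, z = 0, h
    for j in range(128):
        if (d >> j) & 1:
            x ^= z
        z = times_alpha(z)
    return x


def parity_register(h):
    """Return PH, whose bit j is the predicted parity of Z^(j)."""
    ph, z = 0, h
    p = parity(h)  # predicted parity of Z^(0) is the parity of H
    for j in range(128):
        ph |= p << j
        p ^= z >> 127  # add the top coefficient of Z^(j)
        z = times_alpha(z)
    return ph


def predicted_parity(ph, y, c):
    """Sum over j of y_j * PH_j plus sum over j of c_j * PH_j (mod 2)."""
    return parity(ph & y) ^ parity(ph & c)
```

The recursion works because reducing $\alpha^{128}$ replaces one term by four, so the parity changes exactly when the top coefficient is 1.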

SLIDE 11

Single Parity Fault Detection Architecture

$\hat{p}_{X_i} = \sum_{j=0}^{127} y_j \, \hat{p}_{Z^{(j)}} + \sum_{j=0}^{127} c_j \, \hat{p}_{Z^{(j)}}$.

The actual and predicted parities are computed and compared in each clock cycle to generate the output error signal.

SLIDE 12

CRC-Based Fault Detection Scheme

We extend the idea from a single bit to multiple bits.

The Cyclic Redundancy Check (CRC) code has been adopted to detect errors in the GCM loop.

For $k$ parity bits, the CRC generator polynomial must be of degree $k$: $g_k(x) = x^k + \ldots + g_1 x + 1$.

Let us denote the output of the multiplier in the GCM loop as the message: $m(x) = X_i(x)$.

1. Compute the actual k-bit parity: $p(x) = m(x) \bmod g_k(x)$.
2. Compute the k-bit predicted parity: $\hat{p}(x) = f(C, H, Y)$.
3. Compare them to detect an error: if $p(x) \neq \hat{p}(x)$ ⇒ $e_{out} = 1$.

SLIDE 13

Matrix-Based CRC Formulations

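1. The k parity bits of the multiplier output are computed as

$\mathbf{p}_{CRC\text{-}k} = [p_0 \; p_1 \ldots p_{k-1}] = [m_0 \; m_1 \ldots m_{127}] \, \mathbf{G}_{CRC\text{-}k}$.

$m_j \in \{0, 1\}$ is the j-th coordinate of the multiplier output $X_i$.
$\mathbf{G}_{CRC\text{-}k}$ is the $128 \times k$ CRC generator matrix.
The j-th row, $0 \le j \le 127$, of $\mathbf{G}_{CRC\text{-}k}$ contains the coefficients of $x^j \bmod g_k(x)$.
For $k = 1$ (single-bit parity), $g_1(x) = x + 1$ and then

$\mathbf{G}_{CRC\text{-}1} = [1 \; 1 \ldots 1]^T$ ⇒ $p = m_0 + m_1 + \ldots + m_{127}$

For $2 \le k \le 4$ ⇒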
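
The row rule for the generator matrix can be sketched in Python as follows. The function names are hypothetical, each row is packed into an integer whose bit i is the coefficient of $x^i$, and the degree-3 polynomial $x^3 + x + 1$ used in the usage note is chosen only for illustration; the source does not list the polynomials actually used.

```python
def crc_generator_matrix(g, k, n=128):
    """Return n rows; row j packs the k coefficients of x^j mod g_k(x).

    g is the generator polynomial as an integer including its x^k term,
    e.g. 0b11 for x + 1.
    """
    rows = []
    r = 1  # x^0
    for _ in range(n):
        rows.append(r)
        r <<= 1  # multiply by x
        if (r >> k) & 1:
            r ^= g  # reduce modulo g_k(x)
    return rows


def crc_parity(m, rows):
    """Compute [m_0 ... m_{n-1}] times the matrix over GF(2), as a packed int."""
    p = 0
    for j, row in enumerate(rows):
        if (m >> j) & 1:
            p ^= row
    return p
```

With `g = 0b11` and `k = 1` every row equals 1, which reproduces the single-bit parity case. With the illustrative `g = 0b1011` and `k = 3`, the first eight rows are `[1, 2, 4, 3, 6, 7, 5, 1]`.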

SLIDE 14

Matrix-Based CRC Formulations (cont.)

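2. To calculate the k predicted parity bits, we use the Mastrovito formulation for the multiplier output: $\mathbf{m} = [m_0 \; m_1 \ldots m_{127}]^T = \mathbf{E}\mathbf{d}$.

The entries of $\mathbf{E}$ contain coordinates of $H$ only.
$\mathbf{d} = \mathbf{y} + \mathbf{c}$ is a vector with the coordinates of $D_i = Y_i + C_i$.
Substituting $\mathbf{m}^T = \mathbf{d}^T \mathbf{E}^T$ into $\mathbf{p}_{CRC\text{-}k} = \mathbf{m}^T \mathbf{G}_{CRC\text{-}k}$, we obtain

$\hat{\mathbf{p}}_{CRC\text{-}k} = [\hat{p}_0 \; \hat{p}_1 \ldots \hat{p}_{k-1}] = \mathbf{d}^T \mathbf{E}^T \mathbf{G}_{CRC\text{-}k} = \mathbf{y}^T \mathbf{O}_{CRC\text{-}k} + \mathbf{c}^T \mathbf{O}_{CRC\text{-}k}$

The entries of $\mathbf{O}_{CRC\text{-}k} = \mathbf{E}^T \mathbf{G}_{CRC\text{-}k}$ are functions of $H$ only.

They are stored in k 128-bit registers at the initialization phase.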
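
A minimal Python sketch of this prediction step follows, under these assumptions: bit j of an integer is the coefficient of $\alpha^j$, column j of the Mastrovito matrix is $H\alpha^j \bmod F(\alpha)$, and the k registers hold the columns of the matrix product, one 128-bit word per parity bit. All function names are hypothetical, and the polynomial passed in is up to the caller.

```python
MASK128 = (1 << 128) - 1
F_LOW = 0x87  # x^7 + x^2 + x + 1


def parity(v):
    return bin(v).count("1") & 1


def times_alpha(z):
    carry = z >> 127
    z = (z << 1) & MASK128
    return z ^ F_LOW if carry else z


def gf128_mul(h, d):
    """Reference multiplier used only to check the prediction."""
    x, z = 0, h
    for j in range(128):
        if (d >> j) & 1:
            x ^= z
        z = times_alpha(z)
    return x


def crc_mod(m, g, k):
    """Return m(x) mod g_k(x) by bitwise long division (g includes x^k)."""
    for j in range(m.bit_length() - 1, k - 1, -1):
        if (m >> j) & 1:
            m ^= g << (j - k)
    return m


def o_registers(h, g, k):
    """Return k 128-bit words; bit j of word i is CRC bit i of h * alpha^j."""
    regs = [0] * k
    z = h
    for j in range(128):
        r = crc_mod(z, g, k)
        for i in range(k):
            regs[i] |= ((r >> i) & 1) << j
        z = times_alpha(z)
    return regs


def predicted_crc(regs, y, c):
    """Predicted parity bits: y^T O + c^T O over GF(2)."""
    return [parity(reg & y) ^ parity(reg & c) for reg in regs]


def actual_crc(x, g, k):
    """Actual parity bits of the multiplier output x."""
    r = crc_mod(x, g, k)
    return [(r >> i) & 1 for i in range(k)]
```

Because both the field multiplication and the reduction modulo $g_k(x)$ are linear over GF(2), `predicted_crc(o_registers(h, g, k), y, c)` should match `actual_crc(gf128_mul(h, y ^ c), g, k)` whenever no fault is present.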

SLIDE 15


Matrix-Based CRC Formulations (cont.)

3. After calculating $[p_0 \; p_1 \ldots p_{k-1}]$ and $[\hat{p}_0 \; \hat{p}_1 \ldots \hat{p}_{k-1}]$, we compare all $k$ actual parities with the corresponding predicted parities to generate the output error signal:

$e_{out} = (p_0 + \hat{p}_0) \lor (p_1 + \hat{p}_1) \lor \ldots \lor (p_{k-1} + \hat{p}_{k-1})$

It requires $k$ 2-input XOR gates and a k-input OR gate.
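
The comparator can be mirrored in a few lines of Python. This is only a software analogue of the gate structure; the function name `error_signal` is hypothetical.

```python
def error_signal(actual, predicted):
    """Return 1 if any actual parity bit differs from its predicted bit."""
    if len(actual) != len(predicted):
        raise ValueError("parity vectors must have the same length")
    e_out = 0
    for p, p_hat in zip(actual, predicted):
        e_out |= p ^ p_hat  # one XOR per bit pair, all results ORed together
    return e_out
```

For example, `error_signal([1, 0, 1], [1, 1, 1])` raises the flag because the middle pair differs.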

SLIDE 16


CRC-Based Fault Detection Architecture

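$\hat{\mathbf{p}}_{CRC\text{-}k} = [\hat{p}_0 \; \hat{p}_1 \ldots \hat{p}_{k-1}] = \mathbf{y}^T \mathbf{O}_{CRC\text{-}k} + \mathbf{c}^T \mathbf{O}_{CRC\text{-}k}$

$\mathbf{p}_{CRC\text{-}k} = [p_0 \; p_1 \ldots p_{k-1}] = [m_0 \; m_1 \ldots m_{127}] \, \mathbf{G}_{CRC\text{-}k}$

$e_{out} = (p_0 + \hat{p}_0) \lor (p_1 + \hat{p}_1) \lor \ldots \lor (p_{k-1} + \hat{p}_{k-1})$

The actual and predicted parities are computed and compared in each clock cycle to generate the output error signal.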

SLIDE 17

Fault Simulation Results

We have written VHDL code to simulate the entire fault detection scheme for the GCM using ModelSim.

We have considered CRC generator polynomials up to degree six.

Different cases of single- and multiple-bit faults (300,000 in total) are injected into different modules of the proposed fault detection architecture.

As the number of parity bits increases, the fault coverage increases and can reach 100% with an acceptable false alarm rate.
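
To give a feel for how such an experiment is structured, here is a much-reduced Python sketch. It is not the authors' VHDL simulation: it injects bit flips only at the multiplier output, assumes the prediction path is fault-free, and uses a random 128-bit word in place of the real multiplier output, which is acceptable here because detection by a linear code depends only on the error pattern. The names `coverage` and `crc_mod` and the polynomials in the usage note are illustrative assumptions.

```python
import random


def crc_mod(m, g, k):
    """Return m(x) mod g_k(x) by bitwise long division (g includes x^k)."""
    for j in range(m.bit_length() - 1, k - 1, -1):
        if (m >> j) & 1:
            m ^= g << (j - k)
    return m


def coverage(trials, n_flips, g, k, seed=0):
    """Fraction of injected n_flips-bit faults flagged by a k-bit CRC check."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        fault_free = rng.getrandbits(128)  # stand-in for the multiplier output
        error = 0
        for pos in rng.sample(range(128), n_flips):
            error |= 1 << pos  # flip n_flips distinct bit positions
        faulty = fault_free ^ error
        # The fault-free CRC plays the role of the predicted parity.
        if crc_mod(faulty, g, k) != crc_mod(fault_free, g, k):
            detected += 1
    return detected / trials
```

In this simplified model every single-bit fault is detected for any generator polynomial with a nonzero constant term, while a single parity bit (`g = 0b11`, `k = 1`) misses every two-bit fault. That gap is the motivation for using more parity bits.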

SLIDE 18

FPGA Implementations and Overheads

We have implemented the original GCM and six fault detection architectures on Altera's 28 nm FPGA.

Their areas, in terms of the number of ALMs (Adaptive Logic Modules), and longest delays are recorded.

The area and time overheads of the fault detection schemes are presented relative to the original one.

For a fault coverage of 98% (k = 6), we have an area overhead of 10.9% and a delay overhead of 23%.

SLIDE 19

Conclusion

• We proposed a reliable GCM scheme capable of detecting permanent and transient faults.
• The proposed fault detection scheme checks the validity of the GCM computation in every clock cycle.
• Based on the available overheads and/or the required fault coverage, the number of parity bits (and hence the CRC generator polynomial) can be selected.
• We performed fault simulation and FPGA implementations.
• We considered single and multiple faults in all locations of the GCM, parity generation and parity prediction modules.
• The proposed fault detection scheme has high fault coverage with low overheads and negligible false alarms.

SLIDE 20

Thank You & Questions?
