XKCP internals

Gilles Van Assche (STMicroelectronics)

SCA workshop, Šibenik, Croatia, June 2019
Based on joint work with Ronny Van Keer

Outline

1. Introduction
2. Inside the XKCP
3. Below SnP and PlSnP
4. Build system

Introduction

What is the XKCP?

Previously known as the Keccak Code Package (KCP), now the eXtended KCP.

A repository of implementations of
  • Keccak-p[200 to 1600], SHA-3, (c)SHAKE, KMAC, KangarooTwelve, Ketje, Keyak, Kravatte, …
  • Xoodoo, Xoodyak, Xoofff, Xoofff-SANE, …

Where to find it

https://github.com/XKCP/XKCP

Inside the XKCP

A layered approach

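Three layers, from top to bottom:
  • Mode: hashing, MAC, PRNG, authenticated encryption
  • Construction: sponge, duplex
  • Primitive: Keccak-p[200], Keccak-p[1600], Xoodoo

SnP is the interface between the construction layer and the primitive layer.

Generic code (modes and constructions):
  • focus on the user: easy to use, e.g., message queue
  • one implementation
  • pointers and arithmetic

Specific code (primitives):
  • focus on the developer: limited scope to optimize, unit tests
  • tailored implementations
  • permutation, bulk data processing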
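
To make the split concrete, here is a minimal sketch in C of generic code sitting on top of a specific primitive. Everything in it is hypothetical: the `toy_` names, the 16-byte state, the 8-byte rate, the one-byte padding and the mixing function are illustration-only stand-ins. It is not XKCP code, not Keccak-p or Xoodoo, and not secure. The only point is that `toy_sponge` touches the state exclusively through the four primitive-layer functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_STATE_BYTES 16   /* hypothetical toy width, not a real primitive */
#define TOY_RATE_BYTES   8   /* hypothetical rate */

typedef struct { uint8_t b[TOY_STATE_BYTES]; } toy_state;

/* ---- Primitive layer: the "specific" code ---- */

void toy_Initialize(toy_state *s) { memset(s->b, 0, sizeof s->b); }

void toy_AddBytes(toy_state *s, const uint8_t *data, size_t offset, size_t length)
{
    assert(offset + length <= TOY_STATE_BYTES);
    for (size_t i = 0; i < length; i++) s->b[offset + i] ^= data[i];
}

void toy_ExtractBytes(const toy_state *s, uint8_t *data, size_t offset, size_t length)
{
    assert(offset + length <= TOY_STATE_BYTES);
    memcpy(data, s->b + offset, length);
}

/* Placeholder mixing function standing in for the permutation f. */
void toy_Permute(toy_state *s)
{
    for (unsigned round = 0; round < 4; round++)
        for (unsigned i = 0; i < TOY_STATE_BYTES; i++) {
            uint8_t x = (uint8_t)((s->b[i] << 3) | (s->b[i] >> 5));
            s->b[i] = (uint8_t)((x ^ s->b[(i + 1) % TOY_STATE_BYTES]) + round + i);
        }
}

/* ---- Construction layer: the "generic" code ---- */

/* Sponge-style absorb and squeeze that only calls the four functions above. */
void toy_sponge(const uint8_t *in, size_t inLen, uint8_t *out, size_t outLen)
{
    toy_state s;
    const uint8_t pad = 0x01;   /* simplistic padding, an assumption of this sketch */

    toy_Initialize(&s);
    while (inLen >= TOY_RATE_BYTES) {          /* absorb full blocks */
        toy_AddBytes(&s, in, 0, TOY_RATE_BYTES);
        toy_Permute(&s);
        in += TOY_RATE_BYTES;
        inLen -= TOY_RATE_BYTES;
    }
    toy_AddBytes(&s, in, 0, inLen);            /* last partial block */
    toy_AddBytes(&s, &pad, inLen, 1);
    toy_Permute(&s);
    while (outLen > 0) {                       /* squeeze */
        size_t n = outLen < TOY_RATE_BYTES ? outLen : TOY_RATE_BYTES;
        toy_ExtractBytes(&s, out, 0, n);
        out += n;
        outLen -= n;
        if (outLen > 0) toy_Permute(&s);
    }
}
```

Swapping in another primitive would mean replacing the four `toy_` functions while leaving `toy_sponge` untouched.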

Parallel processing

  • Mode: KangarooTwelve, Deck-SANE, Deck-WBC
  • Construction: parallel sponges, Farfalle
  • Primitive: 4×Keccak-p[1600], 8×Keccak-p[1600], 16×Xoodoo

PlSnP is the interface to the parallel primitives.

SnP = State and Permutation
PlSnP = Parallel States and Permutations

Below SnP and PlSnP

Multiple implementations

  • Keccak-p[1600]: opt32, opt64, AVX2, AVX512, ARMv7M, ARMv8A
  • 2×Keccak-p[1600]: fallback, AVX2, AVX512, NEON
  • Xoodoo: opt32, ARMv6M, ARMv7M, ARMv7A, AVR8
  • 4×Xoodoo: fallback, AVX2, AVX512, NEON

All of these sit behind SnP or PlSnP.

Assumption: the state representation is opaque!
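
To illustrate why callers should treat the state as opaque, the sketch below puts two hypothetical implementations behind the same three operations. One stores the bytes as they are, the other stores every byte complemented, loosely in the spirit of tricks such as lane complementing or bit interleaving. The `plain_` and `compl_` names and the 8-byte state are assumptions made for this example, not XKCP code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STATE_BYTES 8   /* hypothetical toy width */

/* Implementation 1: the stored bytes are the state bytes. */
typedef struct { uint8_t raw[DEMO_STATE_BYTES]; } plain_state;

void plain_Initialize(plain_state *s) { memset(s->raw, 0x00, sizeof s->raw); }

void plain_AddBytes(plain_state *s, const uint8_t *d, size_t off, size_t len)
{
    assert(off + len <= DEMO_STATE_BYTES);
    for (size_t i = 0; i < len; i++) s->raw[off + i] ^= d[i];
}

void plain_ExtractBytes(const plain_state *s, uint8_t *d, size_t off, size_t len)
{
    assert(off + len <= DEMO_STATE_BYTES);
    for (size_t i = 0; i < len; i++) d[i] = s->raw[off + i];
}

/* Implementation 2: every stored byte is the complement of the state byte. */
typedef struct { uint8_t raw[DEMO_STATE_BYTES]; } compl_state;

/* The all-zero state is stored as all ones. */
void compl_Initialize(compl_state *s) { memset(s->raw, 0xFF, sizeof s->raw); }

/* XOR commutes with complement: ~(x ^ d) == (~x) ^ d, so no fix-up is needed. */
void compl_AddBytes(compl_state *s, const uint8_t *d, size_t off, size_t len)
{
    assert(off + len <= DEMO_STATE_BYTES);
    for (size_t i = 0; i < len; i++) s->raw[off + i] ^= d[i];
}

/* Extraction undoes the internal complement. */
void compl_ExtractBytes(const compl_state *s, uint8_t *d, size_t off, size_t len)
{
    assert(off + len <= DEMO_STATE_BYTES);
    for (size_t i = 0; i < len; i++) d[i] = (uint8_t)~s->raw[off + i];
}
```

Both variants return the same bytes through `ExtractBytes`, while a caller peeking at `raw` directly would see different memory contents.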

Operations on the state

  • initialize the state
  • apply the permutation f
  • XOR/overwrite bytes into the state
  • extract bytes from the state, optionally XORing them into a buffer
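
The byte-level semantics of these operations can be sketched as follows. The `demo_` names, the argument order and the 16-byte width are assumptions for illustration, and the permutation is left out; only the XOR, overwrite and extract behaviour is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_STATE_BYTES 16   /* hypothetical toy width */

typedef struct { uint8_t b[DEMO_STATE_BYTES]; } demo_state;

/* All state bytes start at zero. */
void demo_Initialize(demo_state *s) { memset(s->b, 0, sizeof s->b); }

/* state[offset + i] ^= data[i] */
void demo_AddBytes(demo_state *s, const uint8_t *data, size_t offset, size_t length)
{
    assert(offset + length <= DEMO_STATE_BYTES);
    for (size_t i = 0; i < length; i++) s->b[offset + i] ^= data[i];
}

/* state[offset + i] = data[i] */
void demo_OverwriteBytes(demo_state *s, const uint8_t *data, size_t offset, size_t length)
{
    assert(offset + length <= DEMO_STATE_BYTES);
    memcpy(s->b + offset, data, length);
}

/* data[i] = state[offset + i] */
void demo_ExtractBytes(const demo_state *s, uint8_t *data, size_t offset, size_t length)
{
    assert(offset + length <= DEMO_STATE_BYTES);
    memcpy(data, s->b + offset, length);
}

/* output[i] = input[i] ^ state[offset + i]; the state itself is not modified. */
void demo_ExtractAndAddBytes(const demo_state *s, const uint8_t *input,
                             uint8_t *output, size_t offset, size_t length)
{
    assert(offset + length <= DEMO_STATE_BYTES);
    for (size_t i = 0; i < length; i++) output[i] = input[i] ^ s->b[offset + i];
}
```

The extract-and-XOR form is convenient when the state bytes act as a keystream: plaintext goes in as `input` and ciphertext comes out as `output` in one pass.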

Example for Keccak-p[1600]

Declarations in KeccakP-1600-SnP.h:

    #define KeccakP1600_implementation "generic 64-bit optimized implementation"
    #define KeccakP1600_stateSizeInBytes 200
    #define KeccakP1600_stateAlignment 8

Typical functions to be implemented (or macro'ed):
  • KeccakP1600_Initialize
  • KeccakP1600_AddBytes
  • KeccakP1600_OverwriteBytes
  • KeccakP1600_Permute_Nrounds
  • KeccakP1600_ExtractBytes
  • KeccakP1600_ExtractAndAddBytes
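
The size and alignment constants let generic code reserve storage for a state without knowing its layout. A minimal sketch, assuming a C11 compiler; `demo_instance`, `bytesInBlock` and `demo_is_aligned` are hypothetical names chosen for this example and are not taken from the XKCP.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the declarations quoted above. */
#define KeccakP1600_stateSizeInBytes 200
#define KeccakP1600_stateAlignment   8

/* Hypothetical caller-side wrapper: the state is just an aligned byte buffer
 * whose contents only the implementation below SnP is allowed to interpret. */
typedef struct {
    alignas(KeccakP1600_stateAlignment)
    unsigned char state[KeccakP1600_stateSizeInBytes];
    unsigned int bytesInBlock;   /* hypothetical bookkeeping kept by generic code */
} demo_instance;

/* Returns 1 if the state buffer meets the alignment the implementation asked for. */
int demo_is_aligned(const demo_instance *inst)
{
    return ((uintptr_t)inst->state % KeccakP1600_stateAlignment) == 0;
}
```

An implementation with a different internal layout would only need to publish different values for the two constants; the wrapper itself would not change.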

Fast loop optimization

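An optional, specialized function for the repeated application of a sequence of operations, e.g., XORing a block into the state and applying the permutation, for as many blocks as are available.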
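
As a sketch of what such a fast loop might look like: one call absorbs as many full blocks as the input holds and reports how many bytes it consumed. The `toy_` names, the 16-byte state and the mixing function are hypothetical placeholders, not XKCP code. `absorb_plain` is the generic loop it replaces, and both are meant to leave the state identical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_STATE_BYTES 16   /* hypothetical toy width */

typedef struct { uint8_t b[TOY_STATE_BYTES]; } toy_state;

/* Placeholder mixing function on a raw byte array, standing in for f. */
static void toy_mix(uint8_t *b)
{
    for (unsigned round = 0; round < 4; round++)
        for (unsigned i = 0; i < TOY_STATE_BYTES; i++) {
            uint8_t x = (uint8_t)((b[i] << 3) | (b[i] >> 5));
            b[i] = (uint8_t)((x ^ b[(i + 1) % TOY_STATE_BYTES]) + round + i);
        }
}

void toy_AddBytes(toy_state *s, const uint8_t *data, size_t offset, size_t length)
{
    assert(offset + length <= TOY_STATE_BYTES);
    for (size_t i = 0; i < length; i++) s->b[offset + i] ^= data[i];
}

void toy_Permute(toy_state *s) { toy_mix(s->b); }

/* Generic loop: two calls through the interface per block. */
size_t absorb_plain(toy_state *s, size_t rate, const uint8_t *data, size_t len)
{
    size_t done = 0;
    assert(rate > 0 && rate <= TOY_STATE_BYTES);
    while (len - done >= rate) {
        toy_AddBytes(s, data + done, 0, rate);
        toy_Permute(s);
        done += rate;
    }
    return done;
}

/* Fast loop: the same work fused into one call. The local copy only mimics
 * an implementation that keeps the state in registers across iterations. */
size_t toy_FastLoop_Absorb(toy_state *s, size_t rate, const uint8_t *data, size_t len)
{
    uint8_t t[TOY_STATE_BYTES];
    size_t done = 0;
    assert(rate > 0 && rate <= TOY_STATE_BYTES);
    memcpy(t, s->b, sizeof t);
    while (len - done >= rate) {
        for (size_t i = 0; i < rate; i++) t[i] ^= data[done + i];
        toy_mix(t);
        done += rate;
    }
    memcpy(s->b, t, sizeof t);
    return done;   /* bytes consumed; the caller handles the remainder */
}
```

Because the function is optional, generic code can fall back to the plain loop whenever an implementation does not provide it.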

Parallel operations on the states

[Figure: several states side by side, each going through the permutation f]

  • Functions on individual instances
  • Functions on all instances: parallel application of f, XOR blocks into the states
  • Optional fast loop optimization
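
The two kinds of functions can be sketched on a toy example with four interleaved states. All names (`par_`, `toy_Permute`), the interleaved layout, the 16-byte width and the mixing function are assumptions for illustration, not XKCP code. The inner loop over `k` in `par_PermuteAll` marks the spot where a SIMD implementation would process all instances with one instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define W 16   /* hypothetical toy state width in bytes */
#define P 4    /* number of parallel instances */

/* Interleaved layout: byte i of instance k lives at b[i * P + k]. */
typedef struct { uint8_t b[W * P]; } par_states;

static uint8_t rotl8(uint8_t x, unsigned r) { return (uint8_t)((x << r) | (x >> (8 - r))); }

/* Serial reference: placeholder mixing function on one contiguous state. */
void toy_Permute(uint8_t *s)
{
    for (unsigned round = 0; round < 4; round++)
        for (unsigned i = 0; i < W; i++)
            s[i] = (uint8_t)((rotl8(s[i], 3) ^ s[(i + 1) % W]) + round + i);
}

void par_InitializeAll(par_states *st) { memset(st->b, 0, sizeof st->b); }

/* Function on an individual instance: XOR bytes into state number inst. */
void par_AddBytes(par_states *st, unsigned inst, const uint8_t *data,
                  size_t offset, size_t length)
{
    assert(inst < P && offset + length <= W);
    for (size_t i = 0; i < length; i++) st->b[(offset + i) * P + inst] ^= data[i];
}

/* Function on an individual instance: read bytes from state number inst. */
void par_ExtractBytes(const par_states *st, unsigned inst, uint8_t *data,
                      size_t offset, size_t length)
{
    assert(inst < P && offset + length <= W);
    for (size_t i = 0; i < length; i++) data[i] = st->b[(offset + i) * P + inst];
}

/* Function on all instances: XOR one block per instance;
 * the block for instance k starts at data + k * stride. */
void par_AddBlocksAll(par_states *st, const uint8_t *data, size_t blockLen, size_t stride)
{
    for (unsigned k = 0; k < P; k++)
        par_AddBytes(st, k, data + k * stride, 0, blockLen);
}

/* Function on all instances: apply f to every state in lockstep. */
void par_PermuteAll(par_states *st)
{
    for (unsigned round = 0; round < 4; round++)
        for (unsigned i = 0; i < W; i++)
            for (unsigned k = 0; k < P; k++) {
                uint8_t cur = st->b[i * P + k];
                uint8_t next = st->b[((i + 1) % W) * P + k];
                st->b[i * P + k] = (uint8_t)((rotl8(cur, 3) ^ next) + round + i);
            }
}
```

Each instance ends up in the state that `toy_Permute` would have produced on it alone, which is the property a fallback built from the serial code would provide as well.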

Build system

Making targets

Making a library (.a or .so)

    make generic64/libXKCP.a
    make generic32/libXKCP.a
    make Skylake/libXKCP.a
    make ARMv7A/libXKCP.a
    make compact/libXKCP.a

Extracting the source files

    make generic64/libXKCP.a.pack

Running the unit tests

    make generic64/UnitTests
    UnitTests --SnP --KangarooTwelve --Xoofff

And more: benchmarks, KeccakSum utility

XML-driven Makefile

Makefile.build (with HighLevel.build, LowLevel.build, HOWTO-Customize.build) → target makefile

In *.build, one defines fragments that
  • are sets of files and compilation options
  • represent a concrete service, mode or construction, or the implementation of a permutation

Customizing targets

HOWTO-Customize.build

Examples:

    <target name="XoodyakCortexM3" inherits="Xoodyak optimizedARMv7M"/>
    <target name="K12onAVX2" inherits="KangarooTwelve optimized1600AVX2 SIMD256-AVX2u12"/>

Conclusions

Any questions?

Thanks for your attention!

https://keccak.team/
