Speed, speed, speed (D. J. Bernstein)

1

Speed, speed, speed

D. J. Bernstein
University of Illinois at Chicago; Ruhr University Bochum

Reporting some recent symmetric-speed discussions, especially from RWC 2020.

Not included in this talk:

  • NIST LWC.
  • Short inputs.
  • FHE/MPC ciphers.

2

$1000 TCR hashing competition

Crowley: “I have a problem where I need to make some cryptography faster, and I’m setting up a $1000 competition funded from my own pocket for work towards the solution.”

Not fast enough: Signing H(M), where M is a long message.

“[On a] 900MHz Cortex-A7 [SHA-256] takes 28.86 cpb … BLAKE2b is nearly twice as fast … However, this is still a lot slower than I’m happy with.”
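To put the quoted figure in perspective, cycles per byte can be converted to throughput by dividing the clock rate by the cost. The following is a minimal sketch of that arithmetic; the helper name `throughput_mb_per_s` is made up for illustration, and only the 900 MHz clock and the 28.86 cpb figure come from the quote.

```python
def throughput_mb_per_s(clock_hz: float, cycles_per_byte: float) -> float:
    """Throughput in units of 10^6 bytes per second for a given clock and cost."""
    return clock_hz / cycles_per_byte / 1e6


# Figures from the quote: 900 MHz Cortex-A7, SHA-256 at 28.86 cycles/byte.
sha256_a7 = throughput_mb_per_s(900e6, 28.86)
print(f"SHA-256 on a 900 MHz Cortex-A7: about {sha256_a7:.1f} MB/s")
```

On these numbers SHA-256 hashes roughly 31 MB per second, which is the bottleneck when M is long.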

3

Instead choose random R and sign (R, H(R, M)). Note that H needs only “TCR” (target collision resistance), not full collision resistance.

Does this allow faster H design? TCR breaks how many rounds?

“As far as I know, no-one has ever proposed a TCR as a primitive, designed to be faster than existing hash functions, and that’s what I need.”

More desiderata: tree hash, new tweak at each vertex, multi-message security.
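The sign-(R, H(R, M)) flow can be sketched as follows. This is only an illustration under stated assumptions: keyed BLAKE2b stands in for H (the slides do not fix any H, and this stand-in is not a faster TCR design), the `sign` and `verify` callables are hypothetical placeholders for a real signature scheme, and the HMAC used in the demo is not a signature at all.

```python
import hashlib
import hmac
import os


def tcr_digest(r: bytes, message: bytes) -> bytes:
    """H(R, M). Assumption: BLAKE2b keyed with R stands in for the TCR hash."""
    return hashlib.blake2b(message, key=r, digest_size=32).digest()


def randomized_sign(message: bytes, sign):
    """Pick a fresh random R, hash the long message once, sign the short (R, H(R, M))."""
    r = os.urandom(32)
    return r, sign(r + tcr_digest(r, message))


def randomized_verify(message: bytes, r: bytes, signature: bytes, verify) -> bool:
    """Recompute H(R, M) with the transmitted R and check the signature on (R, H(R, M))."""
    return verify(r + tcr_digest(r, message), signature)


# Demo only: HMAC-SHA-256 as a placeholder for a real signature scheme.
demo_key = os.urandom(32)
demo_sign = lambda data: hmac.new(demo_key, data, hashlib.sha256).digest()
demo_verify = lambda data, sig: hmac.compare_digest(demo_sign(data), sig)

r, sig = randomized_sign(b"a long message", demo_sign)
print(randomized_verify(b"a long message", r, sig, demo_verify))
```

Because R is chosen by the signer after the attacker has committed to a message, the attacker cannot prepare a collision in advance, which is why the weaker TCR property is enough for H.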

4

Aumasson, “Too much crypto”:

70%, 23%, 35%, 21% of the rounds or 50%, 8%, 25%, 20% of the rounds of AES-128/BLAKE2b/ChaCha20/SHA-3, respectively, are “broken” or “practically broken”.

“Inconsistent security margins”.

“Attacks don’t really get better”.

“Thousands of papers, stagnating results and techniques”.

“What we want: More scientific and rational approach to choosing round numbers, tolerance for corrections”.

5

New BLAKE3 hash function = 7-round BLAKE2s + tree mode, parallel XOF + more changes. “Much faster than MD5, SHA-1, SHA-2, SHA-3, and BLAKE2.”

Crowley: “Android disk crypto is always right up against the wall of acceptable speed (and battery use). Adiantum uses ChaCha12 and is still IMHO too slow. [10.6 Cortex-A7 cycles/byte.] It sometimes seems like no-one in the crypto world feels the user’s pain here; it always looks better to call for more rounds.”
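The tree-mode idea mentioned above, with a distinct tweak at each vertex, can be illustrated with a small binary tree hash. This is not BLAKE3 and makes no security claim: it is a sketch assuming standard full-round BLAKE2s from Python's `hashlib`, a made-up 1024-byte leaf size, and a made-up tweak layout (root flag, level, index) carried in BLAKE2s's 8-byte `person` parameter.

```python
import hashlib
import struct

CHUNK = 1024  # leaf size in bytes (an assumption for this sketch)


def node_hash(level: int, index: int, data: bytes, root: bool = False) -> bytes:
    """Hash one vertex; (root flag, level, index) is the per-vertex tweak."""
    tweak = struct.pack("<BBHI", int(root), level, 0, index)  # exactly 8 bytes
    return hashlib.blake2s(data, digest_size=32, person=tweak).digest()


def tree_hash(message: bytes) -> bytes:
    """Binary tree hash: leaves are chunks, parents hash their children's digests."""
    chunks = [message[i:i + CHUNK] for i in range(0, len(message), CHUNK)] or [b""]
    if len(chunks) == 1:
        return node_hash(0, 0, chunks[0], root=True)
    # Leaves are independent of each other, so they could be hashed in parallel.
    nodes = [node_hash(0, i, c) for i, c in enumerate(chunks)]
    level = 0
    while len(nodes) > 1:
        level += 1
        groups = [nodes[i:i + 2] for i in range(0, len(nodes), 2)]
        is_root = len(groups) == 1
        nodes = [node_hash(level, i, b"".join(g), root=is_root)
                 for i, g in enumerate(groups)]
    return nodes[0]


print(tree_hash(b"x" * 5000).hex())
```

The speed benefit of a tree mode comes from the independence of the leaves: they can be spread across SIMD lanes or cores, while a sequential mode must process the message in order.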

6

Huge influence of CPU. Intel cycles/byte for two ciphers:

#1    #2    Intel microarchitecture
0.37  0.68  2018 Cannon Lake
0.38  0.88  2017 Cascade Lake
0.38  0.89  2017 Skylake-X
1.94  1.90  2016 Goldmont
0.77  0.98  2016 Kaby Lake
0.74  0.95  2015 Skylake
0.77  1.01  2014 Broadwell
0.77  1.03  2013 Haswell
1.71  1.29  2012 Ivy Bridge

#1: ChaCha12. #2: AES-256.
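One way to read the table is to compute, per microarchitecture, how much slower AES-256 is than ChaCha12. The short sketch below does that with the figures exactly as listed; the variable names are illustrative only.

```python
# (microarchitecture, ChaCha12 cycles/byte, AES-256 cycles/byte), copied from the table.
rows = [
    ("2018 Cannon Lake", 0.37, 0.68),
    ("2017 Cascade Lake", 0.38, 0.88),
    ("2017 Skylake-X", 0.38, 0.89),
    ("2016 Goldmont", 1.94, 1.90),
    ("2016 Kaby Lake", 0.77, 0.98),
    ("2015 Skylake", 0.74, 0.95),
    ("2014 Broadwell", 0.77, 1.01),
    ("2013 Haswell", 0.77, 1.03),
    ("2012 Ivy Bridge", 1.71, 1.29),
]

for name, chacha12, aes256 in rows:
    print(f"{name:18s} AES-256 / ChaCha12 = {aes256 / chacha12:.2f}")

# Microarchitectures where AES-256 is the faster of the two.
aes_faster = [name for name, chacha12, aes256 in rows if aes256 < chacha12]
print("AES-256 faster on:", aes_faster)
```

The ranking flips on Goldmont and Ivy Bridge, and the gap on the newest server and desktop parts is about a factor of two, so a cipher choice tuned on one CPU does not carry over to another.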

7

Deck functions: e.g., Xoofff. Keccak team says: Xoofff takes 0.51 cycles/byte on Skylake-X.

Deck functions are “a new useful API to make modes trivial”; they “allow efficient ciphers”.

Syntax of deck function: F_k : ({0,1}*)* → {0,1}^∞.

Security goal: PRF.

Efficiency goal: quickly compute substring of F_k(X_0), then substring of F_k(X_0, X_1), then substring of F_k(X_0, X_1, X_2), etc.
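The interface described above (a keyed function from a sequence of strings to an unbounded output, extendable one string at a time) can be sketched with a toy stand-in. This is not Xoofff and carries no security or speed claim: the class name `ToyDeck`, the length-prefixed encoding, and the use of SHAKE256 from `hashlib` are all assumptions made for illustration.

```python
import hashlib
import struct


class ToyDeck:
    """Toy stand-in for F_k: absorb strings X_0, X_1, ... and read output substrings."""

    def __init__(self, key: bytes):
        self._state = hashlib.shake_256()
        self._absorb(key)

    def _absorb(self, s: bytes) -> None:
        # Length prefix keeps the encoding of the string sequence unambiguous.
        self._state.update(struct.pack(">Q", len(s)) + s)

    def append(self, x: bytes) -> "ToyDeck":
        """Extend the input sequence by one string without redoing earlier work."""
        self._absorb(x)
        return self

    def output(self, length: int, offset: int = 0) -> bytes:
        """Return output bytes [offset, offset + length) for the sequence so far."""
        # Work on a copy so the running state can keep absorbing afterwards.
        return self._state.copy().digest(offset + length)[offset:]


deck = ToyDeck(b"secret key")
first = deck.append(b"X0").output(16)    # substring of F_k(X_0)
second = deck.append(b"X1").output(16)   # substring of F_k(X_0, X_1)
print(first.hex(), second.hex())
```

The point of the incremental `append` is the efficiency goal on the slide: moving from F_k(X_0) to F_k(X_0, X_1) should cost only the work for X_1.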

8

Deck-Stream: F_k(N).

Deck-MAC: 128 bits of F_k(M).

Deck-SANE session: 128 bits of F_k(N) → tag; use more bits of F_k(N) as stream → ciphertext C_1; 128 bits of F_k(N, A_1, C_1) → tag; etc.

Deck-SANSE: misuse resistance.

Deck-WBC: wide-block cipher. For speed, the wide-block cipher combines Xoofff and Xoofffie, (sort of) built from Xoodoo.
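The first three modes listed above can be sketched on top of a toy deck function. Everything here is illustrative: `toy_deck` uses SHAKE256 as a stand-in for F_k, and `sane_like_first_message` follows only the one-line summary on the slide, so it omits whatever framing and domain separation the real Deck-SANE specification uses.

```python
import hashlib
import struct

TAG_BYTES = 16  # 128 bits


def toy_deck(key: bytes, strings, length: int, offset: int = 0) -> bytes:
    """Toy F_k(strings): output bytes [offset, offset + length). Not Xoofff."""
    h = hashlib.shake_256()
    for s in (key, *strings):
        h.update(struct.pack(">Q", len(s)) + s)  # length-prefixed, unambiguous
    return h.digest(offset + length)[offset:]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def deck_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Deck-Stream: XOR the data with F_k(N). The same call decrypts."""
    return xor(data, toy_deck(key, [nonce], len(data)))


def deck_mac(key: bytes, message: bytes) -> bytes:
    """Deck-MAC: 128 bits of F_k(M)."""
    return toy_deck(key, [message], TAG_BYTES)


def sane_like_first_message(key: bytes, nonce: bytes, ad: bytes, plaintext: bytes):
    """Session start as summarized on the slide: tag, then stream, then tag."""
    start_tag = toy_deck(key, [nonce], TAG_BYTES)                 # 128 bits of F_k(N)
    stream = toy_deck(key, [nonce], len(plaintext), offset=TAG_BYTES)  # more bits of F_k(N)
    c1 = xor(plaintext, stream)
    tag1 = toy_deck(key, [nonce, ad, c1], TAG_BYTES)              # 128 bits of F_k(N, A_1, C_1)
    return start_tag, c1, tag1


key, nonce = b"k" * 32, b"nonce-0001"
start_tag, c1, tag1 = sane_like_first_message(key, nonce, b"header", b"hello, deck")
print(start_tag.hex(), c1.hex(), tag1.hex())
```

Each mode is only a few lines because the deck function already provides variable-length input and output; that is the sense in which the API is said to “make modes trivial”.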

9

MAC speed

2014 Bernstein–Chou Auth256: 29 bit ops per message bit, using mults in field of size 2^256. (I’ve started investigating bit ops for integer mults.)

Encryption sounds slower, but aims for PRF or PRP or SPRP. How many rounds are needed in the context of a MAC?

OCB etc. try to skip MAC, but can these modes safely use as few rounds as counter mode?
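To show why a MAC can be cheap, here is a minimal polynomial-evaluation MAC built from field multiplications. It is not Auth256: a prime field of size 2^127 - 1 with integer arithmetic is swapped in for the binary field of size 2^256, and the block size, padding byte, and function name `poly_mac` are assumptions of this sketch. The pair (r, s) must be secret and used for one message only.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; stands in for the field used by Auth256


def poly_mac(r: int, s: int, message: bytes) -> int:
    """One-time MAC: evaluate the message polynomial at r (Horner), then add s."""
    acc = 0
    for i in range(0, len(message), 15):
        # 15 message bytes plus a 0x01 marker byte; the value stays below P.
        block = int.from_bytes(message[i:i + 15] + b"\x01", "little")
        acc = (acc + block) * r % P  # one field multiplication per block
    return (acc + s) % P


r = secrets.randbelow(P - 1) + 1  # nonzero evaluation point
s = secrets.randbelow(P)          # one-time mask
print(hex(poly_mac(r, s, b"authenticated message")))
```

The per-block work is a single multiplication and addition with no rounds at all, which is the contrast the slide draws with encryption, where the round count dominates the cost.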

10

Bit operations per bit of plaintext (assuming precomputed subkeys):

key   ops/bit   cipher
256   54        ChaCha8
256   78        ChaCha12
128   88        Simon: 62 ops broken
128   100       NOEKEON
128   117       Skinny
256   126       ChaCha20
256   144       Simon: 106 ops broken
128   147.2     PRESENT
256   156       Skinny
128   162.75    Piccolo
128   202.5     AES
256   283.5     AES


More virtues of mult-based MACs:

  • Easy masking.
  • Binary mults: Share area with code-based crypto.
  • Integer mults: Share area with lattice-based crypto and ECC.
  • Use existing CPU multipliers.

If int mults are available anyway, should we renew attention to ciphers that use some mults?

e.g. x *= 0xdf26f9 is same as x -= x<<3; x -= x<<8; x += x<<13.

Mix with ^, >>>16, maybe +. Try 16-bit mults for Intel, ARM.
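The constant in the example factors as (1 − 2^3)(1 − 2^8)(1 + 2^13) = (−7)(−255)(8193) = 14624505 = 0xdf26f9, which is why three shift-and-add/subtract steps reproduce the multiplication. A minimal check, assuming 32-bit words (the slide does not state a word size; the identity holds modulo any power of two) and using the hypothetical helper names `mul_const` and `shift_sub_add`:

```python
MASK32 = 0xFFFFFFFF  # assumed word size: 32 bits


def mul_const(x: int) -> int:
    """x *= 0xdf26f9, reduced modulo 2^32."""
    return (x * 0xDF26F9) & MASK32


def shift_sub_add(x: int) -> int:
    """The same map written as three shift-and-subtract/add steps."""
    x = (x - (x << 3)) & MASK32   # multiply by 1 - 2^3  = -7
    x = (x - (x << 8)) & MASK32   # multiply by 1 - 2^8  = -255
    x = (x + (x << 13)) & MASK32  # multiply by 1 + 2^13 = 8193
    return x


# The three factors multiply out to the constant exactly.
assert (1 - (1 << 3)) * (1 - (1 << 8)) * (1 + (1 << 13)) == 0xDF26F9

# Spot-check the two forms on a few 32-bit inputs.
for x in (0, 1, 0x12345678, 0xDEADBEEF, 0xFFFFFFFF):
    assert mul_const(x) == shift_sub_add(x)
```

Because the constant is odd, the map is invertible modulo 2^32, so it loses no information when used as one step of a mixing function.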