
SLIDE 1

Distributed Models for Statistical Data Privacy

Adam Smith

BU Computer Science PPML 2018 Workshop December 8, 2018

Based on

  • L. Reyzin, A. Smith, S. Yakoubov, https://eprint.iacr.org/2018/997

  • A. Cheu, A. Smith, J. Ullman, D. Zeber, M. Zhilyaev, https://arxiv.org/abs/1808.01394

SLIDE 2

Privacy in Statistical Databases

Many domains

  • Census
  • Medical
  • Advertising
  • Education

[Diagram: Individuals contribute data to an “Agency”; Researchers send queries and receive answers; outputs include summaries, complex models, and synthetic data]

SLIDE 3

Privacy in Statistical Databases

“Aggregate” outputs can leak lots of information

  • Reconstruction attacks
  • Example: Ian Goldberg’s talk on “the secret sharer”


SLIDE 4

[Diagram: the trust model mediates the tradeoff between Utility and Privacy]

SLIDE 5

Differential Privacy [Dwork, McSherry, Nissim, S. 2006]

[Diagram: algorithm A, with its own local random coins, run on x to produce A(x) and on x′ to produce A(x′)]

x′ is a neighbor of x if they differ in one data point.

Definition: A is (ε, δ)-differentially private if, for all neighbors x, x′, and for all sets of outputs S:

  Pr_{coins of A}[A(x) ∈ S] ≤ e^ε · Pr_{coins of A}[A(x′) ∈ S] + δ

Neighboring databases induce close distributions on outputs.
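To make the definition concrete, here is a minimal sketch (not from the slides) of the Laplace mechanism, the standard way to achieve (ε, 0)-DP for a bounded sum; the function name and data are illustrative:

```python
import random

def laplace_mechanism(x, epsilon, sensitivity=1.0):
    """Release sum(x) with Laplace(sensitivity/epsilon) noise.

    Changing one data point changes sum(x) by at most `sensitivity`,
    so the released value is (epsilon, 0)-differentially private.
    """
    scale = sensitivity / epsilon
    # A Laplace(scale) sample is the difference of two Exp(1/scale) samples.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return sum(x) + noise

data = [1, 0, 1, 1, 0, 1]  # e.g., one bit per person
release = laplace_mechanism(data, epsilon=0.5)
```

Smaller ε means more noise and stronger privacy; the error is independent of n, which is what the central model's accuracy advantage below refers to.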

SLIDE 6

Outline

  • Local model
  • Models for DP + MPC
  • Lightweight architectures

Ø“From HATE to LOVE MPC”

  • Minimal primitives

Ø“Differential Privacy via Shuffling”


SLIDE 7

Local Model for Privacy

  • “Local” model

Ø Person i randomizes their own data Ø Attacker sees everything except person i’s local state

  • Definition: A is ε-locally differentially private if for all i:

Ø for all neighbors x, x′, Ø for all behavior B of the other parties, Ø for all transcripts t:

  Pr_{coins of A}[A(x, B) = t] ≤ e^ε · Pr_{coins of A}[A(x′, B) = t]

[Diagram: untrusted aggregator; each person applies a local randomizer with its own random coins to data x₁, x₂, x₃; δ = 0 w.l.o.g.]

Equivalent to [Evfimievski, Gehrke, Srikant ‘03]
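A classic local randomizer satisfying this definition is randomized response (a sketch, with illustrative names): each person reports their true bit with probability e^ε/(e^ε + 1) and lies otherwise, and the aggregator debiases the noisy mean.

```python
import math
import random

def randomized_response(bit, epsilon):
    """epsilon-LDP release of one private bit: tell the truth with
    probability e^eps / (e^eps + 1), lie otherwise."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return bit if random.random() < p_truth else 1 - bit

def estimate_mean(reports, epsilon):
    """Debias the aggregated noisy reports to estimate the true mean.

    E[report] = (1 - p) + bit * (2p - 1), so invert that affine map.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    noisy_mean = sum(reports) / len(reports)
    return (noisy_mean - (1 - p)) / (2 * p - 1)
```

Each report is individually private, but the per-person noise is why local-model error grows like 1/(ε√n) rather than 1/(εn).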

SLIDE 8

Local Model for Privacy

[Diagram: untrusted aggregator collecting locally randomized reports x₁, x₂, x₃]

https://developer.apple.com/videos/play/wwdc2016/709/
https://github.com/google/rappor

SLIDE 9

Local Model for Privacy

  • Pros

Ø No trusted curator Ø No single point of failure Ø Highly distributed Ø Beautiful algorithms

  • Cons

Ø Low accuracy

  • Proportions: Θ(1/(ε√n)) error [BMO’08, CSS’12] vs Θ(1/(εn)) in the central model

Ø Correctness requires honesty

[Diagram: untrusted aggregator collecting locally randomized reports x₁, x₂, x₃]

SLIDE 10

Selection Lower Bounds [DJW’13, Ullman ‘17]

  • Suppose each person has d binary attributes
  • Goal: Find the index j with the highest count (±α)
  • Central model: n = O(log(d)/(εα)) suffices

[McSherry Talwar ‘07]

  • Local model: Any noninteractive local DP protocol

with nontrivial error requires n = Ω(d log(d)/ε²)

Ø [DJW’13, Ullman ‘17] Ø (No lower bound known for interactive protocols)

[Diagram: the data is an n × d binary matrix — n people, d attributes]
SLIDE 11

Local Model for Privacy

What other models allow similarly distributed trust?

[Diagram: untrusted aggregator collecting locally randomized reports x₁, x₂, x₃]

SLIDE 12

Outline

  • Local model
  • Models for DP + MPC
  • Lightweight architectures

Ø“From HATE to LOVE MPC”

  • Minimal primitives

Ø“Differential Privacy via Shuffling”


SLIDE 13

Two great tastes that go great together

  • How can we get accuracy without a trusted curator?
  • Idea: Replace the central algorithm A with a multiparty computation

(MPC) protocol for A (randomized), and either

Ø Secure channels + honest majority Ø Computational assumptions + PKI

  • Questions:

Ø What definition does this achieve? Ø Are there special-purpose protocols that are more efficient than generic reductions? Ø What models make sense? Ø What primitives are needed?


SLIDE 14

Definitions

What definitions are achieved?

  • Simulation of an (ε, δ)-DP protocol
  • Computational DP [Mironov, Pandey, Reingold, Vadhan ’08]

Definition: A is (t, ε, δ)-computationally differentially private if, for all neighbors x, x′, and for all distinguishers D ∈ Poly(t):

  Pr_{coins of A}[D(A(x)) = 1] ≤ e^ε · Pr_{coins of A}[D(A(x′)) = 1] + δ

These two notions are not equivalent.

SLIDE 15

Question 1: Special-purpose protocols

  • [Dwork Kenthapadi McSherry Mironov Naor ‘06]

Special-purpose protocols for generating Laplace/exponential noise via finite field arithmetic

Ø⇒ honest-majority MPC ØSatisfies simulation, follows existing MPC models ØLots of follow-up work

  • [He, Machanavajjhala, Flynn, Srivastava ’17,

Mazloom, Gordon ’17, maybe others?]

Use DP statistics to speed up MPC

ØLeaks more than ideal functionality


SLIDE 16

Question 2: What MPC models make sense?

  • Recall: secure MPC protocols require

ØCommunication between all pairs of parties ØMultiple rounds, so parties have to stay online

  • Protocols involving all

Google/Apple users wouldn’t work


SLIDE 17

Question 2: What MPC models make sense?

Applications of DP suggest a few different settings

  • “Few hospitals”

Ø Small set of computationally powerful data holders Ø Each holds many participants’ data Ø Data holders have their own privacy-related concerns

  • Sometimes can be modeled explicitly, e.g. [Haney,

Machanavajjhala, Abowd, Graham, Kutzbach, Vilhuber ‘17]

  • Data holders’ interests may not align with individuals’
  • “Many phones”

Ø Many weak clients (individual data holders) Ø One server or small set of servers Ø Unreliable, client-server network Ø Calls for lightweight MPC protocols, e.g.

[Shi, Chan, Rieffel, Chow, Song ‘11, Boneh, Corrigan-Gibbs ‘17, Bonawitz, Ivanov, Kreuter, Marcedone, McMahan, Patel, Ramage, Segal, Seth ’17]

DP does not need full MPC

Ø Sometimes, leakage helps [HMFS ’17, MG’17] Ø Sometimes, we do not know how to take advantage of it [McGregor Mironov Pitassi Reingold Talwar Vadhan ’10]

[Diagram: Y = f(X₁, …, Xₙ) computed in each setting]

SLIDE 18

Question 3: What MPC primitives do we need?

  • Observation: Most DP algorithms rely on 2 primitives

Ø Addition + Laplace/Gaussian noise Ø Threshold(summation + noise)

  • Sufficient for “sparse vector” and “exponential mechanism”
  • [Shafi’s talk mentions others for training nonprivate deep nets.]

Ø Relevant for PATE framework

  • Lots of work focuses on addition

Ø “Federated learning” Ø Relies on users to introduce small amounts of noise

  • Thresholding remains complicated

Ø Because it is highly nonlinear Ø Though approximate thresholding may be easier (e.g., HEAAN)

  • Recent papers look at weaker primitives

Ø Shufflers as a useful primitive [Erlingsson, Feldman, Mironov, Raghunathan, Talwar, Thakurta] [Cheu, Smith, Ullman, Zeber, Zhilyaev 2018]
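The two primitives above can be sketched as follows (a hedged illustration using Laplace noise, though Gaussian noise also works; `noisy_sum` and `noisy_threshold` are names invented here):

```python
import random

def noisy_sum(values, epsilon):
    """Primitive 1: sum of values in [0, 1] plus Laplace(1/epsilon) noise.

    Sensitivity of the sum is 1, so this is (epsilon, 0)-DP.
    """
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return sum(values) + noise

def noisy_threshold(values, epsilon, threshold):
    """Primitive 2: report only whether the noisy sum clears a threshold.

    This is the highly nonlinear step that is awkward to do inside MPC.
    """
    return noisy_sum(values, epsilon) >= threshold
```

Sparse vector repeatedly applies the thresholded primitive; federated-learning aggregation uses only the first.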

SLIDE 19

Outline

  • Local model
  • Models for DP + MPC
  • Lightweight architectures

Ø“From HATE to LOVE MPC”

  • Minimal primitives

Ø“Differential Privacy via Shuffling”


SLIDE 20

Turning HATE into LOVE MPC

Scalable Multi-Party Computation With Limited Connectivity

Leonid Reyzin, Adam Smith, Sophia Yakoubov

https://eprint.iacr.org/2018/997

SLIDE 21

Goals

  • Clean formalism for “many phones” model
  • Inspired by protocols of [Shi et al, 2011; Bonawitz et al. 2017]
  • Identify
  • Fundamental limits
  • Potentially practical protocols
  • Open questions
SLIDE 22

Large-scale One-server Vanishing-participants Efficient MPC

[Goldreich, Micali, Wigderson ’87; Yao ’87]

[Diagram: four parties with inputs X1, X2, X3, X4 jointly compute Y = f(X1, X2, X3, X4); no party learns anything other than the output!]

SLIDE 23

Large-scale One-server Vanishing-participants Efficient MPC

[Dwork, Kenthapadi, McSherry, Mironov, Naor ’06]

[Diagram: parties with inputs X1…X4 jointly compute Y = A(X1, X2, X3, X4); a differentially private statistic A(X) can be computed without the server learning anything but the output!]

Central-model accuracy! Local-model privacy!

SLIDE 24

Large-scale One-server Vanishing-participants Efficient MPC

[Diagram: parties jointly compute Y = A(X1, X2, X3, X4) without the server learning anything but the output]

A(X) is often linear, so we will focus on MPC for addition.

Central-model accuracy! Local-model privacy!

SLIDE 25

Large-scale One-server Vanishing-participants Efficient MPC

[Diagram: computationally weak clients holding X1…X4; a strong server computes Y = f(X1, X2, X3, X4)]

SLIDE 26

Large-scale One-server Vanishing-participants Efficient MPC

[Diagram: weak clients, strong server; Y = f(X1, X2, X3, X4)]

SLIDE 27

Large-scale (millions of clients) One-server Vanishing-participants Efficient MPC

[Diagram: millions of weak clients; a strong server computes Y = f(X1, X2, …, Xn)]

SLIDE 28

Large-scale (millions of clients) One-server Vanishing-participants Efficient MPC

[Diagram: weak clients, strong server; Y = f(X1, X2, …, Xn)]

  • Direct communication only to the server, not to everyone
  • Star communication graph,

as in noninteractive multiparty computation (NIMPC)

[Beimel, Gabizon, Ishai, Kushilevitz, Meldgaard, Paskin-Cherniavsky ’14]

SLIDE 29

Large-scale (millions of clients) One-server Vanishing-participants Efficient MPC

[Diagram: weak clients, strong server; Y = f(X1, X2, …, Xn)]

  • Direct communication only to the server, not to everyone
  • Network unreliable, not reliable
  • Computation must complete even if some clients abort

SLIDE 30

Large-scale (millions of clients) One-server Vanishing-participants Efficient MPC

Y = f(X1, X2, …, Xn)

  • Computation must complete even if some clients abort
  • Considered in many papers in the all-to-all communication graph

[Badrinarayanan, Jain, Manohar, Sahai ’18]

  • Considered in [Bonawitz, Ivanov, Kreuter, Marcedone, McMahan, Patel, Ramage, Segal, Seth ’17]

in the star communication graph, achieved in 5 message flows

What’s the best we can do?

SLIDE 31

Our Contributions

  • Defining LOVE MPC
  • Minimal requirements for LOVE MPC:
  • 3 flows
  • Setup: correlated randomness or PKI
  • Building LOVE MPC for addition
  • Main Tool: Homomorphic Ad hoc Threshold Encryption
  • Tradeoffs in LOVE MPC
SLIDE 32

Our Contributions

  • Defining LOVE MPC
  • Minimal requirements for LOVE MPC:
  • 3 flows
  • Setup: correlated randomness or PKI
  • Building LOVE MPC for addition
  • Main Tool: Homomorphic Ad hoc Threshold Encryption
  • Tradeoffs in LOVE MPC
SLIDE 33

Our Contributions

  • Defining LOVE MPC
  • Minimal requirements for LOVE MPC:
  • 3 flows
  • Some setup: PKI
  • Building LOVE MPC for addition
  • Main Tool: Homomorphic Ad hoc Threshold Encryption
  • Definitions
  • Construction: Share-And-Encrypt
  • Putting it all together
  • Tradeoffs in LOVE MPC
SLIDE 34

LOVE MPC for Addition

Scheme        | PK Size | Communication Per Party | Message Space Size | Assumption Family
[BIKMMPRSS17] | O(1)    | O(n)                    | any                | —

SLIDE 35

LOVE MPC for Addition

Scheme                                                            | PK Size | Communication Per Party | Message Space Size | Assumption Family
[BIKMMPRSS17]                                                     | O(1)    | O(n)    | any   | —
Fully Homomorphic ATE [Badrinarayanan, Jain, Manohar, Sahai 2018] | O(1)    | poly(n) | any   | lattices
Shamir-and-ElGamal (OUR WORK)                                     | O(1)    | O(n)    | small | DDH
CRT-and-Paillier (OUR WORK)                                       | O(1)    | O(n)    | any   | factoring
Obfuscation (OUR WORK)                                            | poly(n) | O(1)    | small | iO

(Rows marked OUR WORK: LOVE MPC from HATE.)

SLIDE 36

LOVE MPC for Addition

Scheme                                                            | PK Size | Communication Per Party | Message Space Size | Assumption Family | Number of Rounds
[BIKMMPRSS17]                                                     | O(1)    | O(n)    | any   | —         | 5
Fully Homomorphic ATE [Badrinarayanan, Jain, Manohar, Sahai 2018] | O(1)    | poly(n) | any   | lattices  | 3
Shamir-and-ElGamal (OUR WORK)                                     | O(1)    | O(n)    | small | DDH       | 3
CRT-and-Paillier (OUR WORK)                                       | O(1)    | O(n)    | any   | factoring | 3
Obfuscation (OUR WORK)                                            | poly(n) | O(1)    | small | iO        | 3

(Rows marked OUR WORK: LOVE MPC from HATE.)

SLIDE 37

LOVE MPC for Addition

Scheme                                                            | PK Size | Communication Per Party (1st / nth) | Message Space Size | Assumption Family | Number of Rounds (1st / nth)
[BIKMMPRSS17]                                                     | O(1)    | O(n) / O(n)       | any   | —         | 5 / 5
Fully Homomorphic ATE [Badrinarayanan, Jain, Manohar, Sahai 2018] | O(1)    | poly(n) / poly(n) | any   | lattices  | 3 / 3
Shamir-and-ElGamal (OUR WORK)                                     | O(1)    | O(n) / O(n)       | small | DDH       | 3 / 3
CRT-and-Paillier (OUR WORK)                                       | O(1)    | O(n) / O(n)       | any   | factoring | 3 / 3
Obfuscation (OUR WORK)                                            | poly(n) | O(1) / O(1)       | small | iO        | 3 / 3

(Rows marked OUR WORK: LOVE MPC from HATE.)

SLIDE 38

LOVE MPC for Addition

Scheme                                                            | PK Size | Communication Per Party (1st / nth) | Message Space Size | Assumption Family | Number of Rounds (1st / nth)
[BIKMMPRSS17]                                                     | O(1)    | O(n) / O(n)       | any   | —         | 5 / 5
Fully Homomorphic ATE [Badrinarayanan, Jain, Manohar, Sahai 2018] | O(1)    | poly(n) / poly(n) | any   | lattices  | 3 / 3
Shamir-and-ElGamal (OUR WORK)                                     | O(1)    | O(n) / O(n)       | small | DDH       | 3 / 3
CRT-and-Paillier (OUR WORK)                                       | O(1)    | O(n) / O(n)       | any   | factoring | 3 / 3
Obfuscation (OUR WORK)                                            | poly(n) | O(1) / O(1)       | small | iO        | 3 / 3
Threshold ElGamal (OUR WORK)                                      | O(1)    | O(n) / O(1)       | small | DDH       | 5 / 3

(Rows marked OUR WORK: LOVE MPC from HATE.)

SLIDE 39

Open Questions

Scheme                                                            | PK Size | Communication Per Party (1st / nth) | Message Space Size | Assumption Family | Number of Rounds (1st / nth)
[BIKMMPRSS17]                                                     | O(1)    | O(n) / O(n)       | any   | —         | 5 / 5
Fully Homomorphic ATE [Badrinarayanan, Jain, Manohar, Sahai 2018] | O(1)    | poly(n) / poly(n) | any   | lattices  | 3 / 3
Shamir-and-ElGamal (OUR WORK)                                     | O(1)    | O(n) / O(n)       | small | DDH       | 3 / 3
CRT-and-Paillier (OUR WORK)                                       | O(1)    | O(n) / O(n)       | any   | factoring | 3 / 3
Obfuscation (OUR WORK)                                            | poly(n) | O(1) / O(1)       | small | iO        | 3 / 3
Threshold ElGamal (OUR WORK)                                      | O(1)    | O(n) / O(1)       | small | DDH       | 5 / 3
?                                                                 | O(1)    | O(1) / O(1)       |       |           | 3 / 3

(Rows marked OUR WORK: LOVE MPC from HATE.)

SLIDE 40

Our Contributions

  • Defining LOVE MPC
  • Minimal requirements for LOVE MPC:
  • 3 flows
  • Some setup: PKI
  • Building LOVE MPC for addition
  • Main Tool: Homomorphic Ad hoc Threshold Encryption
  • Definitions
  • Construction: Share-And-Encrypt
  • Putting it all together
  • Tradeoffs in LOVE MPC
SLIDE 41

Homomorphic Ad Hoc Threshold Encryption

[Diagram: Encrypt(PK, X) → ciphertext C; Decrypt(SK, C) → plaintext X]

SLIDE 42

Homomorphic Ad Hoc Threshold Encryption

[Diagram: Encrypt(PK, X) → ciphertext C; each party runs PartDecrypt(C) to produce a partial decryption d; FinalDecrypt combines the partial decryptions to recover plaintext X]

SLIDE 43

Homomorphic Ad Hoc Threshold Encryption

[Diagram: as above; X can be recovered iff t of the n parties provide partial decryptions d]

SLIDE 44

Homomorphic Ad Hoc Threshold Encryption

[Diagram: each party has its own PK; Encrypt uses the set of public keys; X can be recovered iff t of the n parties provide partial decryptions d]

SLIDE 45

Homomorphic Ad Hoc Threshold Encryption

[Diagram: parties encrypt X1, X2, X3 to ciphertexts C1, C2, C3; homomorphic evaluation of f yields a single ciphertext C; partial decryptions (using SK shares) are combined by FinalDecrypt to give X = f(X1, X2, X3)]
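The Share-And-Encrypt idea splits each input into shares so that the sum is recoverable only by combining all parties' contributions. Here is a toy sketch of the additive secret-sharing layer alone (purely illustrative; the actual construction also encrypts each share under a different party's public key, which is omitted here):

```python
import random

Q = 2**61 - 1  # modulus for additive secret sharing

def share(x, n):
    """Split x into n additive shares mod Q; any n-1 shares reveal nothing."""
    shares = [random.randrange(Q) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

def aggregate(all_shares):
    """Additive homomorphism: each party sums the j-th shares it received
    (its 'partial decryption'); combining the partials yields the sum
    of all inputs. Assumes n clients, each splitting into n shares."""
    n = len(all_shares)
    partials = [sum(all_shares[i][j] for i in range(n)) % Q for j in range(n)]
    return sum(partials) % Q

inputs = [3, 5, 9]
shared = [share(x, len(inputs)) for x in inputs]
total = aggregate(shared)  # == 17
```

The threshold-t recovery and the t-out-of-n robustness to aborting clients are exactly what the HATE schemes in the table above add on top of this basic homomorphism.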

SLIDE 46

Additive HATE

Scheme                                                            | PK Size | Ciphertext Size | Message Space Size | Assumption Family
Fully Homomorphic ATE [Badrinarayanan, Jain, Manohar, Sahai 2018] | O(1)    | poly(n) | any   | lattices
Shamir-and-ElGamal (OUR WORK)                                     | O(1)    | O(n)    | small | DDH
CRT-and-Paillier (OUR WORK)                                       | O(1)    | O(n)    | any   | factoring
Obfuscation (OUR WORK)                                            | poly(n) | O(1)    | small | iO

SLIDE 47

Outline

  • Local model
  • Models for DP + MPC
  • Lightweight architectures

Ø“From HATE to LOVE MPC”

  • Minimal primitives

Ø“Differential Privacy via Shuffling”


SLIDE 48

This talk

Like any long, beautiful relationship, it requires work. Your homework:

  • Better protocols
  • Minimal primitives
  • Hybrid models

(see A. Korolova’s talk, I. Goodfellow’s)

ØNonprivate ØCentral-model DP ØLocal-model DP

  • Think of other models

[Diagram: Crypto meets Private data analysis]