CSC2412: Definition of Differential Privacy
Sasho Nikolov
An Ideal Goal
The study reveals not much new about any particular individual to an adversary.
Example:
- Adversary believes humans have four fingers on each hand.
- In particular, believes Sasho has four fingers on each hand.
- Study reveals the distribution of the number of fingers per person's hand.
- Adversary has now learned Sasho probably has five fingers per hand.
- Adversary believes there is no link between smoking and cancer.
- Also knows that Sasho smokes.
- Study reveals a link between smoking and cancer.
Learning about the world also means learning about me.
Statistical vs Personal Information
In the examples, the adversary learns statistical information that pertains to Sasho.
- If science works, it had better reveal something about me.
- Statistical findings (humans have five fingers per hand; smoking causes cancer): yes, these should be learnable.
- Personal information (the contents of my row xi): no, this should be protected.
Towards a Definition
The algorithm doing the analysis should do almost the same thing in all of the following cases:
- my data is included in the data set
- my data is not included in the data set
- my data is changed in the data set
I.e., what the algorithm publishes does not depend too strongly on my data.
Data Model
Data set: a (multi-)set X of n data points, X = {x1, . . . , xn}.
- Each data point (or row) xi is the data of one person.
- Each data point comes from a universe X.
- E.g., d binary attributes: xi ∈ X = {0, 1}^d.
The analysis is performed by a randomized algorithm (mechanism) M: the output M(X) is random for any X.
Almost a Definition
We call two data sets X and X′ neighbouring if
1. (variable n) we can get X′ from X by adding or removing an element, or
2. (fixed n) we can get X′ from X by replacing an element with another.
I.e., X and X′ differ in the data of a single individual:
X = {x1, . . . , xi, . . . , xn},  X′ = {x1, . . . , xi′, . . . , xn}.
We want: M(X) and M(X′) are "similar" as random variables.
Total Variation Distance Differential Privacy
Definition. A mechanism M is δ-TV differentially private if, for any two neighbouring datasets X, X′, and any set of outputs S,
|P(M(X) ∈ S) − P(M(X′) ∈ S)| ≤ δ.
Equivalently, dTV(M(X), M(X′)) = max_S |P(M(X) ∈ S) − P(M(X′) ∈ S)| ≤ δ.
What should δ be?
- δ ≪ 1/n is too strong for utility: any two datasets of size n are connected by a chain of k ≤ n neighbouring steps, so by the triangle inequality |P(M(X) ∈ S) − P(M(X′) ∈ S)| ≤ kδ ≪ 1 for ALL pairs of datasets X, X′; the output barely depends on the data, so M cannot be useful.
- δ ≥ 1/n is too weak: "name and shame". The mechanism that publishes each row xi independently with probability δ is δ-TV private (for neighbouring X, X′ differing in row i, with probability 1 − δ the row xi is not published and M(X), M(X′) can be coupled to be equal). But it is not intuitively private: in expectation δn ≥ 1 data points are published in the clear, without consent.
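A small Python sketch (my own illustration; the rows and the value of δ are made up) that enumerates the exact output distribution of the "name and shame" mechanism — publish each row, tagged with its index, independently with probability δ — and confirms that neighbouring datasets end up at total variation distance exactly δ:

```python
import itertools

def output_dist(X, delta):
    """Exact output distribution of the 'name and shame' mechanism:
    each row is published (together with its index) independently w.p. delta."""
    dist = {}
    for coins in itertools.product([0, 1], repeat=len(X)):
        p = 1.0
        out = []
        for i, c in enumerate(coins):
            p *= delta if c else (1 - delta)
            if c:
                out.append((i, X[i]))
        out = tuple(out)
        dist[out] = dist.get(out, 0.0) + p
    return dist

def tv_distance(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

delta = 0.05
X  = ["smoker", "non-smoker", "smoker"]
Xp = ["smoker", "smoker",     "smoker"]   # neighbouring: row 1 replaced
d = tv_distance(output_dist(X, delta), output_dist(Xp, delta))
print(d)  # equals delta (up to float rounding): delta-TV private, yet rows leak
```

The coupling argument from the slide is visible in the code: the two distributions differ only on outputs in which row 1 was published, and that event has probability exactly δ.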
Finally, Differential Privacy
Definition (Dwork, McSherry, Nissim, Smith 2006). A mechanism M is ε-differentially private if, for any two neighbouring datasets X, X′, and any set of outputs S,
P(M(X) ∈ S) ≤ e^ε P(M(X′) ∈ S).
- Think of ε as a small positive constant; e^ε ≈ 1 + ε for small ε.
- In Vadhan's notes: any conclusion an adversary draws from M(X) could also have been drawn from M(X′).
- "Name and shame" fails this definition for every ε: take S to be the event that my row is published (something bad happens to me); then P(M(X) ∈ S) = δ > 0 but P(M(X′) ∈ S) = 0 when my data is removed.
- My risks if my data are used are almost the same as if they are not used.
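For finite output spaces, requiring the inequality for all sets S is equivalent to requiring it pointwise on single outputs (sum the pointwise bounds over y ∈ S), which makes the definition easy to check numerically. A sketch (the function name and toy distributions are mine) showing that "name and shame" on a single row fails for every finite ε:

```python
import math

def is_eps_dp(p, q, eps):
    """Check P(M(X)=y) <= e^eps * P(M(X')=y) pointwise, in both directions.
    For finite output spaces this is equivalent to the set-based definition."""
    keys = set(p) | set(q)
    return all(p.get(y, 0.0) <= math.exp(eps) * q.get(y, 0.0) and
               q.get(y, 0.0) <= math.exp(eps) * p.get(y, 0.0) for y in keys)

delta = 0.05
# 'Name and shame' restricted to my row: publish it w.p. delta, else nothing.
with_me    = {("my data",): delta, (): 1 - delta}   # my row is in the dataset
without_me = {(): 1.0}                              # my row was removed
print(is_eps_dp(with_me, without_me, eps=10.0))  # False: fails even for huge eps
```

The failure is exactly the slide's argument: no finite multiplicative factor e^ε can cover a probability that drops from δ to 0.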
A Hypothesis Testing Viewpoint
Suppose X = {x1, . . . , xn} are drawn IID from some distribution (the IID assumption is not essential). The adversary A wants to use M(X) to test which hypothesis holds:
- H0: xi = y0 (e.g., "Sasho does not smoke")
- H1: xi = y1 (e.g., "Sasho smokes")
Wasserman, Zhou: if M is ε-DP, then any test A that sees M(X) and outputs y0 or y1 satisfies
P(Type II error) ≥ e^{−ε} (1 − P(Type I error)),
so no test can be simultaneously very sound and very powerful.
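The tradeoff can be computed exactly for the single-bit randomized response mechanism defined on the next slide (an ε-DP mechanism), using the natural test that guesses "smoker" when the released bit is 1; it meets the bound β ≥ e^{−ε}(1 − α) with equality. A sketch (my own worked example, not from the slides):

```python
import math

eps = 1.0
p = math.exp(eps) / (1 + math.exp(eps))  # P(report truthfully) in randomized response

# Adversary's test: guess H1 ("smoker") iff the released bit is 1.
alpha = 1 - p  # Type I error:  P(guess y1 | x_i = y0) = P(bit was flipped)
beta  = 1 - p  # Type II error: P(guess y0 | x_i = y1) = P(bit was flipped)

# Hypothesis-testing bound for an eps-DP mechanism: beta >= e^{-eps} * (1 - alpha).
print(beta, math.exp(-eps) * (1 - alpha))  # the two sides coincide here
```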
Randomized Response
Given
- dataset X = {x1, . . . , xn} ⊆ X,
- query q : X → {0, 1} (e.g., q(x) = 1 if x is a smoker, 0 otherwise),
output M(X) = (Y1(x1), . . . , Yn(xn)), where, independently for each i,
Yi(xi) = q(xi) with probability e^ε/(1 + e^ε), and Yi(xi) = 1 − q(xi) with probability 1/(1 + e^ε).
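A direct implementation sketch of the mechanism (the "smoker" dataset and query are illustrative, not from the lecture):

```python
import math
import random

def randomized_response(X, q, eps, rng=random):
    """Release one noisy bit per row: report q(x_i) truthfully with probability
    e^eps/(1+e^eps), and the flipped bit otherwise, independently per row."""
    p_true = math.exp(eps) / (1 + math.exp(eps))
    return [q(x) if rng.random() < p_true else 1 - q(x) for x in X]

X = ["smoker", "non-smoker", "smoker", "non-smoker"]
q = lambda x: 1 if x == "smoker" else 0
print(randomized_response(X, q, eps=1.0))  # a list of 4 noisy bits
```

Each released bit depends only on that person's own row and fresh private coins, which is why the mechanism also makes sense in the "local" model where no trusted curator holds X.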
Privacy Analysis
Enough to show: for any y ∈ {0, 1}^n, and any neighbouring X, X′,
P(M(X) = y) ≤ e^ε P(M(X′) = y).
(Summing over all y ∈ S then gives P(M(X) ∈ S) ≤ e^ε P(M(X′) ∈ S) for every set S.)
Take neighbouring X = {x1, . . . , xi, . . . , xn} and X′ = {x1, . . . , xi′, . . . , xn}. Since the Yi are independent,
P(M(X) = y) = P(Y1(x1) = y1) · · · P(Yn(xn) = yn),
and similarly for X′. All factors except the i-th are equal, so
P(M(X) = y) / P(M(X′) = y) = P(Yi(xi) = yi) / P(Yi(xi′) = yi) ≤ (e^ε/(1+e^ε)) / (1/(1+e^ε)) = e^ε.
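The factorization argument can be verified numerically: for a small dataset we can enumerate every output y ∈ {0,1}^n, compute P(M(X) = y) exactly as a product over rows, and check that the worst-case ratio between neighbouring datasets is exactly e^ε. (The bit-valued rows and ε = 0.5 are my own toy choices.)

```python
import itertools
import math

def rr_prob(X, q, eps, y):
    """Exact P(M(X) = y) for randomized response, using independence:
    P(M(X)=y) = prod_i P(Y_i(x_i) = y_i)."""
    p = math.exp(eps) / (1 + math.exp(eps))
    prob = 1.0
    for xi, yi in zip(X, y):
        prob *= p if q(xi) == yi else 1 - p
    return prob

eps = 0.5
q = lambda x: x          # rows are already bits in this toy example
X  = (0, 1, 1)
Xp = (0, 0, 1)           # neighbouring: row 1 replaced
ratios = [rr_prob(X, q, eps, y) / rr_prob(Xp, q, eps, y)
          for y in itertools.product([0, 1], repeat=3)]
print(max(ratios), math.exp(eps))  # worst-case ratio is exactly e^eps
```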
Accuracy Analysis
Want to approximate q(X) = (1/n) Σ_{i=1}^n q(xi).
Claim: (1/n) Σ_{i=1}^n Zi ≈ q(X), where Zi = ((1 + e^ε) Yi − 1)/(e^ε − 1).
- E[Yi] = q(xi) e^ε/(1+e^ε) + (1 − q(xi))/(1+e^ε), so E[Zi] = q(xi), and hence E[(1/n) Σ_i Zi] = q(X).
- Hoeffding's inequality: the Zi are independent and take values in [−1/(e^ε−1), e^ε/(e^ε−1)], an interval of width (e^ε+1)/(e^ε−1), so
P(|(1/n) Σ_i Zi − q(X)| ≥ t) ≤ 2 exp(−2nt² (e^ε−1)²/(e^ε+1)²).
- Since (e^ε−1)/(e^ε+1) ≈ ε/2 for small ε, the error is at most α with probability at least 1 − β as long as n ≳ log(2/β)/(ε²α²).
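A quick simulation of the debiased estimator on synthetic random bits (the constants n = 100,000 and ε = 1 are my own illustrative choices): the empirical error should be small, consistent with the O(1/(ε√n)) bound from Hoeffding's inequality.

```python
import math
import random

def estimate_mean(X, q, eps, rng):
    """Run randomized response, then debias: Z_i = ((1+e^eps) Y_i - 1)/(e^eps - 1)
    satisfies E[Z_i] = q(x_i), so the average of the Z_i estimates q(X)."""
    e = math.exp(eps)
    p_true = e / (1 + e)
    Y = [q(x) if rng.random() < p_true else 1 - q(x) for x in X]
    return sum(((1 + e) * y - 1) / (e - 1) for y in Y) / len(X)

rng = random.Random(0)
n, eps = 100_000, 1.0
X = [rng.randint(0, 1) for _ in range(n)]   # synthetic bit-valued rows
q = lambda x: x
true_mean = sum(X) / n
est = estimate_mean(X, q, eps, rng)
print(abs(est - true_mean))  # small: on the order of 1/(eps * sqrt(n))
```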