INFORMATION-THEORETIC SECURITY
Lecture 1 - Elements of Information Theory
Matthieu Bloch
December 2, 2019
CONTEXT AND OBJECTIVES

Context
- Security is an increasingly problematic requirement
- Crypto works… most of the time; a lightweight solution is desirable
- Information-theoretic security may help

IT security vs. cryptography
- IT security makes no assumption on computational power but requires a noisy observation structure
- Cryptography makes no assumption on observation structure but restricts computational power
- These opposite philosophies are not necessarily incompatible

Objectives of the course
- Demystify canonical results in information-theoretic security
- Provide tools to read papers easily, and maybe write papers
- Prove stuff!
A set of signal processing and coding mechanisms that exploit asymmetries in interaction and perception to make an attacker's job harder.
- Asymmetries in interaction and perception are the key distinguishing factor from more traditional approaches
- The assumption is both a strength and a weakness
  - Strength: potential gains in "efficiency"
  - Weakness: the model is easily questioned
- Connected research areas
  - Signal processing, coding theory, communication theory, information theory
  - Computer science, control theory, differential privacy, machine learning
- Most of the emphasis in this course will be on information theory and coding theory
Lectures 1-2: Elements of information theory
- Channel reliability and source coding with side information
- Channel output approximation and randomness extraction
Lectures 3-4: Information-theoretic secrecy
- Secure communication over the wiretap channel
- Secret-key generation from correlated sources
Lecture 5: Information-theoretic covertness
- Undetectable communications
Lecture 6: Information-theoretic authentication
Lecture 7: Information-theoretic privacy
Lectures 8-9: Coding for secrecy
- Polar codes
- Universal hash functions
Lecture 10: Uncertain and adversarial models
Objective: calibrate notation and concepts that we will build on extensively
- Useful tools: Jensen's inequality, distances between distributions, concentration inequalities
- Information-theoretic metrics: entropy and mutual information
- Canonical information-theoretic results: channel coding, source coding with side information
Extensive and detailed lecture notes will be provided.
Random variable $X$, realization $x$, alphabet $\mathcal{X}$, probability mass function $p_X \in \Delta(\mathcal{X})$. We will deal with continuous random variables separately: often, sums can be replaced by integrals and PMFs by PDFs without too much thinking.
Definition (Markov kernel)
$W$ is a Markov kernel from $\mathcal{X}$ to $\mathcal{Y}$ if $W(y|x) \geq 0$ for all $(x, y) \in \mathcal{X} \times \mathcal{Y}$ and $\sum_{y \in \mathcal{Y}} W(y|x) = 1$ for all $x \in \mathcal{X}$. For any $p \in \Delta(\mathcal{X})$, define $W \cdot p \in \Delta(\mathcal{X} \times \mathcal{Y})$ and $W \circ p \in \Delta(\mathcal{Y})$ by
$$(W \cdot p)(x, y) \triangleq W(y|x)\, p(x), \qquad (W \circ p)(y) \triangleq \sum_{x \in \mathcal{X}} (W \cdot p)(x, y).$$
Do not overthink this notation; it is sometimes helpful to manipulate marginals of joint distributions more easily.

Definition (Markov chain)
Let $X, Y, Z$ be real-valued random variables with joint $p_{XYZ}$. Then $X$, $Y$, $Z$ form a Markov chain in that order, denoted $X - Y - Z$, if $X$ and $Z$ are conditionally independent given $Y$, i.e., for all $(x, y, z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ we have
$$p_{XYZ}(x, y, z) = p_{Z|Y}(z|y)\, p_{Y|X}(y|x)\, p_X(x).$$
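As a quick sanity check of the kernel notation, the following Python sketch (the code, the channel $W$, and the input $p$ are my own illustrative choices, not part of the notes) computes $W \cdot p$ and $W \circ p$ for a binary symmetric channel:

```python
# W[y][x] = W(y|x): a binary symmetric channel with crossover 0.1
W = [[0.9, 0.1],
     [0.1, 0.9]]                 # each column sums to 1
p = [0.25, 0.75]                 # p in Delta(X)

# (W . p)(x, y) = W(y|x) p(x): a joint distribution on X x Y
joint = {(x, y): W[y][x] * p[x] for x in range(2) for y in range(2)}

# (W o p)(y) = sum_x (W . p)(x, y): the output marginal in Delta(Y)
marginal = [sum(joint[(x, y)] for x in range(2)) for y in range(2)]

print(marginal)  # a valid distribution on Y
```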
Definition (Convexity)
A function $f : [a, b] \to \mathbb{R}$ is convex if for all $x_1, x_2 \in [a, b]$ and all $\lambda \in [0, 1]$, $f(\lambda x_1 + (1 - \lambda) x_2) \leq \lambda f(x_1) + (1 - \lambda) f(x_2)$. A function $f$ is strictly convex if the inequality above is strict for $\lambda \in (0, 1)$ and $x_1 \neq x_2$. A function $f$ is (strictly) concave if $-f$ is (strictly) convex.

Theorem (Jensen's inequality)
Let $X$ be a real-valued random variable defined on some interval $[a, b]$ and with PMF $p_X$. Let $f : [a, b] \to \mathbb{R}$ be a real-valued function that is convex on $[a, b]$. Then,
$$f(\mathbb{E}[X]) \leq \mathbb{E}[f(X)].$$
For any strictly convex function, equality holds if and only if $X$ is a constant. Jensen's inequality also holds more generally for continuous random variables.

Proposition (Log-sum inequality)
Let $\{a_i\}_{i=1}^n \in \mathbb{R}_+^n$ and $\{b_i\}_{i=1}^n \in \mathbb{R}_+^n$. Then,
$$\sum_{i=1}^n a_i \log \frac{a_i}{b_i} \geq \left(\sum_{i=1}^n a_i\right) \log \frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}.$$
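A numeric spot check of the log-sum inequality is easy to run; in this sketch the vectors $a$ and $b$ are arbitrary positive numbers of my choosing:

```python
import math

# Arbitrary positive sequences (values chosen for illustration)
a = [0.2, 0.5, 0.3]
b = [0.4, 0.4, 0.2]

# Left side: sum_i a_i log(a_i / b_i)
lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
# Right side: (sum_i a_i) log(sum_i a_i / sum_i b_i)
rhs = sum(a) * math.log(sum(a) / sum(b))

print(lhs >= rhs)  # True
```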
Many IT-security metrics express how “close” or “distinct” probability distributions are
Definition (Total variation)
For $p, q \in \Delta(\mathcal{X})$,
$$\mathbb{V}(p, q) \triangleq \frac{1}{2} \|p - q\|_1 \triangleq \frac{1}{2} \sum_{x} |p(x) - q(x)|.$$
$\mathbb{V}(\cdot, \cdot)$ is a legitimate distance on $\Delta(\mathcal{X})$ (symmetry, positivity, triangle inequality).

Proposition (Alternative expression for total variation)
$$\forall p, q \in \Delta(\mathcal{X}) \quad \mathbb{V}(p, q) = \sup_{E \subset \mathcal{X}} \left(\mathbb{P}_p(E) - \mathbb{P}_q(E)\right) = \sup_{E \subset \mathcal{X}} \left(\mathbb{P}_q(E) - \mathbb{P}_p(E)\right).$$
Consequently, $0 \leq \mathbb{V}(p, q) \leq 1$.

Proposition (Properties of total variation)
For $p, q \in \Delta(\mathcal{X})$ and $W$ a kernel from $\mathcal{X}$ to $\mathcal{Y}$, we have
$$\mathbb{V}(W \cdot p, W \cdot q) = \mathbb{V}(p, q) \quad \text{and} \quad \mathbb{V}(W \circ p, W \circ q) \leq \mathbb{V}(p, q).$$
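The two expressions for total variation can be compared numerically on a small alphabet; here $p$ and $q$ are illustrative distributions of my choosing, and the supremum over events is brute-forced over all subsets:

```python
from itertools import chain, combinations

# Illustrative distributions on X = {a, b, c}
p = {'a': 0.5, 'b': 0.3, 'c': 0.2}
q = {'a': 0.2, 'b': 0.3, 'c': 0.5}
X = list(p)

# Definition: half the L1 distance
V = 0.5 * sum(abs(p[x] - q[x]) for x in X)

# Alternative expression: sup over events E of P_p(E) - P_q(E)
events = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
V_sup = max(sum(p[x] for x in E) - sum(q[x] for x in E) for E in events)

print(abs(V - V_sup) < 1e-12)  # True: the two expressions agree
```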
Definition (Relative entropy)
For $p, q \in \Delta(\mathcal{X})$,
$$D(p \| q) \triangleq \sum_{x} p(x) \log \frac{p(x)}{q(x)},$$
with the convention that $D(p \| q) = \infty$ if $p$ is not absolutely continuous with respect to $q$, i.e., $\mathrm{supp}(p)$ is not a subset of $\mathrm{supp}(q)$. Relative entropy is not symmetric.

Proposition (Positivity of relative entropy)
For any $p, q \in \Delta(\mathcal{X})$, $D(p \| q) \geq 0$ with equality if and only if $p = q$.

Proposition (Pinsker's inequality)
For any $p, q \in \Delta(\mathcal{X})$, $\mathbb{V}(p, q) \leq \sqrt{D(p \| q)}$.

Proposition (Reverse Pinsker's inequality)
For any $p, q \in \Delta(\mathcal{X})$ with $\mathrm{supp}(q) = \mathcal{X}$, we have $D(p \| q) \leq \mathbb{V}(p, q) \log \frac{1}{q_{\min}}$, where $q_{\min} \triangleq \min_x q(x)$.

One can go back and forth between $\mathbb{V}(\cdot, \cdot)$ and $D(\cdot \| \cdot)$, but the metrics are not equivalent.
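Both bounds are easy to check on a concrete pair of distributions; in this sketch $p$ and $q$ are illustrative choices of mine with full support, so no infinite value arises:

```python
import math

# Illustrative full-support distributions on a binary alphabet
p = [0.5, 0.5]
q = [0.9, 0.1]

# Relative entropy D(p||q) in bits, and total variation V(p, q)
D = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))
V = 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

print(V <= math.sqrt(D))                  # Pinsker's inequality holds
print(D <= V * math.log2(1 / min(q)))     # reverse Pinsker's inequality holds
```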
Definition (Entropy)
For two jointly distributed random variables $X, Y$ with joint distribution in $\Delta(\mathcal{X} \times \mathcal{Y})$,
$$H(X) \triangleq \mathbb{E}_X[-\log p_X(X)], \qquad H(X|Y) \triangleq \mathbb{E}_{XY}[-\log p_{X|Y}(X|Y)].$$
We also define
$$H(XY) \triangleq \mathbb{E}_{XY}[-\log p_{XY}(XY)] = H(X) + H(Y|X).$$

Proposition (Positivity)
Let $X \in \mathcal{X}$ be a discrete random variable. Then $H(X) \geq 0$ with equality iff $X =$ cst. Let $X, Y$ be correlated discrete random variables with joint $p_{XY}$. Then $H(Y|X) \geq 0$ with equality if and only if $Y$ is a function of $X$.

Proposition (Chain rule)
Let $X, Y$ be two jointly distributed discrete random variables. Then
$$H(XY) = H(X) + H(Y|X) = H(Y) + H(X|Y).$$
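The chain rule can be verified directly on a small joint PMF; the numbers in this sketch are mine, chosen for illustration:

```python
import math

# Illustrative joint pmf p_XY on {0,1} x {0,1}
pXY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy in bits of a pmf given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal p_X
pX = {x: pXY[(x, 0)] + pXY[(x, 1)] for x in (0, 1)}

# H(Y|X) computed directly as E[-log p_{Y|X}(Y|X)]
H_Y_given_X = -sum(v * math.log2(v / pX[x]) for (x, y), v in pXY.items())

# Chain rule: H(XY) = H(X) + H(Y|X)
print(abs(H(pXY) - (H(pX) + H_Y_given_X)) < 1e-9)  # True
```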
Proposition (Fano's inequality)
Let $X$ be a discrete random variable with alphabet $\mathcal{X}$. Let $\hat{X} \in \mathcal{X}$ be an estimate of $X$, with joint distribution $p_{X\hat{X}}$. We define the probability of estimation error $P_e \triangleq \mathbb{P}[X \neq \hat{X}]$. Then,
$$H(X|\hat{X}) \leq H_b(P_e) + P_e \log(|\mathcal{X}| - 1).$$

Lemma (Csiszar's inequality)
For $p, q \in \Delta(\mathcal{X})$,
$$|H(p) - H(q)| \leq \mathbb{V}(p, q) \log \frac{|\mathcal{X}|}{\mathbb{V}(p, q)}.$$
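Fano's inequality can be checked numerically on a small example; the joint distribution of $X$ and $\hat{X}$ below is my own illustrative choice (an estimator that is correct with total probability 0.9 on a ternary alphabet):

```python
import math

# Illustrative joint distribution of X and an estimate Xhat on {0, 1, 2}
p = {(x, xh): 0.0 for x in range(3) for xh in range(3)}
for x in range(3):
    p[(x, x)] = 0.3              # correct estimate, total probability 0.9
p[(0, 1)] = 0.05
p[(1, 2)] = 0.05

def Hb(t):
    """Binary entropy in bits."""
    return 0.0 if t in (0.0, 1.0) else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

Pe = sum(v for (x, xh), v in p.items() if x != xh)   # P[X != Xhat]

# H(X | Xhat) = E[-log p_{X|Xhat}(X|Xhat)]
p_xh = {xh: sum(p[(x, xh)] for x in range(3)) for xh in range(3)}
H_cond = -sum(v * math.log2(v / p_xh[xh]) for (x, xh), v in p.items() if v > 0)

fano_bound = Hb(Pe) + Pe * math.log2(3 - 1)          # |X| = 3
print(H_cond <= fano_bound)  # True
```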
Definition (Mutual information)
$$I(X;Y) = D(p_{XY} \| p_X p_Y) = H(Y) - H(Y|X),$$
$$I(X;Y|Z) = \sum_{z} p_Z(z)\, D(p_{XY|Z=z} \| p_{X|Z=z}\, p_{Y|Z=z}) = H(Y|Z) - H(Y|XZ).$$

Proposition (Monotonicity of entropy)
Let $X$ and $Y$ be discrete random variables with joint distribution $p_{XY}$. Then $H(X|Y) \leq H(X)$, i.e., "conditioning reduces entropy."

Proposition (Chain rule)
$$I(X_1 X_2; Y) = I(X_1; Y) + I(X_2; Y|X_1).$$
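The two expressions for mutual information, $D(p_{XY} \| p_X p_Y)$ and $H(Y) - H(Y|X)$, can be compared on a small joint PMF; the numbers below are mine, chosen for illustration:

```python
import math

# Illustrative joint pmf on {0,1} x {0,1}
pXY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
pX = {x: pXY[(x, 0)] + pXY[(x, 1)] for x in (0, 1)}
pY = {y: pXY[(0, y)] + pXY[(1, y)] for y in (0, 1)}

def H(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# I(X;Y) as the relative entropy D(p_XY || p_X p_Y)
I_kl = sum(v * math.log2(v / (pX[x] * pY[y])) for (x, y), v in pXY.items() if v > 0)

# I(X;Y) as H(Y) - H(Y|X), using the entropy chain rule for H(Y|X)
I_diff = H(pY) - (H(pXY) - H(pX))

print(abs(I_kl - I_diff) < 1e-9)  # True: the two expressions agree
```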