

SLIDE 1

Fast Polarization for Processes with Memory

Joint work with Eren Şaşoğlu and Boaz Shuval

1/32

SLIDE 2

Polar codes in one slide

[Block diagram: channel $W$ with input $X_1^N$ and output $Y_1^N$]

Polar coding

◮ Information vector: $\tilde{U}_1^k$
◮ Padding: $U_1^N = f(\tilde{U}_1^k)$
◮ Encoding: $X_1^N = U_1^N \cdot G_N^{-1}$
◮ Decoding: Successively, deduce $U_i$ from $U_1^{i-1}$ and $Y_1^N$

2/32

SLIDE 3

Polar codes in two slides: [Arıkan:09], [ArıkanTelatar:09]

◮ Setting: binary-input, symmetric, memoryless channel
◮ Polar transform: $U_1^N = X_1^N \cdot G_N$

  $X_1^N$ uniform $\iff$ $U_1^N$ uniform

◮ Low entropy indices: Fix $\beta < 1/2$,

  $\Lambda_N = \{\, i : P_{\mathrm{error}}(U_i \mid U_1^{i-1}, Y_1^N) < 2^{-N^\beta} \,\}$

◮ Polarization: Let $X_1^N$ be uniform. Then

  $\lim_{N\to\infty} \frac{1}{N} |\Lambda_N| = I(X_1; Y_1)$

◮ Coding scheme:
  ◮ For $i \in \Lambda_N$, set $U_i$ equal to information bits (uniform)
  ◮ Set remaining $U_i$ to uniform values, reveal to decoder
  ◮ Transmit $X_1^N = U_1^N \cdot G_N^{-1}$ as codeword

3/32
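The slides use $G_N$ without spelling it out. Below is a minimal numerical sketch, assuming the standard Arıkan construction $G_N = F^{\otimes n}$ with kernel $F = [[1, 0], [1, 1]]$ over GF(2), bit-reversal permutation omitted; under this assumption $G_N$ is its own inverse, so the encoding map $\cdot\, G_N^{-1}$ and the polar transform $\cdot\, G_N$ are the same multiplication.

```python
import numpy as np

def polar_transform_matrix(n):
    """G_N as the n-fold Kronecker power of Arikan's kernel F, over GF(2)."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F) % 2
    return G

n = 3
N = 2 ** n
G = polar_transform_matrix(n)

# Over GF(2) this G_N is an involution, so G_N^{-1} = G_N:
assert np.array_equal(G.dot(G) % 2, np.eye(N, dtype=int))

rng = np.random.default_rng(0)
u = rng.integers(0, 2, size=N)           # padded vector U_1^N
x = u.dot(G) % 2                         # encoding: X_1^N = U_1^N * G_N^{-1}
assert np.array_equal(x.dot(G) % 2, u)   # polar transform recovers U_1^N
```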

SLIDES 4–7

In this talk

Setting: binary-input, symmetric, memoryless channel

4/32

SLIDE 8

In this talk

Setting: binary-input, symmetric, memoryless channel (the word "memoryless" is struck out on this slide: memory is about to be allowed)

4/32

SLIDE 9

Polar codes: [Şaşoğlu+:09], [KoradaUrbanke:10], [HondaYamamoto:13]

◮ Setting: Memoryless i.i.d. process $(X_i, Y_i)_{i=1}^N$
◮ For simplicity: Assume $X_i$ binary
◮ Polar transform: $U_1^N = X_1^N \cdot G_N$
◮ Index sets: Fix $\beta < 1/2$,

  Low entropy: $\Lambda_N = \{\, i : P_{\mathrm{error}}(U_i \mid U_1^{i-1}, Y_1^N) < 2^{-N^\beta} \,\}$

  High entropy: $\Omega_N = \{\, i : P_{\mathrm{error}}(U_i \mid U_1^{i-1}, Y_1^N) > 1/2 - 2^{-N^\beta} \,\}$

◮ Polarization:

  $\lim_{N\to\infty} \frac{1}{N} |\Lambda_N| = 1 - H(X_1|Y_1)$, $\quad \lim_{N\to\infty} \frac{1}{N} |\Omega_N| = H(X_1|Y_1)$

5/32

SLIDE 10

Polar codes: [Şaşoğlu+:09], [KoradaUrbanke:10], [HondaYamamoto:13]

Optimal rate for:

◮ Coding for non-symmetric memoryless channels
◮ Coding for memoryless channels with non-binary inputs
◮ (Lossy) compression of memoryless sources

Question

◮ How to handle memory?

6/32

SLIDE 11

Roadmap

Index sets

  Low entropy: $\Lambda_N(\epsilon) = \{\, i : P_{\mathrm{error}}(U_i \mid U_1^{i-1}, Y_1^N) < \epsilon \,\}$

  High entropy: $\Omega_N(\epsilon) = \{\, i : P_{\mathrm{error}}(U_i \mid U_1^{i-1}, Y_1^N) > 1/2 - \epsilon \,\}$

Plan

◮ Define framework for handling memory
◮ Establish:
  ◮ Slow polarization: for $\epsilon > 0$ fixed,

    $\lim_{N\to\infty} \frac{1}{N} |\Lambda_N(\epsilon)| = 1 - H^\star(X|Y)$, $\quad \lim_{N\to\infty} \frac{1}{N} |\Omega_N(\epsilon)| = H^\star(X|Y)$,

    where $H^\star(X|Y) = \lim_{N\to\infty} \frac{1}{N} H(X_1^N | Y_1^N)$
  ◮ Fast polarization: also holds for $\epsilon = 2^{-N^\beta}$
  ◮ What is $\beta$?

7/32

SLIDE 12

A framework for memory

◮ Process: $(X_i, Y_i, S_i)_{i=1}^N$
◮ Finite number of states: $S_i \in \mathcal{S}$, where $|\mathcal{S}| < \infty$
◮ Hidden state: $S_i$ is unknown to encoder and decoder
◮ Probability distribution: $P(x_i, y_i, s_i | s_{i-1})$
  ◮ Stationary: same for all $i$
  ◮ Markov: $P(x_i, y_i, s_i | s_{i-1}) = P(x_i, y_i, s_i | \{x_j, y_j, s_j\}_{j<i})$
◮ State sequence: aperiodic and irreducible Markov chain

8/32
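To make the framework concrete, here is a minimal sampler for such a process, written against an explicit kernel $P(x, y, s \mid s_{-1})$. The dictionary representation, binary alphabets, and fixed initial state are illustrative choices, not from the talk; Examples 1 and 4 below build kernels in this format.

```python
import numpy as np

def sample_process(P, N, s0, rng):
    """Sample (X_i, Y_i, S_i), i = 1..N, from a stationary kernel.

    P maps each previous state s_prev to an array K with
    K[x, y, s] = P(x, y, s | s_prev), entries summing to 1.
    """
    xs, ys, ss = [], [], []
    s_prev = s0
    for _ in range(N):
        K = P[s_prev]
        flat = rng.choice(K.size, p=K.ravel())     # draw (x, y, s) jointly
        x, y, s = np.unravel_index(flat, K.shape)
        xs.append(int(x)); ys.append(int(y)); ss.append(int(s))
        s_prev = int(s)
    return np.array(xs), np.array(ys), np.array(ss)

# Usage, given a concrete kernel P (see Examples 1 and 4 below):
# x, y, s = sample_process(P, N=1000, s0=0, rng=np.random.default_rng(0))
```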

SLIDE 13

Example 1

◮ Model: Finite state channel $P_s(y|x)$, $s \in \mathcal{S}$
◮ Input distribution: $X_i$ i.i.d. and independent of state
◮ State transition: $\pi(s_i | s_{i-1})$
◮ Distribution: $P(x_i, y_i, s_i | s_{i-1}) = P(x_i)\,\pi(s_i | s_{i-1})\,P_{s_i}(y_i | x_i)$

9/32
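A toy instance of this factorization, in the kernel format used by the sampler after SLIDE 12; the two-state BSC numbers are illustrative, not from the talk.

```python
import numpy as np

p_x = np.array([0.5, 0.5])             # X_i i.i.d. uniform, independent of state
pi = np.array([[0.9, 0.1],             # state transition pi(s_i | s_{i-1})
               [0.2, 0.8]])
crossover = np.array([0.05, 0.30])     # P_s(y | x): a BSC(p_s) in each state s

def example1_kernel(s_prev):
    """P(x, y, s | s_prev) = P(x) * pi(s | s_prev) * P_s(y | x)."""
    K = np.zeros((2, 2, 2))            # indices (x, y, s)
    for x in range(2):
        for y in range(2):
            for s in range(2):
                p_y = crossover[s] if y != x else 1.0 - crossover[s]
                K[x, y, s] = p_x[x] * pi[s_prev, s] * p_y
    return K

P = {s: example1_kernel(s) for s in range(2)}
assert all(np.isclose(P[s].sum(), 1.0) for s in range(2))
```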

SLIDE 14

Example 2

◮ Model: ISI + noise:

  $Y_i = h_0 X_i + h_1 X_{i-1} + \cdots + h_m X_{i-m} + \text{noise}$

◮ Input: $X_i$ has memory: $P(x_i | x_{i-1}, x_{i-2}, \ldots, x_{i-m}, x_{i-m-1})$
◮ State: $S_i = (X_i, X_{i-1}, \ldots, X_{i-m})$
◮ Distribution: For $x_i, s_i, s_{i-1}$ compatible,

  $P(x_i, y_i, s_i | s_{i-1}) = P_{\text{noise}}(y_i | h^T s_i) \cdot P(x_i | s_{i-1})$

10/32
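A minimal sketch of the ISI model; tap values, noise level, and the input law (i.i.d. here purely for brevity, although the slide allows input memory) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([1.0, 0.6, 0.3])           # taps h_0, h_1, h_2, so memory m = 2
m = len(h) - 1
N, sigma = 16, 0.4

x = rng.integers(0, 2, size=N).astype(float)       # binary inputs
# Y_i = h_0 X_i + h_1 X_{i-1} + ... + h_m X_{i-m} + noise, with X_j = 0 for j < 1:
y = np.convolve(x, h)[:N] + sigma * rng.standard_normal(N)

# The state S_i = (X_i, X_{i-1}, ..., X_{i-m}) makes (X_i, Y_i, S_i) Markov:
xp = np.concatenate([np.zeros(m), x]).astype(int)  # prepend m zeros
S = [tuple(xp[i + m - j] for j in range(m + 1)) for i in range(N)]
```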

SLIDES 15–17

Example 3

◮ Model: $(d, k)$-RLL constrained system with noise

[Figure, built up over three frames: a two-state Markov chain generates a $(1, \infty)$-RLL constrained sequence $X_1^N$ (no two consecutive ones), which is transmitted through a BSC($p$) to produce $Y_1^N$. Edges are labeled $x/y$ with probabilities such as $(1-\alpha)(1-p)$, $(1-\alpha)p$, $\alpha(1-p)$, $\alpha p$, $1 \cdot (1-p)$, $1 \cdot p$, so that the diagram defines the joint kernel $P(x_i, y_i, s_i | s_{i-1})$.]

11/32
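A sketch matching the figure: a $(1, \infty)$-RLL sequence from the two-state chain, observed through a BSC($p$). Reading the figure's edge labels, a 1 is emitted with probability $1 - \alpha$ whenever the constraint allows; the values of $\alpha$ and $p$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, p, N = 0.4, 0.1, 64

x = np.zeros(N, dtype=int)
prev = 0
for i in range(N):
    # After a 1, the (1, infinity)-RLL constraint forces a 0; otherwise a 1 is
    # emitted with probability 1 - alpha (matching the figure's edge labels).
    x[i] = 0 if prev == 1 else int(rng.random() < 1 - alpha)
    prev = x[i]

y = x ^ (rng.random(N) < p).astype(int)    # BSC(p): flip each bit w.p. p
assert not np.any(x[:-1] & x[1:])          # no two consecutive ones in X
```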

SLIDE 18

Example 4

◮ Model: Lossy compression of a source with memory: compress $Y_1^N$ into a reconstruction $X_1^N$
◮ Source distribution: $P_s(y)$, $s \in \mathcal{S}$
◮ State transition: $\pi(s_i | s_{i-1})$
◮ Distortion: test channel $P(x|y)$
◮ Distribution: $P(x_i, y_i, s_i | s_{i-1}) = \pi(s_i | s_{i-1})\,P_{s_i}(y_i)\,P(x_i | y_i)$

12/32
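The same toy-kernel treatment for this factorization, again compatible with the sampler after SLIDE 12; the source, chain, and test-channel numbers are illustrative.

```python
import numpy as np

pi = np.array([[0.9, 0.1],                 # state transition pi(s_i | s_{i-1})
               [0.2, 0.8]])
p_y_given_s = np.array([[0.8, 0.2],        # source P_s(y), one row per state s
                        [0.3, 0.7]])
p_x_given_y = np.array([[0.9, 0.1],        # test channel P(x | y), one row per y
                        [0.1, 0.9]])

def example4_kernel(s_prev):
    """P(x, y, s | s_prev) = pi(s | s_prev) * P_s(y) * P(x | y)."""
    K = np.zeros((2, 2, 2))                # indices (x, y, s)
    for x in range(2):
        for y in range(2):
            for s in range(2):
                K[x, y, s] = pi[s_prev, s] * p_y_given_s[s, y] * p_x_given_y[y, x]
    return K

assert all(np.isclose(example4_kernel(s).sum(), 1.0) for s in range(2))
```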

SLIDE 19

Polar codes: [Şaşoğlu:11], [ŞaşoğluTal:16], [ShuvalTal:17]

◮ Setting: Process $(X_i, Y_i, S_i)_{i=1}^N$ with memory, as above
◮ Hidden state: State unknown to encoder and decoder
◮ Polar transform: $U_1^N = X_1^N \cdot G_N$
  ◮ $U_1^N$ are neither independent, nor identically distributed
◮ Index sets: Fix $\beta < 1/2$,

  Low entropy: $\Lambda_N = \{\, i : P_{\mathrm{error}}(U_i \mid U_1^{i-1}, Y_1^N) < 2^{-N^\beta} \,\}$

  High entropy: $\Omega_N = \{\, i : P_{\mathrm{error}}(U_i \mid U_1^{i-1}, Y_1^N) > 1/2 - 2^{-N^\beta} \,\}$

◮ Polarization:

  $\lim_{N\to\infty} \frac{1}{N} |\Lambda_N| = 1 - H^\star(X|Y)$, $\quad \lim_{N\to\infty} \frac{1}{N} |\Omega_N| = H^\star(X|Y)$,

  where $H^\star(X|Y) = \lim_{N\to\infty} \frac{1}{N} H(X_1^N | Y_1^N)$

13/32

SLIDE 20

Achievable rate

◮ Achievable rate: In all examples, $R$ approaches

  $I^\star(X; Y) = \lim_{N\to\infty} \frac{1}{N} I(X_1^N; Y_1^N)$

◮ Successive cancellation: [Wang+:15]

14/32

SLIDE 21

Mixing

Consider the process $(X_i, Y_i)$ with hidden state, split into an initial block $(X_1^L, Y_1^L)$, a gap $(X_{L+1}^M, Y_{L+1}^M)$, and a final block $(X_{M+1}^N, Y_{M+1}^N)$. Then there exist $\psi(k)$, $k \ge 0$, such that

  $P_{X_1^L, Y_1^L, X_{M+1}^N, Y_{M+1}^N} \le \psi(M - L) \cdot P_{X_1^L, Y_1^L} \cdot P_{X_{M+1}^N, Y_{M+1}^N}$

where:

◮ $\psi(0) < \infty$
◮ $\psi(k) \to 1$

15/32
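The flavor of $\psi(k) \to 1$ can be seen numerically in the simplest setting, tracking only the boundary states of a toy two-state chain. The talk's $\psi(k)$ bounds the joint law of the full $(X, Y)$ blocks; this sketch, with illustrative numbers, only shows the state-level analogue.

```python
import numpy as np

T = np.array([[0.9, 0.1],                      # transition matrix pi(s' | s)
              [0.2, 0.8]])
evals, evecs = np.linalg.eig(T.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()                                 # stationary distribution

for k in [1, 2, 5, 10, 20]:
    Pk = np.linalg.matrix_power(T, k)          # k-step transition probabilities
    # P(S_0 = a, S_k = b) / (pi(a) * pi(b)) = Pk[a, b] / pi[b] under stationarity:
    ratio = max(Pk[a, b] / pi[b] for a in range(2) for b in range(2))
    print(k, ratio)                            # decreases toward 1 as k grows
```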

SLIDE 22

Three parameters

◮ Joint distribution $P(x, y)$
◮ For simplicity: $X \in \{0, 1\}$
◮ Parameters:

  Entropy: $H(X|Y) = -\sum_{x,y} P(x, y) \log P(x|y)$

  Bhattacharyya: $Z(X|Y) = 2 \sum_y \sqrt{P(0, y) P(1, y)}$

  Total variation distance: $K(X|Y) = \sum_y |P(0, y) - P(1, y)|$

◮ Connections:

  $H \approx 0 \iff Z \approx 0 \iff K \approx 1$
  $H \approx 1 \iff Z \approx 1 \iff K \approx 0$

16/32
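All three parameters can be computed directly from their definitions; the joint distribution below is an illustrative example, with entropy measured in bits.

```python
import numpy as np

P = np.array([[0.30, 0.15, 0.05],    # P(x = 0, y) for y = 0, 1, 2
              [0.05, 0.15, 0.30]])   # P(x = 1, y)
assert np.isclose(P.sum(), 1.0)

P_y = P.sum(axis=0)                             # marginal P(y)
H = -np.sum(P * np.log2(P / P_y))               # entropy H(X|Y), in bits
Z = 2 * np.sum(np.sqrt(P[0] * P[1]))            # Bhattacharyya Z(X|Y)
K = np.sum(np.abs(P[0] - P[1]))                 # total variation K(X|Y)
print(H, Z, K)
```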

SLIDE 23

Three processes

For $n = 1, 2, \ldots$

◮ $N = 2^n$
◮ $U_1^N = X_1^N G_N$
◮ Pick $B_n \in \{0, 1\}$ uniform, i.i.d.
◮ Random index from $\{1, 2, \ldots, N\}$: $i = 1 + (B_1 B_2 \cdots B_n)_2$
◮ Processes, with respect to $\{X_i, Y_i, S_i\}$ and $\{X_i, Y_i\}$:

  Entropy: $H_n = H(U_i \mid U_1^{i-1}, Y_1^N)$
  Bhattacharyya: $Z_n = Z(U_i \mid U_1^{i-1}, Y_1^N)$
  Total variation distance: $K_n = K(U_i \mid U_1^{i-1}, Y_1^N)$

17/32
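The subscript 2 above denotes reading $B_1 \cdots B_n$ as a binary number, which makes $i$ uniform on $\{1, \ldots, N\}$; a short sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
B = rng.integers(0, 2, size=n)           # B_1, ..., B_n, i.i.d. uniform bits
i = 1 + int("".join(map(str, B)), 2)     # i = 1 + (B_1 B_2 ... B_n)_2
assert 1 <= i <= 2 ** n                  # i is uniform on {1, ..., N}
```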

SLIDE 24

Proof — memoryless case

Slow polarization

  $H_n \in (\epsilon, 1 - \epsilon) \implies |H_{n+1} - H_n| > 0$

Fast polarization

◮ Low entropy set:

  $Z_{n+1} \le \begin{cases} 2 Z_n & B_{n+1} = 0 \\ Z_n^2 & B_{n+1} = 1 \end{cases}$
  $\implies \frac{1}{N} |\Lambda_N| \xrightarrow{n \to \infty} 1 - H(X_1|Y_1)$

◮ High entropy set (new):

  $K_{n+1} \le \begin{cases} K_n^2 & B_{n+1} = 0 \\ 2 K_n & B_{n+1} = 1 \end{cases}$
  $\implies \frac{1}{N} |\Omega_N| \xrightarrow{n \to \infty} H(X_1|Y_1)$

18/32
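A toy, log-domain iteration of the $Z_n$ bound suggests where the $\beta < 1/2$ exponent comes from: squaring doubles $\log_2 Z_n$ on about half the steps, so $-\log_2 Z_n$ grows roughly like $2^{n/2} = N^{1/2}$ once $Z_n$ is small. A sketch under an illustrative starting value (small, as slow polarization provides) and a deterministic balanced bit pattern:

```python
import math

l = math.log2(0.01)                       # l = log2 Z_n, starting at Z = 0.01
for n, b in enumerate([0, 1] * 10, start=1):
    # B = 0: Z doubles (l gains +1, capped since Z <= 1); B = 1: Z squares (l doubles).
    l = min(0.0, l + 1) if b == 0 else 2 * l
    beta = math.log2(-l) / n              # exponent such that Z = 2^(-N^beta)
    if n % 4 == 0:
        print(n, l, round(beta, 3))       # beta: 1.091, 0.782, 0.685, 0.639, 0.611
```

The printed effective exponent decreases toward $1/2$ as $n$ grows, consistent with fast polarization holding for any fixed $\beta < 1/2$.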

SLIDE 25

Proof — memoryless case, with "less" struck out: the case with memory

Slow polarization

  $H_n \in (\epsilon, 1 - \epsilon) \implies |H_{n+1} - H_n| > 0$

Fast polarization, for the processes $\{X_i, Y_i\}$ and $\{X_i, Y_i, S_i\}$:

◮ Low entropy set:

  $Z_{n+1} \le \begin{cases} 2 \psi Z_n & B_{n+1} = 0 \\ \psi Z_n^2 & B_{n+1} = 1 \end{cases}$
  $\implies \frac{1}{N} |\Lambda_N| \xrightarrow{n \to \infty} 1 - H^\star(X|Y)$

◮ High entropy set:

  $\hat{K}_{n+1} \le \begin{cases} \psi \hat{K}_n^2 & B_{n+1} = 0 \\ 2 \hat{K}_n & B_{n+1} = 1 \end{cases}$
  $\implies \frac{1}{N} |\Omega_N| \xrightarrow{n \to \infty} H^\star(X|Y)$

where $\psi = \psi(0) = \max_s \frac{1}{\pi(s)}$ and $\pi$ is the stationary state distribution.

19/32

SLIDE 26

Notation

◮ Two consecutive blocks: $(X_1^N, Y_1^N)$ and $(X_{N+1}^{2N}, Y_{N+1}^{2N})$
◮ Polar transform: $U_1^N = X_1^N \cdot G_N$, $\quad V_1^N = X_{N+1}^{2N} \cdot G_N$
◮ Random index: $i = 1 + (B_1 B_2 \cdots B_n)_2$
◮ Notation: $Q_i = (U_1^{i-1}, Y_1^N)$, $\quad R_i = (V_1^{i-1}, Y_{N+1}^{2N})$

20/32

SLIDE 27

Slow polarization

◮ $H_n$ is a supermartingale:

  $H_n = H(U_i | Q_i) = H(V_i | R_i)$

  $H_{n+1} = \begin{cases} H(U_i + V_i | Q_i, R_i) & B_{n+1} = 0 \\ H(V_i | U_i + V_i, Q_i, R_i) & B_{n+1} = 1 \end{cases}$

◮ By the chain rule:

  $E[H_{n+1} | H_n, \ldots] = \frac{1}{2}\left[ H(U_i + V_i | Q_i, R_i) + H(V_i | U_i + V_i, Q_i, R_i) \right]$
  $= \frac{1}{2} H(U_i + V_i, V_i | Q_i, R_i) = \frac{1}{2} H(U_i, V_i | Q_i, R_i)$
  $\le \frac{1}{2} H(U_i | Q_i) + \frac{1}{2} H(V_i | R_i) = H_n$

  (the inequality combines subadditivity with "conditioning reduces entropy")

(Recall: $U_1^N = X_1^N \cdot G_N$, $V_1^N = X_{N+1}^{2N} \cdot G_N$, $Q_i = (U_1^{i-1}, Y_1^N)$, $R_i = (V_1^{i-1}, Y_{N+1}^{2N})$.)

21/32

SLIDE 28

Slow polarization

Convergence

◮ $H_n$ is a supermartingale
◮ $0 \le H_n \le 1$

  $\implies H_n$ converges a.s. and in $L_1$ to $H_\infty$

Polarization

◮ $H_\infty \in [0, 1]$
◮ We need: $H_\infty \in \{0, 1\}$
◮ This would be easy if $(U_i, Q_i)$ and $(V_i, R_i)$ were independent
◮ They are not: $Y_N$ appears in $Q_i$ and $Y_{N+1}$ appears in $R_i$, and the two are adjacent in time
◮ But: for almost all $i$, we have $I(U_i; V_i | Q_i, R_i) < \epsilon$
◮ Enough? No. Need to show that $Q_i$ and $R_i$ can't cooperate to stop polarization

22/32

SLIDE 29

Fast polarization to low entropy set $\Lambda_N$

◮ Recall:

  $P_{X_1^N, Y_1^N, X_{N+1}^{2N}, Y_{N+1}^{2N}} \le \psi \cdot P_{X_1^N, Y_1^N} \cdot P_{X_{N+1}^{2N}, Y_{N+1}^{2N}}$

◮ "Force" block independence:

  $(\tilde{X}_1^{2N}, \tilde{Y}_1^{2N}) \sim P_{X_1^N, Y_1^N} \cdot P_{X_{N+1}^{2N}, Y_{N+1}^{2N}}$

◮ Thus,

  $P_{X_1^N, Y_1^N, X_{N+1}^{2N}, Y_{N+1}^{2N}} \le \psi \cdot P_{\tilde{X}_1^N, \tilde{Y}_1^N, \tilde{X}_{N+1}^{2N}, \tilde{Y}_{N+1}^{2N}}$

◮ With $\tilde{U}_i, \tilde{V}_i, \tilde{Q}_i, \tilde{R}_i$ defined from $(\tilde{X}, \tilde{Y})$ as above,

  $P_{U_1^N, Q_1^N, V_1^N, R_1^N} \le \psi \cdot P_{\tilde{U}_1^N, \tilde{Q}_1^N, \tilde{V}_1^N, \tilde{R}_1^N}$

23/32

SLIDE 30

Polarization of $Z_n$

$Z(U_i + V_i | Q_i, R_i) = 2 \sum_{q,r} \sqrt{P_{U_i+V_i, Q_i, R_i}(0, q, r) \cdot P_{U_i+V_i, Q_i, R_i}(1, q, r)}$
$\le 2 \sum_{q,r} \sqrt{\psi P_{\tilde{U}_i+\tilde{V}_i, \tilde{Q}_i, \tilde{R}_i}(0, q, r) \cdot \psi P_{\tilde{U}_i+\tilde{V}_i, \tilde{Q}_i, \tilde{R}_i}(1, q, r)}$
$= \psi \cdot Z(\tilde{U}_i + \tilde{V}_i | \tilde{Q}_i, \tilde{R}_i) \le \psi \cdot 2 Z(\tilde{U}_i | \tilde{Q}_i) = \psi \cdot 2 Z(U_i | Q_i)$

In a similar manner, we show $Z(V_i | U_i + V_i, Q_i, R_i) \le \psi \cdot Z(U_i | Q_i)^2$.

24/32

SLIDE 31

Fast polarization to high entropy set $\Omega_N$

◮ Memoryless case:
  ◮ Proof hinges on independence:

    $P(x_1^{2N}, y_1^{2N}) = P(x_1^N, y_1^N) \cdot P(x_{N+1}^{2N}, y_{N+1}^{2N})$

◮ Memory case:
  ◮ Force independence: condition on the middle state $S_N$:

    $P(x_1^{2N}, y_1^{2N} | s_N) = P(x_1^N, y_1^N | s_N) \cdot P(x_{N+1}^{2N}, y_{N+1}^{2N} | s_N)$

  ◮ New processes:

    $\hat{H}_n = H(U_i | U_1^{i-1}, Y_1^N, S_0, S_N)$
    $\hat{K}_n = K(U_i | U_1^{i-1}, Y_1^N, S_0, S_N)$

25/32

SLIDES 32–33

Tying things together

[Diagram, built over two frames: $H_\infty$ controls $Z_n$, which yields the low entropy set $\Lambda_N$; since $H_\infty = \hat{H}_\infty$ almost surely, $\hat{H}_\infty$ controls $\hat{K}_n$, which yields the high entropy set $\Omega_N$.]

26/32

SLIDE 34

Polarization of $K_n$ (memoryless case)

◮ Memoryless assumption:

  $P(u_i, v_i, q_i, r_i) = P(u_i, q_i) \cdot P(v_i, r_i)$

◮ Notation: $T_i = U_i + V_i$
◮ One step polarization:

  $K_{n+1} = \begin{cases} K(T_i | Q_i, R_i) & B_{n+1} = 0 \ \text{('−' transform)} \\ K(V_i | T_i, Q_i, R_i) & B_{n+1} = 1 \ \text{('+' transform)} \end{cases}$

◮ Recall: $K(X|Y) = \sum_y |P(0, y) - P(1, y)|$

27/32

SLIDE 35

Polarization of $K_n$ (memoryless case), '−' transform

$K_{n+1} = \sum_{q,r} |P_{T_i, Q_i, R_i}(0, q, r) - P_{T_i, Q_i, R_i}(1, q, r)|$
$= \sum_{q,r} \left| \sum_{v=0}^{1} P(v, r)\left( P(v, q) - P(v + 1, q) \right) \right|$
$= \sum_{q,r} \left| \left( P(0, q) - P(1, q) \right)\left( P(0, r) - P(1, r) \right) \right|$
$= \sum_{q,r} |P(0, q) - P(1, q)| \cdot |P(0, r) - P(1, r)|$
$= \sum_q |P(0, q) - P(1, q)| \cdot \sum_r |P(0, r) - P(1, r)| = K_n^2$

(Addition such as $v + 1$ is modulo 2 throughout.)

28/32

SLIDE 36

Polarization of $K_n$ (memoryless case), '+' transform

$K_{n+1} = \sum_{t,q,r} |P_{T_i, V_i, Q_i, R_i}(t, 0, q, r) - P_{T_i, V_i, Q_i, R_i}(t, 1, q, r)|$
$= \sum_{t,q,r} |P(t, q) P(0, r) - P(t + 1, q) P(1, r)|$
$\overset{(*)}{\le} \frac{1}{2} \sum_{t,q,r} \left( P(q)\, |P(0, r) - P(1, r)| + P(r)\, |P(t, q) - P(t + 1, q)| \right)$
$= \frac{1}{2} \sum_{t,r} |P(0, r) - P(1, r)| + \frac{1}{2} \sum_{t,q} |P(t, q) - P(t + 1, q)| = 2 K_n$

Identity for $(*)$: For any $a, b, c, d$:

  $ab - cd = \frac{(a + c)(b - d) + (b + d)(a - c)}{2}$

29/32
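A quick numerical sanity check of the identity used at $(*)$, over random reals:

```python
import random

random.seed(0)
for _ in range(1000):
    a, b, c, d = (random.uniform(-5, 5) for _ in range(4))
    lhs = a * b - c * d
    rhs = ((a + c) * (b - d) + (b + d) * (a - c)) / 2
    assert abs(lhs - rhs) < 1e-9
```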

SLIDE 37

Polarization of $\hat{K}_n$ (memory)

◮ Follows the steps of the memoryless case
◮ Requires additional inequalities:

◮ Inequality I: For states $s_0, s_N, s_{2N} \in \mathcal{S}$,

  $P(s_0, s_N, s_{2N}) = \frac{P(s_0, s_N) \cdot P(s_N, s_{2N})}{P(s_N)} \le \psi \cdot P(s_0, s_N) \cdot P(s_N, s_{2N})$

  where $\psi = \max_s \frac{1}{\pi(s)}$

◮ Inequality II: For $f, g \ge 0$,

  $\sum_{s_N} f(s_N)\, g(s_N) \le \left( \sum_{s_N} f(s_N) \right) \left( \sum_{s'_N} g(s'_N) \right)$

30/32
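Both $\psi$ and Inequality I are easy to check numerically on a toy chain (the transition matrix is illustrative): the equality in Inequality I is exactly Markovity of the state sequence, and the bound follows since $P(s_N) = \pi(s_N) \ge \min_s \pi(s) = 1/\psi$ under stationarity.

```python
import numpy as np

T = np.array([[0.9, 0.1],                       # pi(s_i | s_{i-1})
              [0.2, 0.8]])
evals, evecs = np.linalg.eig(T.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()                                  # stationary distribution (2/3, 1/3)
psi = 1.0 / pi.min()                            # psi = max_s 1/pi(s) = 3
print(pi, psi)

# Inequality I, with the three states spaced N steps apart (N illustrative):
N = 4
PN = np.linalg.matrix_power(T, N)               # N-step transition probabilities
for s0 in range(2):
    for sN in range(2):
        for s2N in range(2):
            lhs = pi[s0] * PN[s0, sN] * PN[sN, s2N]       # P(s0, sN, s2N)
            rhs = psi * (pi[s0] * PN[s0, sN]) * (pi[sN] * PN[sN, s2N])
            assert lhs <= rhs + 1e-12
```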

SLIDE 38

Connections

Extreme values

  $H \approx 0 \iff Z \approx 0 \iff K \approx 1$
  $H \approx 1 \iff Z \approx 1 \iff K \approx 0$

  (also for the hatted processes $\hat{H}, \hat{Z}, \hat{K}$)

Ordering

  $\hat{H}_n \le H_n$, $\quad \hat{Z}_n \le Z_n$, $\quad \hat{K}_n \ge K_n$

All six processes ($H_n, \hat{H}_n, Z_n, \hat{Z}_n, K_n, \hat{K}_n$) polarize fast, both to 0 and to 1, for any $\beta < 1/2$.

31/32

SLIDE 39

Summary

◮ A general framework for memory: $P(x_i, y_i, s_i | s_{i-1})$
  ◮ Memory allowed in both source and channel
  ◮ State sequence $S_i$: hidden, stationary, finite-state Markov, aperiodic and irreducible
◮ Achieve rate $I^\star(X; Y)$ through polar codes
◮ No change to polarization exponent ($\beta < 1/2$)

32/32