SLIDE 1
An Upgrading Algorithm with Optimal Power Law
Or Ordentlich¹   Ido Tal²
¹Hebrew University   ²Technion
SLIDE 2
Big picture first
In this talk:
◮ An upgrading algorithm for channels with non-binary input
◮ Optimal power law
◮ Achieved by reduction to the binary-input case
◮ Important for constructing polar codes
SLIDE 3
Constructing vanilla polar codes
◮ Underlying channel: a binary-input, symmetric, and memoryless channel W : X → Y, where X = {0, 1}
◮ Derive N = 2^n synthetic channels W_j^(n) : X → Y^N × X^(j−1), where 1 ≤ j ≤ N
◮ Constructing a vanilla polar code ≡ finding which synthetic channels W_j^(n) are ‘almost noiseless’
◮ Problem: the output alphabet Y^N × X^(j−1) is intractably large
◮ Solution:
  ◮ Replace W_j^(n) with Q_j^(n), having output alphabet size L
  ◮ Have Q_j^(n) be (stochastically) degraded with respect to W_j^(n)
[Diagram: the intractably large output of W_j^(n) is fed through a map Φ to produce Q_j^(n)]
◮ Q_j^(n) almost noiseless ⟹ W_j^(n) almost noiseless
SLIDE 4
Constructing vanilla polar codes
◮ We write Q ≤ W if Q is degraded with respect to W
◮ Alternatively, we write W ≥ Q and say that W is upgraded with respect to Q
◮ Previous slide: Q_j^(n) ≤ W_j^(n)
◮ We can also approximate W_j^(n) “from above” by an upgraded channel R_j^(n) having output alphabet size at most L
◮ Sandwich property: Q_j^(n) ≤ W_j^(n) ≤ R_j^(n)
◮ In the vanilla setting, R_j^(n) is of secondary importance. . .
SLIDE 5
Constructing generalized polar codes
◮ Polar codes have been generalized beyond the vanilla setting
  ◮ Asymmetric channels (with an asymmetric input distribution)
  ◮ Wiretap channels
  ◮ Channels with memory (the input distribution can have memory as well)
◮ In all these settings, upgrading is as important as degrading for constructing the code
◮ For settings with memory, the “effective input alphabet” is non-binary
SLIDE 6
Problem statement
◮ Given: the joint distribution of channel and input, P_{X,Y}(x, y)
  ◮ x ∈ X, the input alphabet, and y ∈ Y, the output alphabet
  ◮ P_{X,Y}(x, y) = P_X(x) · P_{Y|X}(y|x), where P_X(x) is the input distribution
◮ Find: P*_{X,Z,Y}(x, z, y) such that
  ◮ Marginalization: Σ_z P*_{X,Z,Y}(x, z, y) = P_{X,Y}(x, y)
  ◮ Upgrading: X − Z − Y is a Markov chain
  ◮ Tractable output alphabet size: z ∈ Z and |Z| ≤ L
[Diagram: the channel W : X → Y is replaced by an upgraded channel R : X → Z followed by a map Φ : Z → Y that recovers W]
◮ Figure of merit: H(X|Y) − H(X|Z) = I(X; Z) − I(X; Y) should be ‘small’
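As a numerical sanity check on the figure of merit, the sketch below builds a toy joint distribution (the specific numbers are illustrative placeholders, not from the talk) in which the Markov chain X − Z − Y holds by construction, and evaluates H(X|Y) − H(X|Z):

```python
import numpy as np

def cond_entropy(p_xa):
    """H(X|A) in bits for a joint matrix p_xa[x, a]."""
    p_a = p_xa.sum(axis=0)
    h = 0.0
    for x in range(p_xa.shape[0]):
        for a in range(p_xa.shape[1]):
            if p_xa[x, a] > 0:
                h -= p_xa[x, a] * np.log2(p_xa[x, a] / p_a[a])
    return h

# Toy example: P(x, z, y) = P(x, z) * Phi(y | z), so X - Z - Y is Markov.
p_xz = np.array([[0.4, 0.1],
                 [0.1, 0.4]])          # joint of X and Z
phi = np.array([[0.9, 0.1],
                [0.2, 0.8]])           # Phi(y | z), rows indexed by z
p_xzy = p_xz[:, :, None] * phi[None, :, :]

p_xy = p_xzy.sum(axis=1)               # marginalize out Z
p_xz_marg = p_xzy.sum(axis=2)          # marginalize out Y

# Figure of merit; nonnegative because Z is upgraded with respect to Y.
gap = cond_entropy(p_xy) - cond_entropy(p_xz_marg)
print(gap)
```

Since Y is a degradation of Z here, conditioning on Z can only reduce uncertainty about X, so the gap comes out nonnegative.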
SLIDE 7
Power law
◮ Previous results:
  ◮ Recall: the output alphabet size of the upgraded channel is |Z| ≤ L
  ◮ There exists a ‘hard to upgrade’ joint distribution P_{X,Y}:
    H(X|Y) − H(X|Z) = Ω(L^{−2/(|X|−1)})
  ◮ For binary input, |X| = 2, and any P_{X,Y}, there exists an upgrading algorithm such that
    H(X|Y) − H(X|Z) = O(L^{−2}) = O(L^{−2/(|X|−1)})
◮ New result:
  ◮ Also for non-binary input, we can upgrade any P_{X,Y} and achieve
    H(X|Y) − H(X|Z) = O(L^{−2/(|X|−1)})
  ◮ Main idea: use the binary-input algorithm as a black box (a reduction)
SLIDE 8
One-hot representation
◮ Denote q = |X|. Assume X = {1, 2, . . . , q}
◮ For x ∈ X, define g(x) = (x_1, x_2, . . . , x_{q−1}), the one-hot representation:
  g(1) = (1, 0, 0, . . . , 0, 0)
  g(2) = (0, 1, 0, . . . , 0, 0)
  . . .
  g(q − 1) = (0, 0, 0, . . . , 0, 1)
  g(q) = (0, 0, 0, . . . , 0, 0)
◮ Abuse notation and write x = g(x) = (x_1, x_2, . . . , x_{q−1})
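The one-hot map g is straightforward to code; a minimal sketch (the function name and the 1-based alphabet follow the slide):

```python
def g(x, q):
    """One-hot representation of x in {1, ..., q} as a (q-1)-tuple.

    Symbols 1..q-1 each set one coordinate; symbol q maps to all zeros.
    """
    return tuple(1 if x == i else 0 for i in range(1, q))

q = 4
print(g(1, q))  # (1, 0, 0)
print(g(4, q))  # (0, 0, 0) -- the last symbol is the all-zero tuple
```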
SLIDE 9
P_{X,Y} ⟹ α^(i) ⟹ β^(i) ⟹ γ^(i) ⟹ P*_{X,Z,Y}
◮ We are given P_{X,Y}, where |X| = q
◮ Need to produce P*_{X,Z,Y} by reducing to binary-input upgrading
◮ Denote X′ = {0, 1}
◮ Let X = (X_1, X_2, . . . , X_{q−1}) and Y be distributed according to P_{X,Y}
◮ First step: define, for 1 ≤ i ≤ q − 1, the joint distribution
  α^(i)_{X_i,Y}(x′, y) = P(X_i = x′, Y = y | X_1^{i−1} = 0_1^{i−1})
◮ The joint distribution α^(i)_{X_i,Y}(x′, y) has binary input, x′ ∈ X′
◮ We may apply our binary-input upgrading procedure
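A sketch of computing α^(i) from P_{X,Y} (the matrix indexing is my own convention, not from the talk): under the one-hot map, the conditioning event X_1^{i−1} = 0_1^{i−1} is exactly the event X ≥ i, and X_i = 1 is exactly X = i.

```python
import numpy as np

def alpha(p_xy, i):
    """Binary-input joint alpha^(i)(x', y), with p_xy[x-1, y] = P(X=x, Y=y)."""
    tail = p_xy[i - 1:, :]                 # rows with X >= i, i.e. X_1^{i-1} = 0
    norm = tail.sum()                      # P(X >= i), the conditioning probability
    a = np.zeros((2, p_xy.shape[1]))
    a[1] = tail[0] / norm                  # X_i = 1  <=>  X = i
    a[0] = tail[1:].sum(axis=0) / norm     # X_i = 0  <=>  X > i
    return a

# Illustrative joint distribution for q = 3, |Y| = 2 (placeholder numbers).
p_xy = np.array([[0.20, 0.10],
                 [0.10, 0.20],
                 [0.15, 0.25]])
print(alpha(p_xy, 1))
print(alpha(p_xy, 2))
```

Each α^(i) is a valid binary-input joint distribution, so the binary upgrading procedure applies to it unchanged.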
SLIDE 10
P_{X,Y} ⟹ α^(i) ⟹ β^(i) ⟹ γ^(i) ⟹ P*_{X,Z,Y}
◮ Recall our binary-input joint distribution: for 1 ≤ i ≤ q − 1,
  α^(i)_{X_i,Y}(x′, y) = P(X_i = x′, Y = y | X_1^{i−1} = 0_1^{i−1})
◮ Define Λ = ⌊L^{1/(q−1)}⌋
◮ Second step:
  ◮ Apply our binary-input upgrading procedure to α^(i)_{X_i,Y}(x′, y), resulting in
    β^(i)_{X_i,Z_i,Y}(x′, z, y), where |Z_i| ≤ Λ
  ◮ The difference in entropies is O(Λ^{−2})
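A quick check of the alphabet-size accounting, assuming Λ = ⌊L^{1/(q−1)}⌋ (chosen so that the q − 1 per-coordinate alphabets of size Λ multiply to at most L, which by the O(Λ^{−2}) bound per coordinate gives the O(L^{−2/(q−1)}) power law):

```python
import math

def per_coordinate_budget(L, q):
    """Largest Lambda with Lambda^(q-1) <= L, via Lambda = floor(L^(1/(q-1)))."""
    lam = math.floor(L ** (1.0 / (q - 1)))
    assert lam ** (q - 1) <= L   # |Z| = prod |Z_i| <= Lambda^(q-1) <= L
    return lam

q, L = 4, 1000
lam = per_coordinate_budget(L, q)
print(lam, lam ** (q - 1))       # product alphabet stays within the budget L
```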
SLIDE 11
P_{X,Y} ⟹ α^(i) ⟹ β^(i) ⟹ γ^(i) ⟹ P*_{X,Z,Y}
◮ Recall that we have produced β^(i)_{X_i,Z_i,Y}(x′, z, y), where x′ ∈ X′ is binary
◮ Third step: define the conditional distribution
  γ^(i)_{X_i | Z_i, X_1^{i−1}}(x_i | z_i, x_1^{i−1}) =
    { β^(i)_{X_i|Z_i}(x_i | z_i)   if x_1^{i−1} = 0_1^{i−1} ,
    { 1                            if x_1^{i−1} ≠ 0_1^{i−1} and x_i = 0 ,
    { 0                            otherwise
◮ That is, if x_1^{i−1} is non-zero, force x_i to zero, in accordance with the one-hot representation
◮ Otherwise, if x_1^{i−1} is zero, use β^(i)_{X_i|Z_i}
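The case split defining γ^(i) translates directly into code; a sketch with my own array conventions (β^(i)_{X_i|Z_i} stored as a matrix whose columns are indexed by z_i and sum to one):

```python
import numpy as np

def gamma(beta_x_given_z, x_i, z_i, prefix):
    """gamma^(i)(x_i | z_i, x_1^{i-1}) for binary x_i and prefix tuple x_1^{i-1}."""
    if all(v == 0 for v in prefix):          # x_1^{i-1} = 0^{i-1}: use beta
        return beta_x_given_z[x_i, z_i]
    return 1.0 if x_i == 0 else 0.0          # nonzero prefix: force x_i = 0

# Placeholder conditional beta^(i)(x' | z), columns sum to 1 over x'.
beta = np.array([[0.3, 0.9],
                 [0.7, 0.1]])
print(gamma(beta, 1, 0, (0, 0)))   # zero prefix: read off beta -> 0.7
print(gamma(beta, 1, 0, (1, 0)))   # nonzero prefix: X_i forced to 0 -> 0.0
```

The forced-zero branch is exactly what keeps the reconstructed x a valid one-hot tuple.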
SLIDE 12
P_{X,Y} ⟹ α^(i) ⟹ β^(i) ⟹ γ^(i) ⟹ P*_{X,Z,Y}
◮ Last step: define
  P*_{X,Z,Y}(x, z, y) = P_Y(y) · Π_{i=1}^{q−1} β^(i)_{Z_i|Y}(z_i | y) · Π_{i=1}^{q−1} γ^(i)_{X_i | Z_i, X_1^{i−1}}(x_i | z_i, x_1^{i−1})
◮ A valid upgrade, with optimal power law:
  H(X|Y) − H(X|Z) = O(L^{−2/(|X|−1)})
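The final product formula can be evaluated pointwise. The sketch below uses illustrative placeholder distributions (not the output of an actual binary upgrading step) and checks that P* is a valid joint distribution in which non-one-hot x receive probability zero:

```python
import numpy as np
from itertools import product

def p_star(x, z, y, p_y, beta_z_y, beta_x_z):
    """P*(x, z, y) for (q-1)-tuples x (one-hot bits) and z (coordinates)."""
    val = p_y[y]
    for i in range(len(x)):
        val *= beta_z_y[i][z[i], y]          # beta^(i)(z_i | y)
        if all(v == 0 for v in x[:i]):       # gamma^(i): zero prefix -> beta
            val *= beta_x_z[i][x[i], z[i]]
        else:                                # nonzero prefix -> force x_i = 0
            val *= 1.0 if x[i] == 0 else 0.0
    return val

# q = 3: two binary coordinates; |Y| = 2, |Z_i| = 2 (placeholder numbers).
p_y = np.array([0.5, 0.5])
beta_z_y = [np.array([[0.8, 0.3], [0.2, 0.7]])] * 2   # beta^(i)(z_i | y)
beta_x_z = [np.array([[0.4, 0.9], [0.6, 0.1]])] * 2   # beta^(i)(x_i | z_i)

total = sum(p_star(x, z, y, p_y, beta_z_y, beta_x_z)
            for x in product((0, 1), repeat=2)
            for z in product((0, 1), repeat=2)
            for y in range(2))
print(total)   # total probability; invalid x such as (1, 1) contribute 0
```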
SLIDE 13
A graphical description of P_{X,Y}
[Diagram: Y ∼ P_Y is drawn first; for each 1 ≤ i ≤ q − 1, X̃_i is drawn from α^(i)_{X_i|Y}(· | Y), and then X_i = f_i(X̃_1^i)]
where f_i(x̃_1^i) ≜ x̃_i · 1{x̃_1^{i−1} = 0_1^{i−1}}
SLIDE 14
A graphical description of P_{X,Z,Y}
[Diagram: Y ∼ P_Y is drawn first; for each 1 ≤ i ≤ q − 1, Z_i is drawn from β^(i)_{Z_i|Y}(· | Y), then X̃_i from β^(i)_{X_i|Z_i}(· | Z_i), and X_i = f_i(X̃_1^i)]
where f_i(x̃_1^i) ≜ x̃_i · 1{x̃_1^{i−1} = 0_1^{i−1}}