

SLIDE 1

Parallel-ℓ0: A fully parallel algorithm for combinatorial compressed sensing

Jared Tanner & Rodrigo Mendoza-Smith

2nd International Matheon Conference on Compressed Sensing and its Applications, 2015

7th Dec. 2015, University of Oxford. Joint with Rodrigo Mendoza-Smith.

Supported by: EPSRC, NVIDIA, & SELEX-Galileo.


SLIDE 3

Combinatorial Compressed Sensing (CCS)

◮ Let A ∈ R^{m×n} and x ∈ χ^n_k := {x ∈ R^n : ‖x‖_0 ≤ k}.
◮ Compressed sensing seeks the solution, with k < m < n, of

    y = Ax   s.t.   x ∈ χ^n_k.

◮ Most CS theory is developed for A Gaussian or partial Fourier.

Ensemble         Storage   Generation    A^T y        m
Gaussian         O(mn)     O(mn)         O(mn)        O(k log(n/k))
Partial Fourier  O(m)      O(n)          O(n log n)   O(k log^5(n))
Expander         O(dn)     O(dn)         O(dn)        O(k log(n/k))

◮ In CCS, A is an expander matrix, i.e. a sparse binary matrix with d ≪ m ones per column (A ∈ E_{k,ε,d}).



SLIDE 7

Expander matrices: A ∈ Ek,ε,d, some notation

◮ Vertices of the bipartite expander graph: [n] = {1, 2, . . . , n} and [m].
◮ Neighbours of a vertex set X, written N(X) (also Γ(X)), are the vertices connected to X by an edge.
◮ A_{ij} = 1{i and j are connected}.
◮ ∃ ε ∈ (0, 1) s.t. |Γ(X)| = |N(X)| > (1 − ε)d|X| for all X ⊂ [n] with |X| ≤ k.
◮ d ≡ |N(j)| for all j ∈ [n]; A ∈ R^{m×n} is a sparse binary matrix with d ≪ m ones per column.
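A minimal sketch of this construction, assuming the standard random model in which the d neighbours of each column are drawn uniformly without replacement (function names are illustrative, not from the authors' code):

```python
import numpy as np

def random_expander(m, n, d, seed=None):
    """Sample a sparse binary matrix with exactly d ones per column.
    Such matrices are expanders with high probability for suitable
    (m, n, d); this sketch does not certify the expansion property."""
    rng = np.random.default_rng(seed)
    A = np.zeros((m, n), dtype=np.int8)
    for j in range(n):
        rows = rng.choice(m, size=d, replace=False)  # N(j): the d neighbours of column j
        A[rows, j] = 1
    return A

def expansion_ratio(A, X, d):
    """|N(X)| / (d|X|): values near 1 mean the columns in X barely overlap."""
    neighbours = np.flatnonzero(A[:, X].sum(axis=1))  # union of N(j) over j in X
    return len(neighbours) / (d * len(X))
```

By the definition above, A ∈ E_{k,ε,d} requires expansion_ratio(A, X, d) > 1 − ε for every X with |X| ≤ k; certifying this exhaustively is combinatorial, which is why random constructions are used in practice.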


SLIDE 8

Structure of CCS Greedy Algorithms

Initialization: A ∈ E_{k,ε,d}, y ∈ R^m, x̂ = 0, r = y
while not converged:
    compute a score s_j and an update ω_j for all j ∈ [n]
    select T ⊂ [n] based on a rule on the s_j
    x̂_j ← x̂_j + ω_j for j ∈ T
    r ← y − A x̂

◮ CCS algorithms differ by their score metric s_j and by how many elements T is allowed to contain.



SLIDE 11

Overview of CCS Greedy Algorithms

Algorithm           Score        Concurrency  Complexity
SMP (EIHT) [1]      ℓ1 / median  parallel     O((nd + n log n) log ‖x‖_1)
SSMP [2]            ℓ1 / median  serial       O((d³n/m + n)k + (n log n) log ‖x‖_1)
LDDSR [3] / ER [4]  ℓ0 / mode    serial       O((d³n/m + n)k)
Serial-ℓ0 [5]       ℓ0 / ℓ0      serial       O(dn log k)
Parallel-ℓ0 [5]     ℓ0 / ℓ0      parallel     O(dn log k)

◮ Only SMP was observed to take less computational time than non-combinatorial CS algorithms such as NIHT.
◮ Unfortunately, SMP is only able to recover x ∈ χ^n_k for k/m ≪ 1.
◮ Parallel-ℓ0 is computationally fast and recovers for k/m up to ≈ 0.3.
◮ Sudocodes is an alternative method, preprocessing to reduce n by determining locations in x that must be zero.



SLIDE 13

Decoding by decreasing ‖r‖_0

Parallel-ℓ0
Initialization: A ∈ E_{k,ε,d}, y ∈ R^m, α ∈ [d − 1], x̂ = 0, r = y
while not converged:
    T ← {(j, ω_j) ∈ [n] × R : ‖r‖_0 − ‖r − ω_j a_j‖_0 > α}
    x̂_j ← x̂_j + ω_j for (j, ω_j) ∈ T
    r ← y − A x̂

Serial-ℓ0
Initialization: A ∈ E_{k,ε,d}, y ∈ R^m, α ∈ [d − 1], x̂ = 0, r = y
while not converged:
    for j ∈ [n]:
        T ← {ω_j ∈ R : ‖r‖_0 − ‖r − ω_j a_j‖_0 > α}
        x̂_j ← x̂_j + ω_j for ω_j ∈ T
        r ← y − A x̂

◮ Parallel-ℓ0: computing T and updating x̂ are well suited to GPUs.
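The Parallel-ℓ0 iteration can be sketched in dense NumPy (a simplified, CPU-only illustration, not the authors' GPU implementation). For ω ≠ 0, the score ‖r‖_0 − ‖r − ω a_j‖_0 equals the number of entries of r in N(j) equal to ω minus the number equal to zero, so the best candidate ω_j is the most frequent nonzero residual value in N(j):

```python
import numpy as np
from collections import Counter

def parallel_l0(A, y, alpha, max_iter=100):
    """Sketch of the Parallel-l0 decoder. Each iteration, every column j
    (independently, hence GPU-friendly) proposes omega_j as the most
    frequent nonzero value among the residual entries in N(j); all
    columns whose l0 score exceeds alpha are updated simultaneously."""
    n = A.shape[1]
    N = [np.flatnonzero(A[:, j]) for j in range(n)]  # neighbourhoods N(j)
    x_hat = np.zeros(n)
    r = y.astype(float).copy()
    for _ in range(max_iter):
        updates = {}
        for j in range(n):                 # embarrassingly parallel over j
            vals = r[N[j]]
            nonzeros = vals[vals != 0]
            if nonzeros.size == 0:
                continue
            omega, matches = Counter(nonzeros).most_common(1)[0]
            zeros = np.count_nonzero(vals == 0)
            if matches - zeros > alpha:    # ||r||_0 - ||r - omega a_j||_0 > alpha
                updates[j] = omega
        if not updates:
            break                          # converged (or stalled)
        for j, omega in updates.items():
            x_hat[j] += omega
        r = y - A @ x_hat
    return x_hat
```

Here alpha plays the role of α ∈ [d − 1]; the convergence theory takes α = (1 − 2ε)d.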



SLIDE 15

Theorem (Convergence of Expander ℓ0-Decoders)

Let A ∈ E_{k,ε,d} with ε < 1/4, and let x ∈ χ^n_k be a dissociated signal. Then Serial-ℓ0 and Parallel-ℓ0 with α = (1 − 2ε)d can recover x from y = Ax ∈ R^m in O(dn log k) operations.

Dissociated: Σ_{j∈T1} x_j ≠ Σ_{j∈T2} x_j for all T1, T2 ⊂ supp(x) with T1 ≠ T2.

◮ Dissociation is the same signal model as considered by sudocodes.
◮ Parallel-ℓ0 requires log k iterations of complexity O(dn), each of which trivially decomposes into n independent tasks of complexity O(d).
◮ Serial-ℓ0 requires n log k iterations of complexity O(d).
◮ Serial-ℓ0 is faster than Parallel-ℓ0 if both run on a single core, but Parallel-ℓ0 is substantially faster on high performance GPUs with thousands of cores.
◮ Serial-ℓ0 and Parallel-ℓ0 have nearly identical recovery regions.
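The dissociation condition can be verified by brute force for small supports (a hypothetical helper for illustration; signals with i.i.d. continuous random entries are dissociated with probability one):

```python
import numpy as np
from itertools import combinations

def is_dissociated(x, tol=1e-12):
    """Check dissociation by brute force: all subset sums of the support
    values must be pairwise distinct (including the empty sum 0).
    Exponential in the sparsity, so only sensible for small supports."""
    support = x[x != 0]
    sums = []
    for size in range(len(support) + 1):
        for T in combinations(support, size):
            sums.append(sum(T))
    sums = np.sort(np.array(sums))
    # distinct subset sums <=> no two adjacent sorted sums coincide
    return bool(np.all(np.diff(sums) > tol))
```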


SLIDE 16

Improved phase transition

[Figure: 50% phase transition curves, ρ = k/m versus δ = m/n, for d = 7 with n = 2^18; algorithms: smp, ssmp, er, parallel_lddsr, parallel_l0, serial_l0, and ℓ1-regularization.]

◮ Greater recovery region than other CCS algorithms.
◮ No apparent decrease in phase transition for m ≪ n.


SLIDE 17

Fastest CS algorithm for A ∈ Ek,ε,d

[Figure: algorithm selection map, ρ = k/m versus δ = m/n, for d = 7 with n = 2^18; markers: parallel-l0 (plus), parallel-lddsr (asterisk), CGIHT (diamond), CGIHT-projected (hexagram), CGIHT-restarted (up triangle), CSMPSP (down triangle), FIHT (right triangle), HTP (left triangle).]

◮ Parallel-ℓ0 and Parallel-LDDSR are fastest when convergent.
◮ First examples of CCS algorithms being state-of-the-art.


SLIDE 18

Average timing for fixed m/n = 1/100

[Figure: mean time to exact convergence (seconds, log scale) versus ρ = k/m, for m/n = 0.01 with n = 2^20 and d = 7; algorithms: er, parallel-l0, parallel-lddsr, serial-l0, smp, ssmp.]

◮ Less computational time than Parallel-LDDSR for all but k/m ≪ 1.
◮ Near constant speedup of Parallel-ℓ0 over Serial-ℓ0.


SLIDE 19

High phase transition persists for m/n ≪ 1

[Figure: mean time to exact convergence of Parallel-ℓ0 (seconds, log scale) versus ρ = k/m, for m/n = 10^−3 and d = 7, with n = 2^22, 2^24, 2^26.]

◮ Recovery with m ≈ 3k even for m = n × 10^−3.
◮ Problems with n ≈ 67 million solved in under 2 seconds.


SLIDE 20

Sketch of the complexity proof:

Lemma (Bounded frequency of values in expander measurements of dissociated signals)

Let x ∈ χ^n_k be dissociated, A ∈ E_{k,ε,d}, and ω a nonzero value in y = Ax. Then there is a unique set T ⊂ supp(x) such that ω = Σ_{j∈T} x_j, and the value ω occurs in y at most d times:

    |{i ∈ [m] : y_i = ω}| ≤ d   for all ω ≠ 0.

Proof: The uniqueness of the set T ⊂ supp(x) such that ω = Σ_{j∈T} x_j follows from the definition of dissociated. Since |N(j)| = d for all j ∈ [n], we have

    |{i ∈ [m] : y_i = ω}| ≤ |∩_{j∈T} N(j)| ≤ |N(j_0)| = d

for any j_0 ∈ T.
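A quick numerical sanity check of this frequency bound, assuming the random column model (all variable names are illustrative):

```python
import numpy as np

# Sanity check: with d ones per column and a dissociated x (i.i.d.
# Gaussian values are dissociated with probability one), every nonzero
# value of y = Ax occurs at most d times.
rng = np.random.default_rng(7)
m, n, d, k = 60, 120, 7, 5
A = np.zeros((m, n))
for j in range(n):
    A[rng.choice(m, size=d, replace=False), j] = 1  # N(j): d neighbours
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x
_, counts = np.unique(y[y != 0], return_counts=True)
assert counts.max() <= d
```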


SLIDE 21

Lemma (Pairwise column overlap)

Let A ∈ E_{k,ε,d}. If ε < 1/4, every pair of columns of A intersects in fewer than (1 − 2ε)d rows; that is, for all j1, j2 ∈ [n] with j1 ≠ j2,

    |N(j1) ∩ N(j2)| < (1 − 2ε)d.

Proof: Let S = {j1, j2} ⊂ [n], so |S| = 2. Then |N(S)| ≥ 2(1 − ε)d > 2d − (1 − 2ε)d, where the first inequality is the definition of A ∈ E_{k,ε,d} and the second follows from ε < 1/4. Since |N(j1) ∩ N(j2)| = 2d − |N(S)|, the claimed bound follows.


SLIDE 22

Lemma (Support identification)

Let y = Ax for dissociated x ∈ χ^n_k and A ∈ E_{k,ε,d} with ε < 1/4. Let ω ≠ 0 be such that

    |{i ∈ N(j) : y_i = ω}| > (1 − 2ε)d,   (1)

then ω = x_j.

Proof: The claim is that for any ω which is a nonzero value from y, if the cardinality condition (1) is satisfied, then the value ω = Σ_{j∈T} x_j occurs for the set T being a singleton, |T| = 1. The frequency lemma states that T is unique and that

    |{i ∈ N(j) : y_i = ω}| ≤ |∩_{j′∈T} N(j′)|.

If |T| > 1 then the right-hand side is no more than the intersection of any two of the sets N(j1) and N(j2), which by the pairwise column overlap lemma is less than (1 − 2ε)d. This contradicts the cardinality condition (1); consequently |T| = 1 and ω = x_j.


SLIDE 23

Theorem (Convergence rate of Parallel-ℓ0)

Let A ∈ E_{k,ε,d} with ε < 1/4, and let x ∈ χ^n_k be dissociated. Then Parallel-ℓ0 with α = (1 − 2ε)d can recover x from y = Ax ∈ R^m in O(log k) iterations of complexity O(dn).

Sketch of proof: Let T_ℓ be the set T of vertices updated at iteration ℓ, and S_ℓ = supp(x − x̂). Since A ∈ E_{k,ε,d} has d nonzeros per column, the reduction in the cardinality of the residual can be at most d|T_ℓ|:

    ‖r_ℓ‖_0 − ‖r_{ℓ+1}‖_0 ≤ d|T_ℓ|.

The reduction in the residual is bounded below by the (non-obvious) estimate

    ‖r_ℓ‖_0 − ‖r_{ℓ+1}‖_0 ≥ α|T_ℓ| + (|S_ℓ| − |T_ℓ|).

Combining the bounds ensures linear convergence:

    |S_{ℓ+1}| ≤ (2εd / (1 + 2εd)) |S_ℓ|.


SLIDE 24

Summary

◮ Serial-ℓ0 and Parallel-ℓ0 recover for A ∈ E_{k,ε,d} in complexity O(dn log k), and are observed to take less time than non-CCS algorithms.
◮ Recovery observed, for n large enough, with m ≈ 3k.
◮ Theory requires either x dissociated, or x drawn independently of A with the columns of A scaled by dissociated values.
◮ Robustness to ℓ∞-bounded additive noise follows, but is unknown for other noise variants or compressible signals.
◮ There are noise robustness techniques for sudocodes (Ma, Baron, Needell 2014) which can be applied to ℓ0 decoders.


SLIDE 25

Bibliography

[1] R. Berinde, P. Indyk, and M. Ruzic. Practical near-optimal sparse recovery in the ℓ1 norm. Allerton Conference on Communication, Control, and Computing, 2008.
[2] R. Berinde and P. Indyk. Sequential sparse matching pursuit. Allerton Conference on Communication, Control, and Computing, 2009.
[3] W. Xu and B. Hassibi. Efficient compressive sensing with deterministic guarantees using expander graphs. IEEE Information Theory Workshop (ITW), 2007, pages 414-419.
[4] S. Jafarpour, W. Xu, B. Hassibi, and R. Calderbank. Efficient and robust compressed sensing using high-quality expander graphs. arXiv preprint arXiv:0806.3802, 2008.
[5] R. Mendoza-Smith and J. Tanner. Expander ℓ0-decoding.


SLIDE 26

Alan Turing Institute (ATI): watch this space

◮ The UK recently (Nov. 2015) launched a new "Data Science" centre.
◮ Funded by 5 universities: Cambridge, Edinburgh, Oxford, UCL, and Warwick, together with the EPSRC (Engineering and Physical Sciences Research Council).
◮ Physical space in central London: the British Library.
◮ Currently has a £77 million budget for five years (growing).
◮ The scientific programme of the ATI is currently being formed, based on a series of workshops between Oct. 2015 and Feb. 2016.
◮ Currently advertising for Research Fellows (senior postdocs) with initial 3 year appointments, possibly extended to 5 years.
◮ The five founding universities are advertising permanent (tenure track) positions, including Oxford...
◮ Happy to answer any questions and hope to see you at the ATI.

Thank you for your time
