OpenCL-Based Erasure Coding on Heterogeneous Architectures
Guoyang Chen, Huiyang Zhou, Xipeng Shen, (North Carolina State University) Josh Gahm, Narayan Venkat, Skip Booth, John Marshall (Cisco Systems, Inc) Email: gchen11@ncsu.edu
This work uses OpenCL to accelerate Reed-Solomon coding.
Dest = V × Src
$$\mathrm{Dest}[m][j] = \sum_{k=0}^{\mathrm{srcs}-1} V[m][k] \times \mathrm{Src}[k][j]$$
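As a point of reference, the matrix product Dest = V × Src can be sketched as a plain, single-threaded C encoder. This is a minimal sketch, not the authors' OpenCL kernel. The names `gf_mul` and `rs_encode` are hypothetical, and the reduction polynomial 0x11D is an assumption because the slides do not say which polynomial is used. Multiplication uses shift-and-add (the "Russian Peasant" method that appears in the results), and addition in GF(2^8) is XOR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GF(2^8) multiply by shift-and-add ("Russian Peasant").
 * Assumption: reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                   /* GF(2^8) addition is XOR */
        /* a *= x, reducing by 0x11D when bit 7 shifts out */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
        b >>= 1;
    }
    return p;
}

/* Dest = V x Src over GF(2^8) (hypothetical reference encoder).
 * V    : dests x srcs encoding matrix, row-major
 * src  : srcs input buffers of len bytes each
 * dest : dests output buffers of len bytes each
 * Dest[m][j] = XOR over k of V[m][k] * Src[k][j] */
static void rs_encode(const uint8_t *V, size_t dests, size_t srcs,
                      const uint8_t *const *src, uint8_t *const *dest,
                      size_t len)
{
    assert(V != NULL && src != NULL && dest != NULL);
    for (size_t m = 0; m < dests; m++) {
        for (size_t j = 0; j < len; j++) {
            uint8_t acc = 0;
            for (size_t k = 0; k < srcs; k++)
                acc ^= gf_mul(V[m * srcs + k], src[k][j]);
            dest[m][j] = acc;
        }
    }
}
```

Each output byte Dest[m][j] reads only column j of Src and row m of V, so the j loop is a natural dimension to spread across parallel work-items.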
GF(2^8) multiplication is performed by table lookup.
same row).
==> Put the encoding matrix and the large lookup table (64 KB, for GF(2^8) multiplication) in the texture cache.
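The 64 KB size follows from storing every possible product: 256 × 256 one-byte entries. A minimal C sketch of building and using such a table, assuming the 0x11D polynomial; `gf_table`, `gf_table_init`, and `gf_mul_table` are hypothetical names, and the OpenCL texture-cache placement itself is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add GF(2^8) multiply, used only to fill the table.
 * Assumption: reduction polynomial 0x11D. */
static uint8_t gf_mul_slow(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
        b >>= 1;
    }
    return p;
}

/* Full product table: 256 x 256 one-byte entries = 65536 bytes = 64 KB. */
static uint8_t gf_table[256][256];
static_assert(sizeof gf_table == 64 * 1024, "product table should be 64 KB");

static void gf_table_init(void)
{
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            gf_table[a][b] = gf_mul_slow((uint8_t)a, (uint8_t)b);
}

/* One memory read per multiply. On the GPU, this read-only table is the
 * data the slides place in the texture cache. */
static inline uint8_t gf_mul_table(uint8_t a, uint8_t b)
{
    return gf_table[a][b];
}
```

The trade-off is memory traffic: every multiply becomes a single read, which is why caching the table close to the compute units matters.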
[Figure: multi-stream pipeline. Streams 1 to N each run an H2D transfer, a compute step, and a D2H transfer.]
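Why splitting the data over several streams helps can be estimated with an idealized three-stage pipeline model. This is an assumption for illustration, not a measurement: the job's total H2D, compute, and D2H times are split evenly over n chunks, each stage runs on its own engine, and the chunks overlap perfectly. The function names are hypothetical.

```c
#include <assert.h>

/* Idealized model, not measured data: total copy-in (h2d), kernel (comp) and
 * copy-out (d2h) times are split evenly over n streams, each stage has its
 * own engine, and chunks move through the three stages as a pipeline. */
static double max3(double a, double b, double c)
{
    double m = a > b ? a : b;
    return m > c ? m : c;
}

/* One stream: the three stages run back to back. */
static double serial_time(double h2d, double comp, double d2h)
{
    return h2d + comp + d2h;
}

/* n streams: the first chunk passes through all three stages, and each of
 * the remaining n - 1 chunks finishes one slowest-stage slot later. */
static double pipelined_time(double h2d, double comp, double d2h, int n)
{
    assert(n > 0);
    double first_chunk = (h2d + comp + d2h) / n;
    double slot = max3(h2d, comp, d2h) / n;
    return first_chunk + (n - 1) * slot;
}
```

For example, with H2D = 4, compute = 2, and D2H = 2 (arbitrary units), one stream takes 8 while four streams take 5. As n grows, the estimate approaches the slowest stage (here the H2D copy), so overlapping can hide the other stages but not that bound.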
byte)
workgroup.
chip memory bandwidth
[Figure: encode bandwidth (GB/s) vs. number of threads (20 to 120); labeled value: 2.84 GB/s at 56 threads.]
[Figure: encode bandwidth (GB/s) for char, int, and int4 data types, with SVM and with streaming.]
about 3 GB/s.
kernel throughput.
engine can be easily increased.
[Figure: encode bandwidth (GB/s, log scale) of the Large Table, Small Table, and Russian Peasant GF(2^8) multiplication methods, with variants char, int, int16, int16+tiling, and int16+tiling+unroll.]
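The slides do not define "Small Table". One common small-table scheme, shown here only as an assumption, replaces the 64 KB product table with log/antilog tables of 256 and 512 bytes, at the cost of two log lookups, an add, one antilog lookup, and a zero check per multiply. The names are hypothetical, and the polynomial 0x11D (for which 2 is a generator) is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed "small table" scheme: log/antilog tables (256 + 512 bytes).
 * Assumption: polynomial 0x11D, for which x = 2 generates all 255 nonzero
 * elements of GF(2^8). */
static uint8_t gf_log[256];
static uint8_t gf_exp[512];   /* doubled so log a + log b needs no "mod 255" */

static void gf_log_init(void)
{
    unsigned x = 1;
    for (int i = 0; i < 255; i++) {
        gf_exp[i] = (uint8_t)x;
        gf_log[x] = (uint8_t)i;
        x <<= 1;                      /* multiply by the generator 2 */
        if (x & 0x100)
            x ^= 0x11D;               /* reduce modulo the polynomial */
    }
    for (int i = 255; i < 512; i++)
        gf_exp[i] = gf_exp[i - 255];  /* exp is periodic with period 255 */
}

/* a * b = exp(log a + log b) for nonzero a, b. */
static uint8_t gf_mul_log(uint8_t a, uint8_t b)
{
    assert(gf_exp[0] == 1);           /* tables must be initialized first */
    if (a == 0 || b == 0)
        return 0;                     /* log 0 is undefined */
    return gf_exp[gf_log[a] + gf_log[b]];
}
```

The largest index is 254 + 254 = 508, which is why the antilog table is doubled rather than reduced modulo 255 on every multiply.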
promising but needs to improve its current PCIe DMA interface.
[Figure: encode bandwidth (GB/s) vs. number of sources (srcs, up to 30) for GPU, FPGA, MC-CPU, and ST-CPU; dests = srcs + 3.]
[Figure: encode bandwidth (GB/s) on file 1 and file 2 for BDW+SVM, BDW, Arria 10, and Stratix V.]
File 1 is 29 MB; file 2 is 438 MB. BDW: an Arria 10 FPGA integrated with a Xeon core. SVM (Shared Virtual Memory): the map/unmap overhead is included. Arria 10: discrete FPGA board over PCIe. Stratix V: discrete FPGA board over PCIe.
codes.