Improving Spark Performance with Zero-copy Buffer Management and RDMA
Hu Li, Charley Chen and Wei Xu Institute for Interdisciplinary Information Sciences Tsinghua University, China
Latency matters in big data
[Kay@SOSP13] Big data: not only capable, but also interactive
Figure: Job latencies, on a scale marked 10 min, 10 sec, 100 ms and 1 ms. Systems shown: MapReduce Batch Job [2004], Hive Query [2009], Dremel Query [2010], In-memory Spark Query [2010], Impala Query [2012], Spark Streaming [2013].
Take advantage of the RDMA over Converged Ethernet (RoCE) fabric
Take advantage of RDMA more efficiently
while staying fully compatible with off-the-shelf Spark
Lower CPU utilization and lower latency
Figure: data path between two executors. Machine A and Machine B each run an executor in user space with a JVM heap, a JVM off-heap region and an RNIC. Flow: Object -> serialization -> Byte Array -> DMA Read -> network transfer -> DMA Write -> Byte Array -> deserialization -> Object.
Figure: sender-side buffer path across JVM heap, JVM off-heap and kernel space.
Traditional way: Object -> Serialize -> Byte Array -> Network API (copy) -> Byte Array -> System call (copy) -> Byte Array
Our way: Object -> Serialize -> Byte Array -> DMA READ -> RNIC
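The "Our Way" path serializes into memory that the RNIC can read directly, instead of copying a heap byte array through the network API. A minimal sketch of that idea in plain Java follows. It is not NetSpark's implementation: `OffHeapSerializeSketch`, `DirectBufferOutputStream` and `serializeInto` are hypothetical names, standard Java serialization stands in for Spark's serializer, and no RDMA registration or transfer is modeled. `ObjectOutputStream` still uses a small internal heap buffer, so this only avoids materializing the whole payload as a heap byte array.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class OffHeapSerializeSketch {

    /** Hypothetical stream that appends bytes to a direct (off-heap) ByteBuffer. */
    static final class DirectBufferOutputStream extends OutputStream {
        private final ByteBuffer target;

        DirectBufferOutputStream(ByteBuffer target) {
            if (!target.isDirect()) {
                throw new IllegalArgumentException("expected a direct buffer");
            }
            this.target = target;
        }

        @Override
        public void write(int b) throws IOException {
            if (!target.hasRemaining()) throw new IOException("buffer full");
            target.put((byte) b);
        }

        @Override
        public void write(byte[] src, int off, int len) throws IOException {
            if (len > target.remaining()) throw new IOException("buffer full");
            target.put(src, off, len);
        }
    }

    /** Serializes value into target and returns the number of bytes written. */
    static int serializeInto(ByteBuffer target, Serializable value) {
        int start = target.position();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new DirectBufferOutputStream(target))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target.position() - start;
    }

    public static void main(String[] args) {
        // Off-heap buffer; a real RDMA stack would register it with the RNIC (not modeled).
        ByteBuffer offHeap = ByteBuffer.allocateDirect(4096);
        int written = serializeInto(offHeap, "hello shuffle block");
        System.out.println("serialized " + written + " bytes into a direct buffer");
    }
}
```

The serialized bytes end up in a direct buffer, which is the kind of memory a NIC can be pointed at without a further JVM-level copy.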
Figure: executor architecture.
Executor (Spark): Thread 1, Thread 2, … Thread N; BlockManager; BlockTransferService (TCP) with SendingConnections and ReceivingConnections.
Executor (NetSpark): Thread 1, Thread 2, … Thread N; BlockManager; BlockTransferService (RDMA) with SendingConnections and ReceivingConnections; BufferManager.
Simple solution: pre-allocate RDMA buffer space to avoid allocation and registration overhead
buffers
and OutputStream to take advantage of the new buffer manager
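The pre-allocation idea can be sketched as a fixed pool of direct buffers that are created once and then reused. This is an illustration under assumptions, not the BufferManager from the slides: `PreallocatedBufferPool`, `acquire`, `release` and `available` are hypothetical names, and registering buffers with the RNIC is only noted in a comment because it needs a native RDMA library.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical fixed-size pool of pre-allocated direct (off-heap) buffers. */
class PreallocatedBufferPool {
    private final ArrayBlockingQueue<ByteBuffer> free;
    private final int bufferSize;

    PreallocatedBufferPool(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<>(count);
        for (int i = 0; i < count; i++) {
            // Allocation happens once, up front. A real RDMA stack would also
            // register each buffer with the RNIC here (assumption, not modeled).
            free.add(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    /** Hands out a free buffer, reset to position 0; fails fast if none is left. */
    ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        if (b == null) {
            throw new IllegalStateException("pool exhausted");
        }
        b.clear();
        return b;
    }

    /** Returns a buffer to the pool so later transfers can reuse it. */
    void release(ByteBuffer b) {
        if (!b.isDirect() || b.capacity() != bufferSize) {
            throw new IllegalArgumentException("buffer does not belong to this pool");
        }
        if (!free.offer(b)) {
            throw new IllegalStateException("pool already full");
        }
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        PreallocatedBufferPool pool = new PreallocatedBufferPool(4, 64 * 1024);
        ByteBuffer b = pool.acquire();
        b.put((byte) 1);   // fill with serialized data in a real system
        pool.release(b);   // no deallocation, the buffer is reused
        System.out.println("free buffers: " + pool.available());
    }
}
```

The pool fails fast when exhausted to keep the sketch short; a production design might block or grow instead.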
Figure: Network topology of our testbed (labels: Server, Switch, Switch, 10Gb Ethernet, 3 x 40Gb Ethernet).
for RDMA to avoid packet loss
Compared four different executor implementations (Spark version: 1.5.0)
Figure: latency box plot (min, 25th percentile, 50th percentile, 75th percentile, max).
About 17% improvement
A larger dataset, about 107.3 GB for shuffle: ~40% faster than Netty
Twitter Graph Dataset
[Kwak@www2010]
41 million nodes, 1.5 billion edges
20% faster than Netty
10% faster than naive RDMA
Take advantage of the RDMA over Converged Ethernet (RoCE) fabric
Take advantage of RDMA more efficiently
while staying fully compatible with off-the-shelf Spark
Wei Xu, weixu@tsinghua.edu.cn