Principled Schedulability Analysis for Distributed Storage Systems Using Thread Architecture Models
Suli Yang*, Jing Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
* work done while at UW-Madison
Scheduling: A Fundamental Primitive
[Figure: multiple applications (e.g., Snapchat) issuing reads and writes (R/W) to shared storage]
[MongoDB - #21858]:
“A high throughput update workload … could cause starvation on secondary reads”
[HBase - #8884]:
“ …when the read load is high on a specific RS is high, the write throughput also get impacted dramatically, and even write data loss...”
[Cassandra - #10989]:
“inability to balance writes/reads/compaction/flushing…”
etc.
We introduce the Thread Architecture Model (TAM) to describe scheduling complexities
understandable and analyzable model
[Figure: TAM of a RegionServer/DataNode, with stages RPC Handle, LOG Sync, Mem Flush, Data Xceive, Packet Ack, and Ack Process]
Tamed-HBase
enables principled schedulability analysis
TAM legend:
stage (threads performing similar tasks), labeled with its name
resource usage: CPU (C), I/O (I), network (N), lock (L)
request flow
request queue (scheduling point)
blocking
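
The legend elements above map naturally onto a small data structure. The following is a minimal sketch, not the authors' tooling: the class names, fields, and helper methods are hypothetical, and the resource sets, queue flags, and blocking edges in `example_tam` are placeholders chosen for illustration rather than values read from the slides. Only the stage names come from the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """A group of threads performing similar tasks (hypothetical encoding)."""
    name: str
    resources: set                  # subset of {"cpu", "io", "network", "lock"}
    has_queue: bool = True          # a request queue acts as a scheduling point
    blocks_on: list = field(default_factory=list)  # stages this stage waits on


@dataclass
class ThreadArchitectureModel:
    stages: dict = field(default_factory=dict)
    flows: list = field(default_factory=list)      # (src, dst) request-flow edges

    def add_stage(self, stage):
        self.stages[stage.name] = stage

    def add_flow(self, src, dst):
        self.flows.append((src, dst))

    def stages_without_scheduling_point(self):
        """Stages where no queue exists, so no scheduler can reorder requests."""
        return sorted(s.name for s in self.stages.values() if not s.has_queue)

    def blocking_stages(self):
        """Stages whose threads may stall while waiting on another stage."""
        return sorted(s.name for s in self.stages.values() if s.blocks_on)


def example_tam():
    # Stage names are from the slides; every attribute below is a placeholder.
    tam = ThreadArchitectureModel()
    tam.add_stage(Stage("RPC Handle", {"cpu"}, blocks_on=["LOG Sync"]))
    tam.add_stage(Stage("LOG Sync", {"io", "network"}))
    tam.add_stage(Stage("Mem Flush", {"io"}, has_queue=False))
    tam.add_flow("RPC Handle", "LOG Sync")
    return tam
```

Encoding the model this way makes simple structural checks (missing scheduling points, blocking edges) a matter of walking the stage list, which is the kind of analysis the TAM is meant to enable.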
[Figure: TAM of two Cassandra nodes, with stages including Req Handle, Read, and Mutation]
Workload:
C1: issues cold requests
C2: issues cold and cached requests
Expectation:
C2 has much higher throughput (due to cached requests)
CPU underutilized
[Figure: TAM of MongoDB across a Primary Node and a Secondary Node, with stages Req Handle, Batcher, and Fetcher]
MongoDB
Workload:
C1: reads from primary (does not go to secondary)
C2: writes to primary (replicated to secondary node)
Time 10: the secondary node slows down
Expectation:
C1 read throughput remains stable
Req Handle threads block
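
One way to see why blocked handler threads can hurt an unrelated client is a back-of-the-envelope estimate using Little's law: the average number of threads held by writes is roughly the write arrival rate times the time each write holds a thread. The sketch below is an illustration only. It assumes a single fixed-size handler pool shared by reads and writes, and the function name, pool size, rates, and hold times are all hypothetical rather than measurements from the talk.

```python
def threads_left_for_reads(pool_size, write_rate, write_hold_time):
    """Estimate handler threads not occupied by in-flight (blocked) writes.

    pool_size:        number of handler threads (hypothetical)
    write_rate:       write arrivals per second (hypothetical)
    write_hold_time:  seconds a write holds its thread, including time
                      spent blocked waiting on the secondary (hypothetical)
    """
    # Little's law: average writes in flight = arrival rate * time in system,
    # capped by the number of threads that exist.
    busy_with_writes = min(pool_size, write_rate * write_hold_time)
    return pool_size - busy_with_writes


# Healthy secondary: writes hold a thread briefly.
healthy = threads_left_for_reads(pool_size=16, write_rate=100, write_hold_time=0.01)

# Slow secondary: each write blocks much longer, so writes occupy the pool.
degraded = threads_left_for_reads(pool_size=16, write_rate=100, write_hold_time=0.2)
```

Under these made-up numbers nearly the whole pool is free for reads in the healthy case, while in the degraded case no threads remain, even though the read workload itself never touches the secondary.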
[Figure: RegionServer/DataNode TAM showing the Data Xceive, Packet Ack, Ack Process, and RPC Respond stages with CPU, IO, and Network resources]
Workloads: Five clients, each with a different weight, run YCSB (mostly reads)
Expectation: Each client receives throughput proportional to its weight
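
The proportional-share expectation can be made concrete with a short calculation. This is a sketch under stated assumptions: the function name, the client weights, and the total throughput are hypothetical examples, not values from the experiment.

```python
def proportional_shares(weights, total_throughput):
    """Split total_throughput among clients in proportion to their weights."""
    total_weight = sum(weights.values())
    return {
        client: total_throughput * weight / total_weight
        for client, weight in weights.items()
    }


# Hypothetical weights 1..5 and a hypothetical 15,000 ops/s of total capacity.
weights = {"c1": 1, "c2": 2, "c3": 3, "c4": 4, "c5": 5}
shares = proportional_shares(weights, total_throughput=15000)
```

With these example inputs, a client with weight 5 should receive five times the throughput of the client with weight 1; comparing measured per-client throughput against such targets is one way to judge whether a scheduler meets the expectation.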
Workloads:
Foreground client: runs YCSB (update-heavy)
Background client: random Gets or Puts
Expectation: Foreground latency remains stable
Geo-scale relational database behind Alipay
42,000,000 SQLs per second
US and China based
Contact: OceanBase-Public@list.alibaba-inc.com
OceanBase WeChat official account