What consistency guarantees should concurrent data structure libraries provide?
Hans-J. Boehm
hboehm@google.com
Disclaimer
This is half-baked and hand-wavy. Mostly questions, few answers. Reflects WG21/SG1 (C++ Concurrency) discussion. I may well be misrepresenting some other work. If so, please correct me!
The problem
C++ (among others) would like to add more concurrent data structure libraries, e.g. a concurrent queue. We have to specify what correctness properties they must satisfy. Serializability of operations on the data structure is:
- Slightly too strong: construction and destruction aren't allowed to race with other operations.
- Usually too weak: operations over multiple data structures don't interact correctly. Linearizability doesn't address all the issues.
Data structure interaction: Problem 1
Operations on x and y can be individually serializable, and the assertion can still fail, e.g. if x and y are each represented by a single atomic and use memory_order_relaxed operations. That's why we have linearizability ...
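Thread 1:
    x.add(1);                   // 2nd
    y.add(1);                   // 1st
Thread 2:
    if (!y.is_empty())          // 2nd
        assert(!x.is_empty());  // 1st
(The comments give each operation's position in the serialization order of its own object, x or y.)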
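A minimal C++17 sketch of Problem 1, assuming a hypothetical CountingBag that stores only its element count in a single std::atomic<int>, with every operation using one template-selected memory order. With memory_order_relaxed the standard permits problem1_trial to return false (even if that is rare on a given machine), so only the seq_cst version is checked; with seq_cst operations each bag behaves like a linearizable object and the assertion cannot fail.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical bag that stores only its element count, in a single atomic.
// Order is used for every operation; the sketch only allows orders that are
// valid for both load() and fetch_add().
template <std::memory_order Order>
class CountingBag {
  static_assert(Order == std::memory_order_relaxed ||
                    Order == std::memory_order_seq_cst,
                "sketch only supports relaxed or seq_cst");

 public:
  void add(int /*value*/) { count_.fetch_add(1, Order); }
  bool is_empty() const { return count_.load(Order) == 0; }

 private:
  std::atomic<int> count_{0};
};

// One run of Problem 1. Returns false iff Thread 2 observed y non-empty but
// x empty, i.e. the slide's assertion would have failed.
template <std::memory_order Order>
bool problem1_trial() {
  CountingBag<Order> x, y;
  bool ok = true;  // written only by t2, read only after join()
  std::thread t1([&] {
    x.add(1);
    y.add(1);
  });
  std::thread t2([&] {
    if (!y.is_empty()) ok = !x.is_empty();
  });
  t1.join();
  t2.join();
  return ok;
}
```

Calling problem1_trial<std::memory_order_seq_cst>() in a loop should never return false; the relaxed instantiation compiles and runs, but its result is not guaranteed.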
Interaction with ordinary data accesses: Problem 2
Can the accesses to x race? Operations on y can be individually serializable, yet fail to ensure visibility of other memory accesses. This is particularly important for, e.g., threads communicating via a concurrent queue or a work-stealing queue. There are some careful academic papers (e.g. Batty, Dodds, Gotsman, POPL13), but the issue is often ignored.
Thread 1:
    x = 1;
    y.add(1);               // 1st
Thread 2:
    if (!y.is_empty())      // 2nd
        assert(x == 1);     // 1st
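One way to make Thread 2's read of x race-free is the Java-style guarantee that a data structure write synchronizes with a reader that sees it. A minimal C++17 sketch, assuming a hypothetical ReleaseAcquireBag whose add() is a release fetch_add and whose is_empty() is an acquire load; x is an ordinary int.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical bag tracking only its element count. add() is a release
// operation and is_empty() an acquire operation, so an add() synchronizes
// with any is_empty() that observes its effect.
class ReleaseAcquireBag {
 public:
  void add(int /*value*/) { count_.fetch_add(1, std::memory_order_release); }
  bool is_empty() const { return count_.load(std::memory_order_acquire) == 0; }

 private:
  std::atomic<int> count_{0};
};

// Problem 2 from the slide, with x an ordinary (non-atomic) int.
// Returns false only if the slide's assertion would fail.
bool problem2_trial() {
  int x = 0;
  ReleaseAcquireBag y;
  bool ok = true;  // written only by t2, read only after join()
  std::thread t1([&] {
    x = 1;     // plain write
    y.add(1);  // release: publishes the write to x
  });
  std::thread t2([&] {
    // x is read only after observing y's add, which synchronizes with it,
    // so this read does not race with t1's write.
    if (!y.is_empty()) ok = (x == 1);
  });
  t1.join();
  t2.join();
  return ok;
}
```

If y's operations were relaxed instead, the read of x would be a data race whenever Thread 2 observed y non-empty.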
So what’s hard here?
1) The definition of linearizability relies on interleaving-based concurrency.
   ○ Doesn't reflect modern mainstream memory models.
   ○ Meshes particularly badly with non-multi-copy-atomic architectures (Power, GPUs)?
   ○ In C++, there is no sequential execution history corresponding to a parallel execution.
2) No clear consensus about the right answers, particularly for the second problem.
3) Java typically guarantees that a data structure write "synchronizes with" a reader that sees the write (Doug Lea's approach).
   ○ Can be too strong or too weak.
This is an obstacle for concurrent data structures in C++.
Some examples
Queues (e.g. wg21.link/P0260, mostly Lawrence Crowl’s work)
Option 1: Doug Lea's Java approach. Problem: doesn't mesh with the "sequential consistency by default" philosophy. Notably (from wg21.link/P0387, cf. the 2+2W litmus test):
Thread 1:
    q.push(1);
    log.push("pushed 1");
Thread 2:
    log.push("pushed 2");
    q.push(2);
log and q are both queues. log may contain "pushed 1"; "pushed 2" while q contains 2; 1, an outcome that no interleaving of the four pushes produces. Good enough? Does it matter in practice?
Queues contd:
Option 2: Treat data structures like atomic values:
- All library data structure operations and atomic operations appear to execute in a single total order.
- Writers synchronize with readers that see the result.
Note: this still doesn't prevent reordering of the relaxed assignments in x =rlx 1; q.push(...); y =rlx 1; (where =rlx denotes a memory_order_relaxed store). Option 2's guarantees are implicitly provided by lock-based implementations. Is it fast enough? Does it affect implementations?
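A minimal C++17 sketch of the lock-based case, assuming a hypothetical LockedQueue in which every operation holds one std::mutex. A program that synchronizes only through mutexes and has no data races gets sequentially consistent behavior in C++, so rerunning the 2+2W example from Option 1 with two such queues can never yield log = "pushed 1"; "pushed 2" together with q = 2; 1.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical lock-based queue: every operation holds the same mutex.
template <typename T>
class LockedQueue {
 public:
  void push(T value) {
    std::lock_guard<std::mutex> guard(mu_);
    items_.push_back(std::move(value));
  }
  bool try_pop(T& out) {
    std::lock_guard<std::mutex> guard(mu_);
    if (items_.empty()) return false;
    out = std::move(items_.front());
    items_.pop_front();
    return true;
  }
  // Copies the current contents, front first.
  std::vector<T> snapshot() const {
    std::lock_guard<std::mutex> guard(mu_);
    return std::vector<T>(items_.begin(), items_.end());
  }

 private:
  mutable std::mutex mu_;
  std::deque<T> items_;
};

// One run of the 2+2W example. Returns true iff the non-SC outcome
// (log: "pushed 1", "pushed 2" while q: 2, 1) did not occur.
bool two_plus_two_w_trial() {
  LockedQueue<int> q;
  LockedQueue<std::string> log;
  std::thread t1([&] {
    q.push(1);
    log.push("pushed 1");
  });
  std::thread t2([&] {
    log.push("pushed 2");
    q.push(2);
  });
  t1.join();
  t2.join();
  bool non_sc =
      log.snapshot() == std::vector<std::string>{"pushed 1", "pushed 2"} &&
      q.snapshot() == std::vector<int>{2, 1};
  return !non_sc;
}
```

This only rules out the 2+2W outcome; as noted above, it says nothing about reordering of relaxed accesses around q.push(...).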
Counters
(wg21.link/P0261, also primarily L. Crowl's work, is a much more elaborate proposal.)
- Statistics counters, read only at the end: don't need memory ordering guarantees. If increments don't return a value, memory ordering is not observable.
- Statistics counters whose values are read concurrently, e.g. for printing: probably still don't need memory ordering guarantees?
- Counter used as a queue index: needs acquire/release ordering.
- Counter (ab)used to implement a lock: needs full sequential consistency.
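A minimal C++17 sketch of the first and third cases, using hypothetical function names. count_events uses a statistics counter with relaxed increments whose results are ignored; thread join() supplies the ordering needed for the final read. sum_published uses a counter as a queue index: the producer fills a slot, then publishes the new index with a release store, and the consumer's acquire load makes the slot contents visible. The index example assumes a single producer and a single consumer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Statistics counter read only at the end: relaxed increments suffice,
// because nothing else is ordered by them.
long count_events(int num_threads, int per_thread) {
  std::atomic<long> counter{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t)
    workers.emplace_back([&] {
      for (int i = 0; i < per_thread; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);  // result unused
    });
  for (auto& w : workers) w.join();  // join() orders the increments before the read
  return counter.load(std::memory_order_relaxed);
}

// Counter used as a queue index (single producer, single consumer).
// Returns the sum of the n published values 1..n.
long sum_published(int n) {
  std::vector<int> slots(n);       // ordinary, non-atomic storage
  std::atomic<int> published{0};  // number of slots ready to read
  std::thread producer([&] {
    for (int i = 0; i < n; ++i) {
      slots[i] = i + 1;                                   // plain write
      published.store(i + 1, std::memory_order_release);  // publish slot i
    }
  });
  long sum = 0;
  int consumed = 0;
  while (consumed < n) {
    int ready = published.load(std::memory_order_acquire);
    for (; consumed < ready; ++consumed) sum += slots[consumed];  // no data race
  }
  producer.join();
  return sum;
}
```

Weakening the index's store or load to relaxed would make the consumer's reads of slots data races, which is why this use needs acquire/release ordering.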
Possible approaches
The C++ atomics library allows each operation to specify a memory ordering guarantee.
- Very flexible.
- Interactions turned out to be subtle (acq/rel vs. SC).
- Does this make sense for general data structures? E.g. if you needed sequential consistency everywhere, might you use a lock instead? Does it make sense to templatize the data structure w.r.t. memory_order? For counters, these distinctions matter. For queues, are the cost differences too small to worry about?
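To make the templatization question concrete, here is a minimal C++17 sketch of a hypothetical Counter parameterized by a std::memory_order, defaulting to seq_cst. Since load() may not use release or acq_rel, the read ordering is derived the way a single-order compare_exchange derives its failure ordering (acq_rel becomes acquire, release becomes relaxed). This is one possible design, not a proposed API.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counter templatized on the memory ordering of its operations.
template <std::memory_order Order = std::memory_order_seq_cst>
class Counter {
 public:
  // Returns the value before the increment.
  long increment() { return value_.fetch_add(1, Order); }
  long read() const { return value_.load(kLoadOrder); }

 private:
  // load() cannot use release or acq_rel; weaken them the same way a
  // single-order compare_exchange derives its failure ordering.
  static constexpr std::memory_order kLoadOrder =
      Order == std::memory_order_acq_rel   ? std::memory_order_acquire
      : Order == std::memory_order_release ? std::memory_order_relaxed
                                           : Order;

  std::atomic<long> value_{0};
};

// Statistics counter: only the count matters, so relaxed ordering.
using StatsCounter = Counter<std::memory_order_relaxed>;
// Counter used as a queue index: acquire/release ordering.
using IndexCounter = Counter<std::memory_order_acq_rel>;
```

The default Counter<> keeps the "sequential consistency by default" behavior, while users who know they need less can opt into a weaker instantiation.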
Likely WG21 approach
(May be my wishful thinking.) Where feasible, try the strong approach in a “Technical Specification”:
- Guarantees are initially modelled on SC atomics.
- Add relaxed versions with relaxed memory ordering (as well as possibly other semantic relaxations) if performance issues arise. Otherwise try to hide weak ordering behind nondeterminism.