

slide-1
SLIDE 1

A Parallel Compact Hash Table

Alfons Laarman & Steven van der Vegt

slide-2
SLIDE 2

Overview

Research Motivation
Background
Contribution

A Parallel Compact Hash Table October 3, 2011 2 / 19

slide-7
SLIDE 7

Introduction

◮ Hash tables are fundamental data structures
◮ Compact hash tables: memory-efficient hash tables
◮ Useful in, e.g., model checking, planning, BDDs, tree tables
◮ Problem: no concurrent implementation of compact hash tables
◮ Our contribution: a scalable lockless algorithm for compact hashing

slide-8
SLIDE 8

Goals

◮ Parallel compact hash table
◮ Scalable
  ◮ Fast: lockless
  ◮ Memory efficient: no pointers (otherwise we lose the benefits of compact hashing)
◮ Focus on findOrPut
  ◮ Already sufficient for model checking (monotonically growing dataset)
  ◮ Subsumes individual find and put operations

slide-10
SLIDE 10

Hashing Revisited

◮ A hash table stores a subset of a key universe U in a table T of buckets; typically |U| ≫ |T|
◮ Multiple keys can map to the same bucket
◮ The full key is stored in T to resolve collisions
◮ Several collision resolution algorithms are possible, e.g. linear probing

slide-11
SLIDE 11

Hashing Revisited - Example

Keys: John Smith, Lisa Smith, Sam Doe, Sandra Dee, Ted Baker

Buckets:

  000
  001   Lisa Smith   521-8976
  002
   :
  151
  152   John Smith   521-1234
  153   Sandra Dee   521-9655
  154   Ted Baker    418-4165
  155
   :
  253
  254   Sam Doe      521-5030
  255

Figure: Example of an open addressing hash table.
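
The open addressing scheme above can be sketched with linear probing. This is a minimal illustration and not the authors' implementation: the integer keys, the modulo hash, and the function name find_or_put are assumptions made for the sketch.

```python
def find_or_put(table, key):
    """Return True if key was already stored, False if it was inserted now."""
    m = len(table)
    home = key % m                  # assumed hash function: key modulo table size
    for probe in range(m):
        j = (home + probe) % m      # linear probing: try the next bucket, wrapping around
        if table[j] is None:        # empty bucket: the key is absent, so store the full key
            table[j] = key
            return False
        if table[j] == key:         # comparing the stored full key resolves the collision
            return True
    raise OverflowError("table is full")


table = [None] * 8
print(find_or_put(table, 3))    # False
print(find_or_put(table, 11))   # False (collides with 3, lands in bucket 4)
print(find_or_put(table, 3))    # True
```

Because every bucket holds the full key, a lookup can tell 3 and 11 apart even though both hash to bucket 3. Compact hashing aims to avoid exactly this per-bucket cost.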

slide-12
SLIDE 12

Introduction to Compact Hash Tables

◮ If, however, |U| ≤ |T|, we only need a bit array (and a perfect hash function)!
◮ What if |U| is just slightly bigger than |T|? Cleary tables:

  • 1. Maintain order in T
  • 2. Add three bits to each bucket in T
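
The |U| ≤ |T| case can be illustrated with a plain bit array. The sketch below assumes integer keys in range(universe_size) and uses the identity as the perfect hash function; the class name BitArraySet is hypothetical.

```python
class BitArraySet:
    """Set over a small universe: one bit per possible key, no key storage."""

    def __init__(self, universe_size):
        # Round up to whole bytes; bit k of the array stands for key k.
        self.bits = bytearray((universe_size + 7) // 8)

    def find_or_put(self, key):
        """Return True if key was already present, otherwise mark it and return False."""
        byte, mask = key >> 3, 1 << (key & 7)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return seen


s = BitArraySet(1 << 16)
print(len(s.bits))            # 8192
print(s.find_or_put(12345))   # False
print(s.find_or_put(12345))   # True
```

With a universe of 2^16 keys the whole set fits in 8192 bytes, because the position of a bit already identifies the key.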

slide-13
SLIDE 13

Introduction to BLP

Let K be the set of possible keys and h the hash function that computes the indexes: h : K → {0, …, M − 1}, with the property that for all K1, K2 ∈ K: K1 ≤ K2 ⇒ h(K1) ≤ h(K2).

◮ All keys are stored in ascending order.
◮ There cannot be empty locations between a key's original hash location and its actual storage position.
◮ All keys sharing the same initial hash location form one contiguous group.
◮ Groups can grow together, forming clusters of groups.
◮ Bidirectional linear probing (BLP): probing is possible in both directions.
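
The properties above can be checked mechanically. The sketch below assumes integer keys in range(universe_size) and one possible order-preserving hash, h(k) = k * M // |U|; both helper names are hypothetical.

```python
def make_hash(universe_size, table_size):
    """Build an order-preserving hash: larger keys never get a smaller index."""
    def h(k):
        return k * table_size // universe_size
    return h


def satisfies_blp_invariants(table, h):
    """Check ascending order and the 'no gap between home and position' rule."""
    stored = [k for k in table if k is not None]
    if stored != sorted(stored):
        return False                                # keys must be in ascending order
    for pos, k in enumerate(table):
        if k is None:
            continue
        lo, hi = sorted((h(k), pos))
        if any(table[j] is None for j in range(lo, hi + 1)):
            return False                            # empty bucket between home and position
    return True


h = make_hash(64, 8)
print(satisfies_blp_invariants([3, 16, 17, 18, 19, None, None, 60], h))  # True
print(satisfies_blp_invariants([16, None, None, None, None, None, None, None], h))  # False
```

In the first table the keys 16 to 19 all have initial hash location 2 and form one contiguous group. In the second table key 16 sits in bucket 0 with empty buckets before its home location 2, which violates the second property.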

slide-14
SLIDE 14

Introduction to BLP - Insert Example

Inserting k into table T in 5 steps:

  • 1. Determine index: i ← h(k)
  • 2. Determine probing direction: T[h(k)] > k ? right : left
  • 3. Search for an empty bucket
  • 4. Insert k into the empty bucket
  • 5. Swap the key into its correct place
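
The five steps can be sketched as follows. This is a sequential illustration under stated assumptions, not the authors' code: keys are distinct integers in range(64), the hash is h(k) = k * 8 // 64, the table does not wrap around, and the fallback to the opposite direction at a table edge is an addition of this sketch.

```python
def h(k, universe_size=64, table_size=8):
    return k * table_size // universe_size          # assumed order-preserving hash


def find_empty(table, start, step):
    """Return the first empty bucket from start in direction step, or None."""
    j = start + step
    while 0 <= j < len(table):
        if table[j] is None:
            return j
        j += step
    return None


def blp_insert(table, k):
    """Insert k (assumed not yet present) and return its final position."""
    i = h(k)                                        # step 1: determine index
    if table[i] is None:
        table[i] = k
        return i
    step = 1 if table[i] > k else -1                # step 2: probing direction
    e = find_empty(table, i, step)                  # step 3: search for an empty bucket
    if e is None:                                   # table edge reached: try the other side
        step = -step
        e = find_empty(table, i, step)
        if e is None:
            raise OverflowError("table is full")
    table[e] = k                                    # step 4: insert into the empty bucket
    j = e
    while True:                                     # step 5: swap back until order holds
        n = j - step
        if not 0 <= n < len(table) or table[n] is None:
            break
        out_of_order = table[n] > k if step == 1 else table[n] < k
        if not out_of_order:
            break
        table[n], table[j] = table[j], table[n]
        j = n
    return j


table = [None] * 8
for key in [17, 16, 18, 3, 60, 19]:
    blp_insert(table, key)
print(table)   # [3, 16, 17, 18, 19, None, None, 60]
```

Keys 16 to 19 share initial hash location 2, so they end up as one sorted group that has grown together with the group of key 3 into a single cluster.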

slide-15
SLIDE 15

Cleary Table

Cleary administration bits:

◮ Virgin: set on a bucket if its location is the initial hash location of some key in the table
◮ Change: set at the beginning of a group of keys with the same initial hash location
◮ Occupied: set if the bucket contains a key
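
To make the three definitions concrete, the sketch below derives the bits for a small ordered table. It is only an illustration: it keeps full keys so the bits can be computed from them, and it assumes integer keys in range(64) with the order-preserving hash h(k) = k * 8 // 64. The function name cleary_bits is hypothetical.

```python
def h(k, universe_size=64, table_size=8):
    return k * table_size // universe_size      # assumed order-preserving hash


def cleary_bits(table):
    """Return (virgin, change, occupied) bit lists for an ordered table."""
    m = len(table)
    virgin = [False] * m
    change = [False] * m
    occupied = [k is not None for k in table]   # occupied: the bucket contains a key
    for i, k in enumerate(table):
        if k is None:
            continue
        virgin[h(k)] = True                     # virgin: bucket h(k) is some key's home
        prev = table[i - 1] if i > 0 else None
        if prev is None or h(prev) != h(k):
            change[i] = True                    # change: first bucket of a new group
    return virgin, change, occupied


table = [3, 16, 17, 18, 19, None, None, 60]
virgin, change, occupied = cleary_bits(table)
print(virgin)    # [True, False, True, False, False, False, False, True]
print(change)    # [True, True, False, False, False, False, False, True]
print(occupied)  # [True, True, True, True, True, False, False, True]
```

In this example the number of virgin bits equals the number of change bits (three groups, three home locations), which is what lets groups be matched to their initial hash locations by counting.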

slide-16
SLIDE 16

Cleary Table - Example

Figure: Example of a partially filled Cleary table with 4 groups.

slide-18
SLIDE 18

Requirements for Parallelizing

We need a write-exclusive locking mechanism that

◮ Scales well
◮ Is memory efficient

slide-24
SLIDE 24

Locking Mechanism

Properties:

◮ 1 bit per bucket
◮ CAS(a, b, c): compare-and-swap (if a == b then a ← c)

Locking steps:

  • 1. Search for both the left and the right bucket of the cluster
  • 2. Lock these buckets
  • 3. If one of these locks fails → unlock and start over
  • 4. Perform exclusive actions (read, write)
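
A one-bit lock built on CAS can be sketched as below. Python has no native compare-and-swap, so the AtomicWords class emulates it with an internal mutex; that class, the choice of the lowest bit as lock bit, and all names are assumptions of this sketch. On real hardware the cas method would be a single atomic instruction.

```python
import threading


class AtomicWords:
    """Array of integer buckets with an emulated compare-and-swap."""

    def __init__(self, n):
        self._words = [0] * n
        self._guard = threading.Lock()       # stands in for hardware atomicity

    def load(self, i):
        return self._words[i]

    def cas(self, i, expected, new):
        """CAS(a, b, c): if a == b then a <- c. Returns True on success."""
        with self._guard:
            if self._words[i] == expected:
                self._words[i] = new
                return True
            return False


LOCK_BIT = 1                                  # assumed: lowest bit of each bucket word


def try_lock(words, i):
    old = words.load(i)
    if old & LOCK_BIT:
        return False                          # another thread already holds the lock
    return words.cas(i, old, old | LOCK_BIT)  # fails if the bucket changed meanwhile


def unlock(words, i):
    while True:                               # retry until the bit is cleared atomically
        old = words.load(i)
        if words.cas(i, old, old & ~LOCK_BIT):
            return


words = AtomicWords(4)
print(try_lock(words, 2))   # True
print(try_lock(words, 2))   # False
unlock(words, 2)
print(try_lock(words, 2))   # True
```

try_lock never blocks: a failed attempt simply returns False, which is what allows the caller to release any lock it already holds and start over.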

slide-25
SLIDE 25

Dynamic Region Based Locking

left ← CL-LEFT(h)
right ← CL-RIGHT(h)
if ¬TRY-LOCK(T[left]) then
    RESTART
if ¬TRY-LOCK(T[right]) then
    UNLOCK(T[left])
    RESTART
if FIND(k) then                    ⊲ exclusive read
    UNLOCK(T[left], T[right])
    return FOUND
PUT(k)                             ⊲ exclusive write
UNLOCK(T[left], T[right])
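
The listing can be turned into a runnable sketch under several assumptions that are not in the slides: CL-LEFT and CL-RIGHT are taken to return the first empty bucket on each side of the cluster, the CAS on the lock bit is emulated with a mutex, the bounds are re-checked after locking, PUT simply rewrites the cluster in sorted order into the right-hand empty bucket, and clusters must not touch the table edges. All names are hypothetical.

```python
import threading


class RegionLockedTable:
    """Ordered open-addressing table with one lock bit per bucket (sketch)."""

    def __init__(self, size, h):
        self.keys = [None] * size
        self.lock_bit = [False] * size
        self.h = h                                # assumed order-preserving hash
        self._cas_guard = threading.Lock()        # emulates an atomic CAS

    def _try_lock(self, i):
        with self._cas_guard:                     # CAS(lock_bit[i], False, True)
            if self.lock_bit[i]:
                return False
            self.lock_bit[i] = True
            return True

    def _cluster_bounds(self, home):
        """First empty bucket to the left and to the right of home's cluster."""
        left = right = home
        while self.keys[left] is not None:
            left -= 1
            if left < 0:
                raise OverflowError("cluster touches the left table edge")
        while self.keys[right] is not None:
            right += 1
            if right >= len(self.keys):
                raise OverflowError("cluster touches the right table edge")
        return left, right

    def find_or_put(self, key):
        """Return True if key was found, False if it was inserted."""
        home = self.h(key)
        while True:
            left, right = self._cluster_bounds(home)
            if not self._try_lock(left):
                continue                          # RESTART
            if right != left and not self._try_lock(right):
                self.lock_bit[left] = False
                continue                          # RESTART
            try:
                if self.keys[left] is not None or self.keys[right] is not None:
                    continue                      # bounds went stale: restart
                cluster = self.keys[left + 1:right]
                if key in cluster:                # exclusive read
                    return True
                if left == right:                 # home bucket itself is empty
                    self.keys[home] = key
                else:                             # exclusive write, keeps keys sorted
                    self.keys[left + 1:right + 1] = sorted(cluster + [key])
                return False
            finally:
                self.lock_bit[left] = False
                self.lock_bit[right] = False


table = RegionLockedTable(128, lambda k: k // 2)
print(table.find_or_put(41))   # False
print(table.find_or_put(40))   # False
print(table.find_or_put(41))   # True
```

Only the two boundary buckets are locked, so threads working on different clusters do not interfere, while two threads that need the same boundary bucket cannot both succeed in TRY-LOCK.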

slide-26
SLIDE 26

Benchmarks - Speedup

[Plot: speedup (0 to 12) against number of cores (1 to 16) for LHT, RBL, BLP and PCT at r/w ratios 0:1, 3:1 and 9:1, with the ideal speedup as reference.]

Figure: Speedups of BLP, RBL, LHT and PCT with r/w ratios 0:1, 3:1 and 9:1.

slide-27
SLIDE 27

Benchmarks - Runtime

[Plot: normalized runtime (0 to 200) against load factor (10% to 100%) for LHT, RBL, BLP and PCT at r/w ratios 0:1, 3:1 and 9:1.]

Figure: 16-core runtimes of BLP, RBL, LHT and PCT with r/w ratios 0:1, 3:1 and 9:1.

slide-28
SLIDE 28

Results

◮ PCT performs very well with only inserts
◮ PCT's performance drops when the load factor rises above 85%
◮ With a high proportion of reads (> 9:1), BLP eventually becomes faster than LHT
◮ Region-based locking with OS locks is very slow, as can be seen in RBL
◮ Scalability of both PCT and BLP is good
◮ r/w ratio: r/w exclusion on clusters takes a toll; there is room for improvement at the higher load factors (when clusters are large)

slide-29
SLIDE 29

Conclusion

◮ We have realized a parallel Cleary table with high performance and scalability up to load factors of 90%. Since the compression ratio of compact hash tables can be high, this is acceptable.
◮ Future work: allow concurrent reads in the Cleary table to improve its scalability even more