SLIDE 1
Hash Tables
SLIDE 2
LAST Unbounded arrays, amortized analysis
TODAY
NEXT Implementing hash tables
SLIDE 3
Introduction to C1 (genericity)
SLIDE 4 Implicit contract for casting
- (void*) x where x has type tp*
//@ensures \hastag(tp*, x)
- (tp*) y, where y has type void*
//@requires \hastag(tp*, y)
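Plain C does not check tags when casting, so the contract above can only be imitated by hand. The sketch below is one way to do that, assuming a hypothetical `tagged` struct that stores a tag next to the generic pointer; the names `tag`, `tagged`, `to_generic_int`, `has_tag_int` and `from_generic_int` are illustrative and are not part of C1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags standing in for the types that C1 tracks automatically. */
typedef enum { TAG_INT, TAG_CHAR } tag;

/* A generic pointer paired with the tag of the type it was cast from. */
typedef struct {
    tag   t;
    void *p;
} tagged;

/* (void*) x where x has type int*: the result carries the int* tag. */
tagged to_generic_int(int *x) {
    tagged g = { TAG_INT, (void *)x };
    return g;
}

/* Imitates \hastag(int*, y). */
int has_tag_int(tagged g) {
    return g.t == TAG_INT;
}

/* (int*) y where y has type void*: requires the int* tag. */
int *from_generic_int(tagged g) {
    assert(has_tag_int(g));   /* plays the role of //@requires \hastag(int*, y) */
    return (int *)g.p;
}
```

Casting back with the wrong tag fails the assertion instead of silently reinterpreting memory, which is the behavior the contract describes.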
SLIDE 5 The only operations allowed on p of type void*
- Cast to another type: (int*) p
- Compare to another void* value: p == q where q
is of type void*
- Compare to NULL: p == NULL
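The three operations can be sketched in plain C as follows. The helper names `read_int` and `same_cell` are hypothetical, and the code assumes the caller only passes pointers that really came from an `int*`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reads the int behind a generic pointer. */
int read_int(void *p) {
    assert(p != NULL);        /* compare to NULL */
    int *ip = (int *)p;       /* cast to another type */
    return *ip;               /* dereference only after the cast */
}

/* True when two generic pointers refer to the same memory cell. */
bool same_cell(void *p, void *q) {
    return p == q;            /* compare to another void* value */
}
```

Note that `same_cell` compares addresses, not contents: two different cells holding the same number are not equal under `==`.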
SLIDE 6
Hashing
SLIDE 7 Reflecting on arrays
- As a way to keep a collection of elements of the same
type, like a set
- As a mapping from indices to values like a dictionary
- Operations: insert, lookup
Goal: make these operations efficient
SLIDE 8 Dictionaries (also known as maps, associative arrays)
- An array is a mapping from indices to elements where
A[i] = e.
- Dictionary: mapping from keys to entries where key can be
any kind of information
- zipcode (key) to neighborhood name (entry)
- Andrew id (key) to home address (entry)
- SSN (key) to tax id (entry)
SLIDE 9
Implementing dictionaries
                                     lookup      insert
unsorted (key,entry) array           O(n)        O(1) amortized
(key,entry) array sorted by key      O(log n)    O(n)
linked list with (key,entry) data    O(n)        O(1)
Can we implement dictionaries such that both lookup and insert are about O(1)?
SLIDE 10
Example: Storing zipcodes using an array with length 5
Some fun zip codes:
- 90210 Beverly Hills
- 10101 New York
- 20500 White House
- 44444 Newton Falls, OH
- 94043 Googleplex
- 15213 CMU
- 15217 Squirrel Hill
- 15122 Kennywood
[Figure: array with indices 0 to 4 and columns key, value]
SLIDE 11
Example: Storing zipcodes using an array with length 5
Some fun zip codes:
- 90210 Beverly Hills
- 10101 New York
- 20500 White House
- 44444 Newton Falls, OH
- 94043 Googleplex
- 15213 CMU
- 15217 Squirrel Hill
- 15122 Kennywood
key -> hash value -> index
zipcode -> zipcode % 5 -> zipcode % 5
[Figure: array with columns key, value]
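The mapping from zipcode to index can be written as a one-line function. This is a minimal sketch; the names `TABLE_SIZE` and `hash_index` are made up for illustration.

```c
#include <assert.h>

#define TABLE_SIZE 5   /* array length used in the example */

/* Map a zipcode (the key) to an array index in [0, TABLE_SIZE). */
int hash_index(int zipcode) {
    assert(zipcode >= 0);
    return zipcode % TABLE_SIZE;
}
```

With this function, 15217 and 15122 both map to index 2, and 90210 and 20500 both map to index 0, so an array of length 5 cannot give every zipcode in the list its own slot.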
SLIDE 12 Design choices for handling collisions
- Open addressing (e.g. linear probing)
- Separate chaining
SLIDE 13
Example: linear probing
Look for an empty slot somewhere predictable: next position, then next-next …
Entries to store:
- 15217 Squirrel Hill
- 20500 White House
- 90210 Beverly Hills
- 10101 New York
[Figure: array with indices 0 to 4 holding “Squirrel Hill”, “White House”, “Beverly Hills”, “New York”]
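A minimal sketch of insertion with linear probing, assuming the hash function zipcode % 5 and a fixed table of 5 slots. The type `slot` and the function `lp_insert` are hypothetical names, and wrapping around to index 0 at the end of the array is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 5

typedef struct {
    bool        used;    /* false means the slot is empty */
    int         key;     /* zipcode */
    const char *entry;   /* neighborhood name */
} slot;

/* Insert (key, entry), probing forward from key % TABLE_SIZE.
   Returns the index used, or -1 if the table is full. */
int lp_insert(slot table[TABLE_SIZE], int key, const char *entry) {
    assert(key >= 0);
    int start = key % TABLE_SIZE;
    for (int step = 0; step < TABLE_SIZE; step++) {
        int i = (start + step) % TABLE_SIZE;   /* wrap around at the end */
        if (!table[i].used || table[i].key == key) {
            table[i].used  = true;
            table[i].key   = key;
            table[i].entry = entry;
            return i;
        }
    }
    return -1;   /* every slot is taken by a different key */
}
```

If the four entries are inserted in the order 15217, 20500, 90210, 10101, this sketch places them at indices 2, 0, 1 and 3: 90210 collides with 20500 at index 0 and moves to 1, and 10101 finds 1 and 2 occupied and settles at 3. A different insertion order would give a different layout.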
SLIDE 14
Example: linear probing
How do you know something is not in the table?
Entries stored:
- 15217 Squirrel Hill
- 20500 White House
- 90210 Beverly Hills
- 10101 New York
[Figure: array with indices 0 to 4 holding “Squirrel Hill”, “White House”, “Beverly Hills”, “New York”]
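One possible answer, sketched under the assumption that entries are never deleted: probe from the key's hash index and stop at the first empty slot, because an insertion of that key would have used that slot. The names `slot` and `lp_lookup` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 5

typedef struct {
    bool        used;    /* false means the slot is empty */
    int         key;     /* zipcode */
    const char *entry;   /* neighborhood name */
} slot;

/* Returns the entry stored for key, or NULL if key is not in the table.
   Assumes no entry has ever been deleted from the table. */
const char *lp_lookup(const slot table[TABLE_SIZE], int key) {
    assert(key >= 0);
    int start = key % TABLE_SIZE;
    for (int step = 0; step < TABLE_SIZE; step++) {
        int i = (start + step) % TABLE_SIZE;
        if (!table[i].used) return NULL;              /* empty slot: key is absent */
        if (table[i].key == key) return table[i].entry;
    }
    return NULL;   /* probed every slot without finding key */
}
```

The second `return NULL` covers a completely full table, where no empty slot exists to end the search early.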
SLIDE 15
Example: separate chaining
[Figure: separate chaining on an array with indices 0 to 4]
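A minimal sketch of separate chaining, assuming each array slot holds a linked list (chain) of all entries whose keys hash to that index. The names `chain_node`, `sc_insert` and `sc_lookup` are hypothetical, and inserting at the front of the chain is a choice made here for simplicity.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 5

typedef struct chain_node {
    int                key;     /* zipcode */
    const char        *entry;   /* neighborhood name */
    struct chain_node *next;    /* rest of the chain, or NULL */
} chain_node;

/* Insert (key, entry) into the chain at key % TABLE_SIZE.
   An existing entry for the same key is overwritten. */
void sc_insert(chain_node *table[TABLE_SIZE], int key, const char *entry) {
    assert(key >= 0);
    int i = key % TABLE_SIZE;
    for (chain_node *n = table[i]; n != NULL; n = n->next) {
        if (n->key == key) { n->entry = entry; return; }
    }
    chain_node *n = malloc(sizeof(chain_node));
    assert(n != NULL);          /* sketch: treat allocation failure as fatal */
    n->key   = key;
    n->entry = entry;
    n->next  = table[i];        /* new node goes at the front of the chain */
    table[i] = n;
}

/* Returns the entry stored for key, or NULL if key is not in the table. */
const char *sc_lookup(chain_node *table[TABLE_SIZE], int key) {
    assert(key >= 0);
    for (chain_node *n = table[key % TABLE_SIZE]; n != NULL; n = n->next) {
        if (n->key == key) return n->entry;
    }
    return NULL;
}
```

Colliding keys such as 15217 and 15122 simply share the chain at index 2, so a lookup only walks that one chain.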
SLIDE 16
Cost analysis of separate chaining
If we have an array of size m and a total of n entries, how long does it take to look up an entry?
SLIDE 17
Worst possible layout
All n entries fall in a single chain: chain length n, lookup cost O(n)
SLIDE 18
Best possible layout
Entries spread evenly over the m chains: chain length n/m, lookup cost O(n/m)
SLIDE 19
Cost analysis of separate chaining
Can we arrange for n/m to be constant?
Use resizing, as we did in unbounded arrays.
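The effect of resizing can be written out as a short calculation. It assumes the table grows (for example, doubles m) whenever an insertion would push n above c times m, for some fixed constant c; the threshold c is an assumption of this sketch and is not given on the slides.

```latex
\[
  \frac{n}{m} \le c
  \quad\Longrightarrow\quad
  O\!\left(\frac{n}{m}\right) = O(c) = O(1)
\]
```

Under that assumption the average chain length never exceeds c, so an average lookup costs O(1). The occasional cost of moving every entry into the larger array is spread over many insertions, as in the amortized analysis of unbounded arrays.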
SLIDE 20
Implementing dictionaries
                                     lookup                                        insert
unsorted (key,value) array           O(n)                                          O(1) amortized
(key,value) array sorted by key      O(log n)                                      O(n)
linked list with (key,value) data    O(n)                                          O(1)
Hash tables                          O(n/m) average; O(1) average and amortized    O(n/m) average; O(1) average and amortized