16 - Hash Tables
Bit Vector Representation
- Suppose we want to store a set S ⊆ [0, d], for some d ∈ ℕ.
- A bit vector representation of S is a Boolean array B of size d+1 s.t. B[i] ⇔ i ∈ S, or S = { 0 ≤ i ≤ d : B[i] is true }.
- E.g.: d = 20, S = {3, 7, 9}: B = [0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0]
- Operations member(x), insert(x), remove(x) are all O(1).
- Only practical where d is small or |S| ≈ d.
- Copy, Union, Intersection are all Θ(d).
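A minimal Python sketch of the bit vector representation above. The class name `BitVectorSet` and the method `elements` are illustrative choices, not part of the slides, and `union` assumes both sets were created with the same d.

```python
class BitVectorSet:
    """Set of integers in [0, d], stored as a Boolean array of size d + 1."""

    def __init__(self, d):
        self.bits = [False] * (d + 1)

    def member(self, x):
        return self.bits[x]            # O(1): one array lookup

    def insert(self, x):
        self.bits[x] = True            # O(1)

    def remove(self, x):
        self.bits[x] = False           # O(1)

    def union(self, other):
        # Theta(d): every cell is visited, however few elements are stored.
        result = BitVectorSet(len(self.bits) - 1)
        result.bits = [a or b for a, b in zip(self.bits, other.bits)]
        return result

    def elements(self):
        return [i for i, bit in enumerate(self.bits) if bit]


s = BitVectorSet(20)
for x in (3, 7, 9):
    s.insert(x)
print(s.elements())   # [3, 7, 9]
```

The array always has d + 1 cells, which is why the representation only pays off when d is small or the set is dense.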
Hash Functions
- A hash function for a set D is a function h : D → M where |M| ≤ |D|, i.e. a map to a smaller set.
- E.g.: h : [0, MAXINT] → [0, 12], h(x) = x mod 13 (|M| = 13, |D| = 2,147,483,647)
- There will be values x, y ∈ D s.t. x ≠ y but h(x) = h(y).
- Notation: Define h(S) = { y : y = h(x) and x ∈ S }.
- E.g.: h(3) = 3; h(7) = 7; h(13) = 0; h(15) = 2; h(20) = 7; h({3, 7, 13, 15, 20}) = {0, 2, 3, 7}
- If h(x) = h(y) for x, y ∈ S with x ≠ y, we call it a collision (e.g. 7, 20).
- We will want hash functions h s.t.
  - ran h = [0, m-1] for m ∈ ℕ (array indices)
  - h tends to distribute S uniformly over [0, m-1]
  - m prime
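The notation h(S) and the idea of a collision can be checked with a short Python sketch. The helper names `image` and `collisions` are made up for this illustration.

```python
from collections import defaultdict

M = 13

def h(x):
    return x % M

def image(S):
    """h(S) = { h(x) : x in S }."""
    return {h(x) for x in S}

def collisions(S):
    """Groups of distinct elements of S that share a hash value."""
    buckets = defaultdict(list)
    for x in sorted(S):
        buckets[h(x)].append(x)
    return [xs for xs in buckets.values() if len(xs) > 1]

S = {3, 7, 13, 15, 20}
print(sorted(image(S)))   # [0, 2, 3, 7]
print(collisions(S))      # [[7, 20]]
```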
Hash Function + Bit Vector
- Let h : D → [0, m-1], B a Boolean array of size m.
- For a set S ⊆ D, set B[i] = true iff there is x ∈ S s.t. h(x) = i, or { i : B[i] } = h(S).
- E.g.: S = {3, 7, 13, 15, 20}, h(x) = x mod 13; m = 13
  h(S) = {0, 2, 3, 7}, B = [1,0,1,1,0,0,0,1,0,0,0,0,0]
  Then: { x : B[h(x)] } = {0, 2, 3, 7, 13, 15, 16, 20, 26, 28, ...}
- B[h(x)] "suggests x ∈ S"
- B[h(x)] = 0 implies x ∉ S
- i.e. there may be false positives but never false negatives.
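A small Python sketch of the hashed bit vector above, using the same S and h. The function names `build` and `maybe_member` are illustrative.

```python
M = 13

def h(x):
    return x % M

def build(S):
    """B[i] is true iff some x in S has h(x) == i."""
    B = [False] * M
    for x in S:
        B[h(x)] = True
    return B

def maybe_member(B, x):
    # True only "suggests" x is in S; False proves x is not in S.
    return B[h(x)]

B = build({3, 7, 13, 15, 20})
print([i for i in range(M) if B[i]])   # [0, 2, 3, 7]
print(maybe_member(B, 16))             # True: a false positive (16 mod 13 = 3)
print(maybe_member(B, 4))              # False: 4 is definitely not in S
```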
Bloom Filters
- Let H = {h_1, h_2, ..., h_k} be a set of distinct hash functions for a set D, each with range [0, m-1].
- For S ⊆ D, set B[i] = true if h(x) = i for some h ∈ H and some x ∈ S; B[i] = false o.w.
- To test for membership of x in S:
  - if B[h(x)] = true for all h ∈ H, return true
  - o.w. return false.
- We get a false positive only when h(x) is a collision for every h ∈ H.
- B is a Bloom filter.
- If m is large enough relative to |S| and the h_i are good quality, independent hash functions, then there will be few false positives.
Hash Tables
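The Bloom filter described above can be sketched in Python as follows. The two hash functions `h1` and `h2` are assumptions picked only so that the arithmetic is easy to follow; they are not good-quality, independent hash functions, and the class name `BloomFilter` is illustrative.

```python
class BloomFilter:
    def __init__(self, m, hash_functions):
        self.bits = [False] * m
        self.hashes = hash_functions       # each maps a key into [0, m-1]

    def insert(self, x):
        for h in self.hashes:
            self.bits[h(x)] = True

    def might_contain(self, x):
        # True means "possibly in S"; False means "definitely not in S".
        return all(self.bits[h(x)] for h in self.hashes)


def h1(x):
    return x % 13

def h2(x):
    # Hypothetical second hash, chosen only for illustration.
    return (x + x // 13) % 13

bf = BloomFilter(13, [h1, h2])
for x in (3, 7, 13, 15, 20):
    bf.insert(x)

print(bf.might_contain(16))   # False: h1 collides with 3, but h2 does not
print(bf.might_contain(26))   # True: a false positive under both functions
```

With `h1` alone, 16 would be reported as a possible member; the second function rules it out, while 26 still slips through because both of its cells happen to be set.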
- Let h : D → M be a hash function for D with M = [0, m-1].
- Let A be an array of size |M| and type D ∪ {-}, i.e. A : M → D ∪ {-}.
- For a set S ⊆ D, we want A[h(x)] = x for each x ∈ S, and A[i] = - if h(x) ≠ i for every x ∈ S.
- E.g.: S = {2, 12, 17, 21}, h(x) = x mod 13
  h(S) = {2, 12, 4, 8}, A = [-,-,2,-,17,-,-,-,21,-,-,-,12]
- To check membership of x in S, return true iff A[h(x)] = x.
- A is a hash table.
- But what if we have collisions?
  - Need collision handling. We will look at a few methods.
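A minimal Python sketch of this collision-free table, assuming h(x) = x mod 13 and using `None` for the empty marker "-". The function `build_table` is an illustrative name; it simply refuses to continue when two keys collide, which is the situation the following methods address.

```python
M = 13
EMPTY = None   # plays the role of "-"

def h(x):
    return x % M

def build_table(S):
    A = [EMPTY] * M
    for x in S:
        if A[h(x)] is not EMPTY:
            raise ValueError(f"collision between {A[h(x)]} and {x}")
        A[h(x)] = x
    return A

def member(A, x):
    return A[h(x)] == x

A = build_table([2, 12, 17, 21])
print(member(A, 17))   # True
print(member(A, 4))    # False: A[4] holds 17, not 4
```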
Hashing with Separate Chaining
- Let A be a size-m array of linked lists.
- Set A[i] to be a list of the elements { x ∈ S : h(x) = i }.
- To test for membership of x in S:
  - Return true iff x is in the list A[h(x)].
- Ex.: S = {1, 5, 7, 13, 18}, h(x) = x mod 13 [figure: A drawn as an array of linked lists]
- To insert/remove x: insert/remove x from A[h(x)].
- If h distributes S almost uniformly over M, the lists will be small, and time will be essentially O(1).
- In the worst case, some lists have length Θ(n) (n = |S|), and performance degrades to that of linked lists: Θ(n).
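Separate chaining as described above, sketched in Python. Two assumptions: Python lists stand in for linked lists, and h(x) = x mod m with m = 13. The class name `ChainedHashSet` is illustrative.

```python
class ChainedHashSet:
    def __init__(self, m=13):
        self.m = m
        self.table = [[] for _ in range(m)]   # one chain per array cell

    def _h(self, x):
        return x % self.m

    def member(self, x):
        return x in self.table[self._h(x)]

    def insert(self, x):
        chain = self.table[self._h(x)]
        if x not in chain:                    # keep set semantics
            chain.append(x)

    def remove(self, x):
        chain = self.table[self._h(x)]
        if x in chain:
            chain.remove(x)


s = ChainedHashSet()
for x in (1, 5, 7, 13, 18):
    s.insert(x)
print(s.table[5])   # [5, 18]: both keys hash to 5
```

Every operation touches only the chain A[h(x)], so its cost is proportional to that chain's length.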
Hashing with Probing (Open Addressing)
- Let A be an array of size m and type D ∪ {-}, & h a hash function h : D → [0, m-1].
- Let f be a function f : ℕ → ℕ that has f(0) = 0 and is monotone increasing (i.e. x > y ⇒ f(x) > f(y)).
- Define h_i(x) = (h(x) + f(i)) mod m
- Ex.: h(x) = x mod 13, f(i) = i
  h_0(3) = h(3) + 0 = 3
  h_1(3) = h(3) + 1 = 4
  h_2(3) = h(3) + 2 = 5
- To resolve collisions, probe the sequence of cells: A[h_0(x)], A[h_1(x)], A[h_2(x)], ...
Hashing with Probing (Open Addressing) (cont.)
h_i(x) = (h(x) + f(i)) mod m
- To check for membership of x:
  - Examine the sequence of locations A[h_0(x)], A[h_1(x)], A[h_2(x)], ...; stop at the first location containing x or -.
  - Return true if x was found, false otherwise.
- To insert x:
  - Examine the sequence of locations A[h_0(x)], A[h_1(x)], A[h_2(x)], ...
  - Stop at the first location containing - and store x there.
- Choice of f(i) determines properties.
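The generic member and insert procedures above can be sketched in Python with the offset function f passed in as a parameter. Assumptions not in the slides: h(x) = x mod m, `None` stands for "-", and probing gives up after m attempts. The class name `ProbingHashSet` is illustrative.

```python
EMPTY = None

class ProbingHashSet:
    def __init__(self, m, f):
        self.m = m
        self.f = f                     # probe offset function with f(0) == 0
        self.A = [EMPTY] * m

    def _probe(self, x):
        """Yield h_0(x), h_1(x), ..., h_{m-1}(x)."""
        for i in range(self.m):
            yield (x % self.m + self.f(i)) % self.m

    def member(self, x):
        for j in self._probe(x):
            if self.A[j] == x:
                return True
            if self.A[j] is EMPTY:
                return False
        return False                   # gave up after m probes

    def insert(self, x):
        for j in self._probe(x):
            if self.A[j] == x:
                return True            # already present
            if self.A[j] is EMPTY:
                self.A[j] = x
                return True
        return False                   # no free cell found in m probes


t = ProbingHashSet(13, lambda i: i)    # f(i) = i
t.insert(3)
t.insert(16)                           # h(16) = 3 is taken, so 16 lands in cell 4
print(t.A[3], t.A[4])                  # 3 16
```

Passing a different `f` changes only the order in which cells are examined; the two procedures stay the same.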
Hashing with Linear Probing
- Let f(i) = i.
- The sequence of locations to probe is: A[h(x)], A[h(x)+1], A[h(x)+2], A[h(x)+3], ... (+ is mod m)
- Ex.: Suppose h(x) = x mod 13, S = {2, 9, 18, 36} (so h(S) = {2, 5, 9, 10}) and A is [-,-,2,-,-,18,-,-,-,9,36,-,-]
- To insert 5:
  - compute h(5) = 5
  - see that A[5] ≠ -
  - see that A[6] = -, so set A[6] = 5
  - Now: A = [-,-,2,-,-,18,5,-,-,9,36,-,-]
- To check if 5 ∈ S:
  - compute h(5) = 5
  - see that A[5] ≠ -, A[5] ≠ 5
  - see that A[6] = 5 and return true
- To check if 31 ∈ S:
  - compute h(31) = 5
  - see that A[5] ≠ 31
  - see that A[6] ≠ 31, A[6] ≠ -
  - see that A[7] = - and return false
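The linear probing example above can be replayed with a short Python sketch. It assumes `None` for "-" and h(x) = x mod 13; the function names are illustrative.

```python
M = 13
EMPTY = None

def probe_sequence(x):
    # f(i) = i: h(x), h(x)+1, h(x)+2, ... (mod M)
    return [(x % M + i) % M for i in range(M)]

def insert(A, x):
    for j in probe_sequence(x):
        if A[j] is EMPTY:
            A[j] = x
            return j
    raise RuntimeError("table is full")

def member(A, x):
    for j in probe_sequence(x):
        if A[j] == x:
            return True
        if A[j] is EMPTY:
            return False
    return False

A = [EMPTY] * M
for x in (2, 9, 18, 36):
    insert(A, x)

slot = insert(A, 5)
print(slot)            # 6: cell 5 already holds 18
print(member(A, 5))    # True
print(member(A, 31))   # False: cells 5 and 6 are occupied, cell 7 is empty
```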
Hashing with Quadratic Probing
- Let f(i) = i^2.
- The sequence of locations to probe is: A[h(x)], A[h(x)+1], A[h(x)+4], A[h(x)+9], ... (+ is mod m)
- Ex.: Suppose h(x) = x mod 13, S = {2, 9, 18, 36} (so h(S) = {2, 5, 9, 10}) and A is [-,-,2,-,-,18,-,-,-,9,36,-,-]
- To insert 35:
  - compute h(35) = 9
  - see that A[9] ≠ -, A[10] ≠ -
  - see that A[0] = - and store 35 there (9 + 4 = 13, and 13 mod 13 = 0).
  - Now: A is [35,-,2,-,-,18,-,-,-,9,36,-,-]
- To check if 35 ∈ S:
  - compute h(35) = 9
  - see that A[9] ≠ 35, A[10] ≠ 35
  - see that A[0] = 35 and return true
- To check if 22 ∈ S:
  - compute h(22) = 9
  - see that A[9], A[10], A[0], A[5] are not 22
  - see that A[12] = - and return false
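A Python sketch that replays the quadratic probing example, under the same assumptions as the slides' example (h(x) = x mod 13, m = 13) plus `None` for "-". Function names are illustrative.

```python
M = 13
EMPTY = None

def probe_sequence(x):
    # f(i) = i^2: h(x), h(x)+1, h(x)+4, h(x)+9, ... (mod M)
    return [(x % M + i * i) % M for i in range(M)]

def insert(A, x):
    for j in probe_sequence(x):
        if A[j] is EMPTY:
            A[j] = x
            return j
    raise RuntimeError("no free cell found on the probe sequence")

def member(A, x):
    for j in probe_sequence(x):
        if A[j] == x:
            return True
        if A[j] is EMPTY:
            return False
    return False

A = [EMPTY] * M
for x in (2, 9, 18, 36):
    insert(A, x)

print(probe_sequence(35)[:5])   # [9, 10, 0, 5, 12]
slot = insert(A, 35)
print(slot)                     # 0
print(member(A, 22))            # False: stops at the empty cell 12
```

For m = 13 the full probe sequence visits only 7 distinct cells, which is why an insertion can fail even when free cells remain.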
Double Hashing
- Let f(i) = i · hash_2(x), where hash_2(x) is a hash function for D that is different from h, and with ran(hash_2) ⊆ [1, m-1].
- The sequence of locations to probe is: A[h(x)], A[h(x) + hash_2(x)], A[h(x) + 2·hash_2(x)], ... (+ is mod m)
- Ex.: Suppose h(x) = x mod 13, S = {2, 9, 18, 36} (so h(S) = {2, 5, 9, 10}) and A is [-,-,2,-,-,18,-,-,-,9,36,-,-]
- To insert 15:
  - see that A[2] ≠ -
  - compute hash_2(15) = 6
  - see that A[8] = -, and store 15 there
  - Now: A is [-,-,2,-,-,18,-,-,15,9,36,-,-]
- To check if 15 ∈ S, check A[2], then A[8], and return true.
- To check if 10 ∈ S:
  - see that A[10] ≠ 10, A[10] ≠ -
  - compute hash_2(10) = 4
  - see that A[1] = -, and return false
Removal with Open Addressing
- Suppose we have a hash table H for a set S containing x, and want to remove x.
- If H uses separate chaining, we just delete x.
- If H uses open addressing, we cannot just delete x, because x affects the probe sequence for other elements.
- Ex.: Suppose h(x) = x mod 13, S = {2, 5, 9, 18, 36} and A was obtained as in our Linear Probing example: A = [-,-,2,-,-,18,5,-,-,9,36,-,-]
- Suppose we now delete 18, so A = [-,-,2,-,-,-,5,-,-,9,36,-,-]
- Now, searching for 5 fails, because A[h(5)] = -
- One solution is to mark cells where we have deleted elements.
Removal with Open Addressing (cont.)
- Ex.: In the previous example, to remove 18 we replace it with d: A = [-,-,2,-,-,d,5,-,-,9,36,-,-]
- Now search & insert procedures perform as if A[5] has some key that we will never use.
- To remove x:
  - examine the sequence of locations A[h_0(x)], A[h_1(x)], A[h_2(x)], ...
  - if x is found, replace it with d.
- Notice that search & insert work correctly as they are.
- Insert can be modified to reclaim space:
  - To insert x:
    - Examine the sequence of probe locations
    - stop at the first one containing - or d and store x there.
- NB: In implementation, - and d could be special values, or A could be an array of objects or structs with "empty" and "deleted" flags.
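A Python sketch of removal with a "deleted" marker, using linear probing and the example values above. Assumptions: `None` stands for "-", a sentinel object stands for "d", and `insert` is only called with keys that are not already in the table (otherwise reclaiming a d cell could store a duplicate).

```python
M = 13
EMPTY = None
DELETED = object()     # sentinel playing the role of "d"

def probes(x):
    return [(x % M + i) % M for i in range(M)]   # linear probing

def member(A, x):
    for j in probes(x):
        if A[j] is EMPTY:
            return False
        if A[j] == x:              # a DELETED cell never equals a key
            return True
    return False

def insert(A, x):
    # Modified insert: reclaims the first cell holding "-" or "d".
    for j in probes(x):
        if A[j] is EMPTY or A[j] is DELETED:
            A[j] = x
            return j
    raise RuntimeError("no free cell found")

def remove(A, x):
    for j in probes(x):
        if A[j] is EMPTY:
            return False
        if A[j] == x:
            A[j] = DELETED         # mark the cell instead of emptying it
            return True
    return False

A = [EMPTY] * M
for x in (2, 9, 18, 36, 5):
    insert(A, x)
remove(A, 18)
print(member(A, 5))    # True: the search walks past the marked cell 5
```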
Load Factor
- The load factor of a hash table H is
  λ = ((# of keys) + (# of elements marked d)) / m
  (If H uses separate chaining, there are no d's.)
- Good performance requires λ not too large.
  - For separate chaining, want λ not much larger than 1, so average list length is about 1.
  - For open addressing, want λ < 0.5, so that it is not too hard to find a place to make an insertion.
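The load factor definition above, computed in a short Python sketch for both table styles. The representations are assumptions: `None` for "-", the string `"d"` for a deleted marker (keys are integers here), and a list of Python lists for the chained table.

```python
EMPTY = None
DELETED = "d"

def load_factor_open(A):
    """(# of keys + # of cells marked d) / m for an open-addressing table."""
    used = sum(1 for cell in A if cell is not EMPTY)
    return used / len(A)

def load_factor_chained(table):
    """(# of keys) / m for a separate-chaining table."""
    return sum(len(chain) for chain in table) / len(table)

A = [EMPTY, EMPTY, 2, EMPTY, EMPTY, DELETED, 5, EMPTY, EMPTY, 9, 36, EMPTY, EMPTY]
lam = load_factor_open(A)
print(round(lam, 3))   # 0.385, i.e. 5/13: four keys plus one d
```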
Some Properties with Open Addressing
- Linear Probing
  - Insertion always succeeds if λ < 1
  - Primary clustering is a serious problem.
- Quadratic Probing
  - Avoids primary clustering
  - Exhibits secondary clustering, but less problematic
  - Insertion always succeeds if λ < 0.5 but may fail if λ > 0.5 (even if there is space)
- Double Hashing
  - Requires design of a second suitable hash function
  - hash_2 is only computed if probing beyond A[h_0(x)] is needed.
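A Python sketch of double hashing on the earlier example table (h(x) = x mod 13, S = {2, 9, 18, 36}). The second hash `hash2(x) = 7 - (x mod 7)` is an assumption: it is one common choice that happens to agree with the values hash_2(15) = 6 and hash_2(10) = 4 used in the example, but the slides do not state which function was intended.

```python
M = 13
EMPTY = None

def h(x):
    return x % M

def hash2(x):
    # Assumed second hash; values lie in [1, 7], so the step is never 0.
    return 7 - (x % 7)

def probes(x):
    """Yield cell indices in double-hashing order."""
    yield h(x)                       # h_0(x): hash2 is not needed yet
    step = hash2(x)                  # evaluated only if the caller keeps probing
    for i in range(1, M):
        yield (h(x) + i * step) % M

def insert(A, x):
    for j in probes(x):
        if A[j] is EMPTY:
            A[j] = x
            return j
    raise RuntimeError("no free cell found")

def member(A, x):
    for j in probes(x):
        if A[j] == x:
            return True
        if A[j] is EMPTY:
            return False
    return False

A = [EMPTY] * M
for x in (2, 9, 18, 36):
    insert(A, x)

slot = insert(A, 15)
print(slot)            # 8: cell 2 is taken, and 2 + hash2(15) = 2 + 6 = 8
print(member(A, 10))   # False: cell 10 holds 36, then cell (10 + 4) mod 13 = 1 is empty
```

Because `probes` is a generator, the second hash is computed lazily, only when the first cell does not settle the search.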
Rehashing
- Rehashing a hash table H means constructing a completely new hash table for the contents of H.
- We may want to do it if:
  - λ is too large (close to 0.5 for open addressing, much larger than 1 for separate chaining)
  - Performance has become poor (which may result from clustering, from long linked lists, or from many removals)
- Takes time Θ(n) under the assumption that insert is Θ(1).
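Rehashing can be sketched in Python for a linear-probing table as below. Assumptions: integer keys, h(x) = x mod (table size), `None` for "-", a sentinel object for "d", and a new size larger than the number of keys (otherwise the inner loop would not terminate). The new size 29 in the usage is an arbitrary prime chosen for illustration.

```python
EMPTY = None
DELETED = object()

def rehash(old_table, new_m):
    """Build a fresh linear-probing table of size new_m from old_table's keys."""
    new_table = [EMPTY] * new_m
    for cell in old_table:
        if cell is EMPTY or cell is DELETED:
            continue                       # d markers are not carried over
        j = cell % new_m
        while new_table[j] is not EMPTY:   # linear probing in the new table
            j = (j + 1) % new_m
        new_table[j] = cell
    return new_table

old = [EMPTY, EMPTY, 2, EMPTY, EMPTY, DELETED, 5, EMPTY, EMPTY, 9, 36, EMPTY, EMPTY]
new = rehash(old, 29)
print([c for c in new if c is not EMPTY])   # [2, 5, 36, 9]
```

Each key is inserted once into the new table, which gives the Θ(n) total when each insert is Θ(1); as a side effect, 5 returns to its home cell and the d marker disappears.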
Hashing Properties
- Well-designed hash tables are effective in practice, with fast insert, member, remove operations
- Require a good hash function for the domain of application
- Operations O(1) on average, under assumptions that may not hold in practice:
  - all keys equally likely
  - hash function distributes keys uniformly
  - λ small
- Do not support operations based on order of keys, such as:
  - enumerate in order
  - min, max, range lookups
  - union or intersection
  (These are efficient with AVL Trees & B-Trees.)