16 - Hash Tables - PowerPoint PPT Presentation



slide-1
SLIDE 1

16 - Hash Tables

slide-2
SLIDE 2

Bit Vector Representation

  • Suppose we want to store a set S ⊆ [0, d], for some d ∈ ℕ.
  • A bit vector representation of S is a Boolean array B of size d+1 s.t. B[i] = true iff i ∈ S, or S = { 0 ≤ i ≤ d : B[i] is true }.

    E.g.: d = 20, S = {3, 7, 9}: B[3] = B[7] = B[9] = true, and every other B[i] is false.

  • Operations member(x), insert(x), remove(x) are all O(1).
  • Only practical where d is small.
  • Space inefficient if |S| ≪ d.
  • Copy, Union, Intersection are all Θ(d).
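
The bit vector idea can be sketched in a few lines of Python. This is only an illustration: the class name `BitVectorSet` and its method names are assumptions chosen to mirror the operations listed on the slide, and `union` assumes both sets share the same d.

```python
class BitVectorSet:
    """Set of integers in [0, d], stored as a Boolean array of size d + 1."""

    def __init__(self, d):
        self.bits = [False] * (d + 1)

    def insert(self, x):
        self.bits[x] = True          # O(1)

    def remove(self, x):
        self.bits[x] = False         # O(1)

    def member(self, x):
        return self.bits[x]          # O(1)

    def union(self, other):
        # Theta(d): every position is visited once.
        result = BitVectorSet(len(self.bits) - 1)
        result.bits = [a or b for a, b in zip(self.bits, other.bits)]
        return result


s = BitVectorSet(20)
for x in (3, 7, 9):
    s.insert(x)
print([i for i, bit in enumerate(s.bits) if bit])  # [3, 7, 9]
```

Note that the array always has d + 1 cells, however few elements S holds, which is the space inefficiency mentioned above.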

slide-3
SLIDE 3

Hash Functions

  • A hash function for a set D is a function h : D → M where |M| ≤ |D|, i.e. a map to a smaller set.

    E.g.: h : [0, MAXINT] → [0, 12], h(x) = x mod 13
    ( |M| = 13, |D| = 2,147,483,647 )

  • There will be values x, y ∈ D s.t. x ≠ y but h(x) = h(y).
  • Notation: Define h(S) = { y : y = h(x) and x ∈ S }.

    E.g.: h(3) = 3; h(7) = 7; h(13) = 0; h(15) = 2; h(20) = 7
    h({3, 7, 13, 15, 20}) = {0, 2, 3, 7}

  • If h(x) = h(y) for x, y ∈ S with x ≠ y, we call it a collision (e.g. 7, 20).
  • We will want hash functions h s.t.:
      • ran h = [0, m-1] for m ∈ ℕ (array indices)
      • h tends to distribute S uniformly over [0, m-1]
      • m = |M| will be prime
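
The example values can be checked with a short Python sketch. The helper names `image` and `collisions` are assumptions made for this illustration; only h(x) = x mod 13 and the set S come from the slide.

```python
M = 13  # size of the target set in the slide's example


def h(x):
    return x % M


def image(S):
    """h(S): the set of hash values of the elements of S."""
    return {h(x) for x in S}


def collisions(S):
    """Group elements of S by hash value; keep groups with two or more elements."""
    groups = {}
    for x in S:
        groups.setdefault(h(x), []).append(x)
    return {i: sorted(xs) for i, xs in groups.items() if len(xs) > 1}


S = {3, 7, 13, 15, 20}
print(sorted(image(S)))  # [0, 2, 3, 7]
print(collisions(S))     # {7: [7, 20]}
```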

slide-4
SLIDE 4

Hash Function + Bit Vector

  • Let h : D → [0, m-1], B a Boolean array of size m.
  • For a set S ⊆ D, set B[i] = true iff there is x ∈ S s.t. h(x) = i, or { i : B[i] } = h(S).

    E.g.: S = {3, 7, 13, 15, 20}, h(x) = x mod 13; m = 13
    h(S) = {0, 2, 3, 7}
    B[0] = B[2] = B[3] = B[7] = true, and every other B[i] is false.
    { x : B[h(x)] } = {0, 2, 3, 7, 13, 15, 16, 20, 26, 28, ...}

  • B[h(x)] = 1 "suggests x ∈ S"
  • B[h(x)] = 0 implies x ∉ S.

    i.e. there may be false positives but never false negatives.
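
A small Python sketch of this hashed bit vector, using the slide's S and h. The function name `maybe_member` is an assumption; it is named that way to stress that a true answer is only a suggestion.

```python
m = 13


def h(x):
    return x % m


S = {3, 7, 13, 15, 20}
B = [False] * m
for x in S:
    B[h(x)] = True


def maybe_member(x):
    """True means 'possibly in S'; False means 'definitely not in S'."""
    return B[h(x)]


print(maybe_member(15))  # True  (15 is in S)
print(maybe_member(16))  # True  (false positive: h(16) = h(3) = 3)
print(maybe_member(4))   # False (4 is certainly not in S)
```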
slide-5
SLIDE 5

Bloom Filters

  • Let H = {h_1, h_2, ..., h_k} be a set of distinct hash functions for a set D, each with range [0, m-1].
  • For S ⊆ D, set B[i] = true if h(x) = i for some h ∈ H and some x ∈ S; B[i] = false otherwise.
  • To test for membership in S:
      • if B[h(x)] = true for all h ∈ H, return true
      • otherwise, return false.
  • We get a false positive only when h(x) is a collision for every h ∈ H.
  • B is a Bloom filter.
  • If m is large enough relative to |S| and the h_i are good quality, independent hash functions, then there will be few false positives.
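
A minimal Bloom filter sketch in Python follows. The slide does not say how the k hash functions are obtained; deriving them from SHA-256 with a per-function prefix is an assumption made here for convenience, as are the class name and the parameters m = 101 and k = 3.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m = m                    # number of bits
        self.k = k                    # number of hash functions
        self.bits = [False] * m

    def _indices(self, x):
        # One index per hash function j = 0..k-1 (assumed construction).
        for j in range(self.k):
            digest = hashlib.sha256(f"{j}:{x}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, x):
        for i in self._indices(x):
            self.bits[i] = True

    def member(self, x):
        # True may be a false positive; False is always correct.
        return all(self.bits[i] for i in self._indices(x))


bf = BloomFilter(m=101, k=3)
for x in (3, 7, 13, 15, 20):
    bf.insert(x)
print(all(bf.member(x) for x in (3, 7, 13, 15, 20)))  # True: no false negatives
```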
slide-6
SLIDE 6

Hash Tables

  • Let h : D → M be a hash function for D with M = [0, m-1].
  • Let A be an array of size |M| and type D ∪ {-}.

    A : M → D ∪ {-}

  • For a set S ⊆ D, we want A[h(x)] = x for each x ∈ S, and A[i] = - if h(x) ≠ i for every x ∈ S.

    E.g.: S = {2, 12, 17, 21}, h(x) = x mod 13
    h(S) = {2, 12, 4, 8}
    A[2] = 2, A[4] = 17, A[8] = 21, A[12] = 12, and every other A[i] = -

  • To check membership of x in S, return A[h(x)] = x.
  • A is a hash table.
  • But what if we have collisions?
  • Need collision handling. We will look at a few methods.
slide-7
SLIDE 7

Hashing with Separate Chaining

  • Let A be a size-m array of linked lists.
  • Set A[i] to be a list of the elements { x ∈ S : h(x) = i }.
  • To test for membership in S:
      • Return true iff x is in the list A[h(x)].

    [Figure: example table A of linked lists for a set S = {1, 5, 7, 13, 18, ...}; details illegible]

  • To insert/remove x: insert/remove x from A[h(x)].
  • If h distributes S almost uniformly over M, the lists will be small, and time will be essentially O(1).
  • In the worst case, some lists have length Θ(n), and performance degrades to that of linked lists: Θ(n).
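
Separate chaining might be sketched as follows. Two assumptions are made for brevity: Python lists stand in for the linked lists, and h(x) = x mod 13 is used as in the other examples. The class and method names are illustrative only.

```python
class ChainedHashTable:
    def __init__(self, m=13):
        self.m = m
        self.A = [[] for _ in range(m)]   # one chain per cell

    def _h(self, x):
        return x % self.m

    def member(self, x):
        return x in self.A[self._h(x)]

    def insert(self, x):
        chain = self.A[self._h(x)]
        if x not in chain:                # keep S a set: no duplicates
            chain.append(x)

    def remove(self, x):
        chain = self.A[self._h(x)]
        if x in chain:
            chain.remove(x)


t = ChainedHashTable()
for x in (1, 5, 7, 13, 18):
    t.insert(x)
print(t.A[5])        # [5, 18]  (5 and 18 collide under x mod 13)
print(t.member(18))  # True
```

Each operation only scans the one chain selected by h(x), so its cost is proportional to that chain's length.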
slide-8
SLIDE 8

Hashing with Probing (Open Addressing)

  • Let A be an array of size m and type D ∪ {-}, and h a hash function h : D → [0, m-1].
  • Let f be a function f : ℕ → ℕ that has f(0) = 0 and is monotone increasing (i.e. x > y ⇒ f(x) > f(y)).
  • Define, for i ∈ ℕ,

    h_i(x) = (h(x) + f(i)) mod m

    Ex.: h(x) = x mod 13, f(i) = i
    h_0(3) = h(3) + 0 = 3
    h_1(3) = h(3) + 1 = 4
    h_2(3) = h(3) + 2 = 5

  • To resolve collisions, probe the sequence of cells:

    A[h_0(x)], A[h_1(x)], A[h_2(x)], ...
slide-9
SLIDE 9

Hashing with Probing (Open Addressing)

h_i(x) = (h(x) + f(i)) mod m

  • To check for membership of x:
      • Examine the sequence of locations A[h_0(x)], A[h_1(x)], A[h_2(x)], ...
      • Stop at the first location containing x or -.
      • Return true if x was found, false otherwise.
  • To insert x:
      • Examine the sequence of locations A[h_0(x)], A[h_1(x)], A[h_2(x)], ...
      • Stop at the first location containing - and store x there.
  • Choice of f(i) determines properties.
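
The two procedures can be sketched generically in Python, with h and f passed in as parameters. The function names, the use of `None` for "-", and the decision to give up after m probes are assumptions of this sketch rather than part of the slides.

```python
EMPTY = None  # plays the role of "-"


def probe_sequence(x, m, h, f):
    """Yield h_i(x) = (h(x) + f(i)) mod m for i = 0, 1, ..., m - 1."""
    for i in range(m):
        yield (h(x) + f(i)) % m


def member(A, x, h, f):
    for j in probe_sequence(x, len(A), h, f):
        if A[j] == x:
            return True
        if A[j] is EMPTY:
            return False
    return False  # gave up after m probes


def insert(A, x, h, f):
    for j in probe_sequence(x, len(A), h, f):
        if A[j] is EMPTY:
            A[j] = x
            return j
    raise RuntimeError("no empty cell found within m probes")


def h(x):
    return x % 13


def f(i):
    return i


A = [EMPTY] * 13
for x in (2, 9, 18, 36):
    insert(A, x, h, f)
print(list(probe_sequence(3, 13, h, f))[:3])  # [3, 4, 5]
print(member(A, 18, h, f))                    # True
```

Swapping in a different f changes the probing strategy without touching `member` or `insert`.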
slide-10
SLIDE 10

Hashing with Linear Probing

Let f(i) = i.

→ The sequence of locations to probe is:

    A[h(x)], A[h(x)+1], A[h(x)+2], A[h(x)+3], ...   (+ is mod m)

  • Suppose h(x) = x mod 13, S = {2, 9, 18, 36} (so h(S) = {2, 5, 9, 10}) and A is

    A = [-, -, 2, -, -, 18, -, -, -, 9, 36, -, -]   (cells 0 to 12)

  • To insert 5:
      • compute h(5) = 5;
      • see that A[5] ≠ -
      • see that A[6] = -, so set A[6] = 5
  • Now: A = [-, -, 2, -, -, 18, 5, -, -, 9, 36, -, -]
  • To check if 5 ∈ S:
      • compute h(5) = 5;
      • see that A[5] ≠ -, A[5] ≠ 5
      • see that A[6] = 5 and return true
  • To check if 31 ∈ S:
      • compute h(31) = 5;
      • see that A[5] ≠ 31, A[5] ≠ -
      • see that A[6] ≠ 31, A[6] ≠ -
      • see that A[7] = - and return false
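
The linear probing example can be replayed with a short Python sketch. The function names and the use of `None` for "-" are assumptions; the keys and h(x) = x mod 13 come from the slide.

```python
m = 13
EMPTY = None
A = [EMPTY] * m


def probes(x):
    """Linear probing: cells h(x), h(x)+1, h(x)+2, ... (mod m)."""
    return [(x % m + i) % m for i in range(m)]


def insert(x):
    for j in probes(x):
        if A[j] is EMPTY:
            A[j] = x
            return j
    raise RuntimeError("table is full")


def member(x):
    for j in probes(x):
        if A[j] == x:
            return True
        if A[j] is EMPTY:
            return False
    return False


for x in (2, 9, 18, 36):
    insert(x)
print(insert(5))   # 6   (cell 5 already holds 18)
print(member(5))   # True
print(member(31))  # False (probes cells 5, 6, then finds cell 7 empty)
```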

slide-11
SLIDE 11

Hashing with Quadratic Probing

Let f(i) = i².

→ The sequence of locations to probe is:

    A[h(x)], A[h(x)+1], A[h(x)+4], A[h(x)+9], ...   (+ is mod m)

Ex.: Suppose h(x) = x mod 13, S = {2, 9, 18, 36} (so h(S) = {2, 5, 9, 10}) and A is

    A = [-, -, 2, -, -, 18, -, -, -, 9, 36, -, -]   (cells 0 to 12)

  • To insert 35:
      • compute h(35) = 9
      • see that A[9] ≠ -
      • see that A[10] ≠ -
      • see that A[0] = - and store 35 there.
  • Now: A is [35, -, 2, -, -, 18, -, -, -, 9, 36, -, -]
  • To check if 35 ∈ S:
      • compute h(35) = 9
      • see that A[9] ≠ -, A[9] ≠ 35
      • see that A[10] ≠ -, A[10] ≠ 35
      • see that A[0] = 35 and return true
  • To check if 22 ∈ S:
      • compute h(22) = 9
      • see that A[9], A[10], A[0], A[5] are not 22 or -
      • see that A[12] = - and return false
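
The probe cells in this example can be recomputed with a few lines of Python. The helper name `quadratic_probes` and the use of `None` for "-" are assumptions of this sketch.

```python
m = 13


def quadratic_probes(x, count):
    """First `count` cells probed for x: (h(x) + i*i) mod m, with h(x) = x mod m."""
    return [(x % m + i * i) % m for i in range(count)]


A = [None] * m
for key in (2, 9, 18, 36):      # these keys do not collide under x mod 13
    A[key % m] = key

# Insert 35 into the first empty cell along its probe sequence.
cell = next(j for j in quadratic_probes(35, m) if A[j] is None)
A[cell] = 35

print(quadratic_probes(35, 3), cell)  # [9, 10, 0] 0
print(quadratic_probes(22, 5))        # [9, 10, 0, 5, 12]
print(A[12] is None)                  # True, so the search for 22 returns false
```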

slide-12
SLIDE 12

Double Hashing

Let f(i) = i · hash2(x), where hash2(x) is a hash function for D that is different from h, and with ran(hash2) ⊆ [1, m].

  • The sequence of locations to probe is:   (+ is mod m)

    A[h(x)], A[h(x) + hash2(x)], A[h(x) + 2·hash2(x)], ...

Ex.: Suppose h(x) = x mod 13, S = {2, 9, 18, 36} (so h(S) = {2, 5, 9, 10}) and A is

    A = [-, -, 2, -, -, 18, -, -, -, 9, 36, -, -]   (cells 0 to 12)

  • To insert 15:
      • compute h(15) = 2
      • see that A[2] ≠ -
      • compute hash2(15) = 6
      • see that A[8] = -, and store 15 there
  • Now: A is [-, -, 2, -, -, 18, -, -, 15, 9, 36, -, -]
  • To check if 15 ∈ S, check A[2], then A[8], and return true.
  • To check if 10 ∈ S:
      • compute h(10) = 10
      • see that A[10] ≠ 10, A[10] ≠ -
      • compute hash2(10) = 4
      • see that A[1] = -, and return false
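
A Python sketch of the double hashing example follows. The slides do not define hash2; the choice hash2(x) = 7 - (x mod 7) is an assumption that happens to reproduce the values hash2(15) = 6 and hash2(10) = 4 used above. The function names and `None` for "-" are also assumptions.

```python
m = 13
EMPTY = None
A = [EMPTY] * m
for key in (2, 9, 18, 36):      # these keys do not collide under x mod 13
    A[key % m] = key


def h(x):
    return x % m


def hash2(x):
    # Assumed second hash function; it never returns 0.
    return 7 - (x % 7)


def probes(x):
    """Cells h(x), h(x) + hash2(x), h(x) + 2*hash2(x), ... (mod m)."""
    step = hash2(x)
    return [(h(x) + i * step) % m for i in range(m)]


def insert(x):
    for j in probes(x):
        if A[j] is EMPTY:
            A[j] = x
            return j
    raise RuntimeError("no empty cell found")


def member(x):
    for j in probes(x):
        if A[j] == x:
            return True
        if A[j] is EMPTY:
            return False
    return False


print(insert(15))  # 8
print(member(15))  # True
print(member(10))  # False (cell 10 holds 36, then cell 1 is empty)
```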
slide-13
SLIDE 13

Removal with Open Addressing

  • Suppose we have a hash table H for a set S containing x, and want to remove x.
  • If H uses separate chaining, we just delete x.
  • If H uses open addressing, we cannot, because x affects the probe sequence for other elements.
  • Suppose h(x) = x mod 13, S = {2, 5, 9, 18, 36} and A was obtained as in our Linear Probing example:

    A = [-, -, 2, -, -, 18, 5, -, -, 9, 36, -, -]   (cells 0 to 12)

  • Suppose we now delete 18, so

    A = [-, -, 2, -, -, -, 5, -, -, 9, 36, -, -]

  • Now, searching for 5 fails, because A[h(5)] = -.
  • One solution is to mark cells where we have deleted elements.
slide-14
SLIDE 14

Removal with Open Addressing II

  • In the previous example, to remove 18 we replace it with d:

    A = [-, -, 2, -, -, d, 5, -, -, 9, 36, -, -]   (cells 0 to 12)

  • Now, search & insert procedures perform as if A[5] has some key that we will never use.
  • To remove x:
      • examine the sequence of locations A[h_0(x)], A[h_1(x)], A[h_2(x)], ...
      • when x is found, replace it with d.
  • Notice that search & insert work correctly as they are.
  • Insert can be modified to reclaim space:
    To insert x:
      • Examine the sequence of probe locations
      • stop at the first one containing - or d and store x there.
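
Removal with a "deleted" marker might look like the following Python sketch, which uses linear probing and two sentinel objects for "-" and d. The names `EMPTY`, `DELETED`, and the function names are assumptions; the reclaiming insert does not check whether x is already stored further along the probe sequence.

```python
m = 13
EMPTY, DELETED = object(), object()   # stand-ins for "-" and "d"
A = [EMPTY] * m


def probes(x):
    return [(x % m + i) % m for i in range(m)]  # linear probing


def member(x):
    for j in probes(x):
        if A[j] is EMPTY:
            return False
        if A[j] is not DELETED and A[j] == x:
            return True
    return False


def insert(x):
    # Reclaims space: stop at the first cell holding EMPTY or DELETED.
    for j in probes(x):
        if A[j] is EMPTY or A[j] is DELETED:
            A[j] = x
            return j
    raise RuntimeError("no free cell found")


def remove(x):
    for j in probes(x):
        if A[j] is EMPTY:
            return False
        if A[j] is not DELETED and A[j] == x:
            A[j] = DELETED            # mark the cell instead of emptying it
            return True
    return False


for x in (2, 9, 18, 36, 5):
    insert(x)
remove(18)
print(member(5))   # True: the search passes over the deleted cell A[5]
print(insert(31))  # 5: the deleted cell is reused
```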
  • NB: In implementation, d and - could be special values, or A could be an array of objects or structs with "empty" and "deleted" variables/fields.
slide-15
SLIDE 15

Load Factor

  • The load factor of a hash table H is

    λ = ((# of keys) + (# of elements marked d)) / m

    (If H uses separate chaining, there are no d's.)

  • Good performance requires λ not too large.
  • For separate chaining: λ should not be much larger than 1, so average list length is about 1.
  • For open addressing, want λ < 0.5, so that it is not too hard to find a place to make an insertion.
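
As a worked example of the formula, consider an open-addressing table with m = 13 that holds 4 keys and 1 cell marked d. The function name `load_factor` and its parameter names are assumptions made for this sketch.

```python
def load_factor(num_keys, num_deleted, m):
    """(# of keys + # of cells marked d) / m."""
    return (num_keys + num_deleted) / m


lam = load_factor(num_keys=4, num_deleted=1, m=13)
print(round(lam, 3))  # 0.385
print(lam < 0.5)      # True: still within the open-addressing guideline
```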
slide-16
SLIDE 16

Some Properties with Open Addressing

  • Linear Probing
      • Insertion always succeeds if λ < 1
      • Primary clustering is a serious problem.
  • Quadratic Probing
      • Avoids primary clustering
      • Exhibits secondary clustering, but less problematic
      • Insertion always succeeds if λ < 0.5, but may fail if λ > 0.5 (even if there is space)
  • Double Hashing
      • Requires design of a second suitable hash function
      • Requires computing 2 hash functions whenever probing beyond A[h_0(x)] is needed.
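
The quadratic probing failure case can be illustrated in Python. The table size m = 13 and the key 35 are reused from the earlier examples; filling exactly the reachable cells with a placeholder string is a contrived setup assumed only for this demonstration.

```python
m = 13
EMPTY = None


def quadratic_probes(x):
    """Cells (x mod m + i*i) mod m for i = 0..m-1; larger i only repeats these."""
    return [(x % m + i * i) % m for i in range(m)]


reachable = sorted(set(quadratic_probes(35)))  # h(35) = 9
print(reachable)       # [0, 5, 6, 8, 9, 10, 12]
print(len(reachable))  # 7

# Occupy exactly those 7 cells, so the load factor is 7/13, just above 0.5.
A = [EMPTY] * m
for j in reachable:
    A[j] = "occupied"

free_cells = A.count(EMPTY)
can_insert = any(A[j] is EMPTY for j in quadratic_probes(35))
print(free_cells, can_insert)  # 6 False
```

Six cells are free, yet no probe for 35 ever lands on one of them, because (i + m)² and i² leave the same remainder mod m.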
slide-17
SLIDE 17

Rehashing

  • Rehashing hash table H means constructing a completely new hash table for the contents of H.
  • We may want to do it if:
      • λ is too large (close to 0.5 for open addressing, much larger than 1 for separate chaining)
      • Performance has become poor (which may result from clustering, from long linked lists, or from many removals)
  • Takes time Θ(n) under the assumption that insert is Θ(1).
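
Rehashing an open-addressing table might be sketched as below. The slides do not say how the new size is chosen; roughly doubling and moving up to the next prime is an assumption of this sketch, as are the function names, the use of linear probing, and `None` for "-".

```python
EMPTY = None


def next_prime(n):
    """Smallest prime >= n (trial division; fine for small tables)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    while not is_prime(n):
        n += 1
    return n


def rehash(A):
    """Build a new linear-probing table holding the keys of A."""
    new_m = next_prime(2 * len(A))
    B = [EMPTY] * new_m
    for x in A:
        if x is EMPTY:
            continue
        j = x % new_m
        while B[j] is not EMPTY:      # one insert per key: Theta(n) overall
            j = (j + 1) % new_m       # if each insert is Theta(1)
        B[j] = x
    return B


A = [EMPTY, EMPTY, 2, EMPTY, EMPTY, 18, 5, EMPTY, EMPTY, 9, 36, EMPTY, EMPTY]
B = rehash(A)
print(len(B))                                   # 29
print(sorted(x for x in B if x is not EMPTY))   # [2, 5, 9, 18, 36]
```

If the old table used a "deleted" marker, those cells would simply be skipped like empty ones, which is how rehashing cleans up after many removals.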
slide-18
SLIDE 18

Hashing Properties

  • Well-designed hash tables are effective in practice, with fast insert, member, remove operations.
  • Require a good hash function for the domain of application.
  • Operations are O(1) on average, under assumptions that may not hold in practice:
      • all keys equally likely
      • hash function distributes keys uniformly
      • λ small
  • Do not support operations based on order of keys, such as:
      • enumerate in order
      • min, max, range lookups
      • union or intersection
    (These are efficient with AVL Trees & B-Trees.)