An Effective Hybrid Transactional Memory System - PowerPoint PPT Presentation


SLIDE 1

An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees

Chi Cao Minh, Martin Trautmann, JaeWoong Chung, Austen McDonald, Nathan Bronson, Jared Casper, Christos Kozyrakis, Kunle Olukotun

Computer Systems Laboratory, Stanford University
http://tcc.stanford.edu

SLIDE 2

Why Hybrid Transactional Memory?

  • Transactional Memory (TM) systems are promising
      • Large atomic blocks simplify parallel programming
      • Speed of fine-grain locks with simplicity of coarse-grain locks
  • TM can be implemented in either hardware or software
      • Hardware TM (HTM) is fast but inflexible & costly
      • Software TM (STM) is flexible but slow
  • Signature-Accelerated TM (SigTM) is a new hybrid TM
      • Uses hardware signatures to accelerate software transactions
          • Fast, flexible, & cost-effective
      • Implements strong isolation of transactional code
          • Correct & predictable execution of software transactions

SLIDE 3

Outline

  • Introduction
  • SigTM Performance
  • SigTM Strong Isolation
  • Related Work
  • Conclusion

SLIDE 4

What Can We Accelerate?

[Figure labels: High-level, Compiler, Low-level; code listings not recoverable]

  • What do these STM functions do?

SLIDE 5

STMstart

  • Constant overhead cost per transaction
  • Expensive only for short transactions
  • Called at transaction start → init transaction metadata

SLIDE 6

STMread

  • Building the read-set is expensive
  • Overhead cost per transaction varies
      • Locality of read accesses, size of read-set, transaction length
  • Called to read shared data → add to read-set

SLIDE 7

STMwrite

  • Overhead cost per transaction varies
      • Locality of write accesses, size of write-set, transaction length
  • Significantly less expensive than STMread (reads ≥ writes)
  • Called to write shared data → add to write-set

SLIDE 8

STMcommit

  • Expensive: scan read-set (1x); scan write-set (3x), locks
  • Called at transaction end → atomically commit changes
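
The four STM functions described on the last four slides can be sketched roughly as below. This is a minimal, single-threaded illustration under stated assumptions, not the STM measured in the talk: the names (`tm_word`, `tx_desc`, `stm_start`, `stm_read`, `stm_write`, `stm_commit`), the per-word version-plus-lock metadata, and the role assigned to each of the three write-set scans are all hypothetical, and a real STM would use atomic operations where this sketch uses plain loads and stores.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SET 64

/* Hypothetical per-word metadata: data, version number, lock bit. */
typedef struct { long value; unsigned version; bool locked; } tm_word;
typedef struct { tm_word *addr; long new_value; } write_entry;

typedef struct {
    unsigned start_version;              /* global clock at STMstart */
    tm_word *rset[MAX_SET];    size_t nreads;
    write_entry wset[MAX_SET]; size_t nwrites;
} tx_desc;

static unsigned global_clock = 0;

static write_entry *find_write(tx_desc *tx, tm_word *w) {
    for (size_t i = 0; i < tx->nwrites; i++)
        if (tx->wset[i].addr == w) return &tx->wset[i];
    return NULL;
}

/* STMstart: constant work, initialise transaction metadata. */
void stm_start(tx_desc *tx) {
    tx->start_version = global_clock;
    tx->nreads = 0;
    tx->nwrites = 0;
}

/* STMread: check the word, then add it to the read-set.
   Returns false when the transaction must abort. */
bool stm_read(tx_desc *tx, tm_word *w, long *out) {
    write_entry *e = find_write(tx, w);
    if (e) { *out = e->new_value; return true; }   /* read own write */
    if (w->locked || w->version > tx->start_version) return false;
    if (tx->nreads == MAX_SET) return false;
    tx->rset[tx->nreads++] = w;
    *out = w->value;
    return true;
}

/* STMwrite: buffer the new value in the write-set. */
bool stm_write(tx_desc *tx, tm_word *w, long v) {
    write_entry *e = find_write(tx, w);
    if (e) { e->new_value = v; return true; }
    if (tx->nwrites == MAX_SET) return false;
    tx->wset[tx->nwrites].addr = w;
    tx->wset[tx->nwrites].new_value = v;
    tx->nwrites++;
    return true;
}

/* STMcommit: three write-set scans and one read-set scan. */
bool stm_commit(tx_desc *tx) {
    size_t i, nlocked = 0;
    for (i = 0; i < tx->nwrites; i++) {        /* write-set scan 1: lock */
        if (tx->wset[i].addr->locked) break;
        tx->wset[i].addr->locked = true;
        nlocked++;
    }
    bool ok = (nlocked == tx->nwrites);
    for (i = 0; ok && i < tx->nreads; i++) {   /* read-set scan: validate */
        tm_word *r = tx->rset[i];
        if (r->version > tx->start_version) ok = false;
        if (r->locked && !find_write(tx, r)) ok = false;
    }
    if (ok) {
        global_clock++;
        for (i = 0; i < tx->nwrites; i++)      /* write-set scan 2: write back */
            tx->wset[i].addr->value = tx->wset[i].new_value;
    }
    for (i = 0; i < nlocked; i++) {            /* write-set scan 3: unlock */
        if (ok) tx->wset[i].addr->version = global_clock;
        tx->wset[i].addr->locked = false;
    }
    return ok;
}
```

If two transactions both read and then write the same word, the first commit succeeds and the second fails validation because the word's version is newer than its start version. The read-set scan and the lock handling in `stm_commit` are the costs the following slides aim to remove.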

SLIDE 9

How Slow Can STM Be?

  • 1.5x to 7x slowdown over sequential
  • Hybrid TM should focus on STMread and STMcommit

SLIDE 10

SigTM

  • SigTM simplifies STM by using simple hardware

[Diagram comparing STM and SigTM; row labels were lost in extraction]

  • SW → SW
  • SW (locks) → HW (write-set signature)
  • SW (version #) → HW (read-set signature)

SLIDE 11

SigTM Hardware

  • SigTM adds a little HW (signatures) to accelerate STM
      • Each HW thread has 2 HW signatures: read-set, write-set
      • No other HW modifications (e.g., no extra cache states)
  • SigTMread and SigTMwrite populate signatures

[Figure: read-set signature contents over time; details not recoverable]

SLIDE 12

SigTM Hardware (cont)

  • Signatures watch coherence messages
      • SW enables/disables
  • On hit in signature, either:
      • Trigger SW abort handler (conflict detection)
      • NACK remote request (isolation enforcement)
  • Signatures may generate false conflicts
      • Performance but not correctness issue
      • Reduce with longer signatures & better hash functions
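
A hardware signature of this kind behaves like a Bloom filter over addresses: inserted addresses always hit, and other addresses may hit by accident, which is where false conflicts come from. The sketch below is a software model under stated assumptions. The 64-byte line size, the two hash functions, the multiplicative hash, and all names (`signature`, `sig_insert`, `sig_member`, `sig_clear`) are hypothetical and are not taken from the slides.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIG_BITS   1024u   /* read-set size recommended later in the talk */
#define LINE_SHIFT 6       /* assumed 64-byte cache lines */
#define NUM_HASHES 2       /* assumed; the slides do not say */

typedef struct { uint64_t bits[SIG_BITS / 64]; } signature;

/* Hypothetical multiplicative hash; real hardware would use simpler logic. */
static uint32_t sig_hash(uint64_t line, uint64_t k) {
    uint64_t h = (line + k) * 0x9E3779B97F4A7C15ull;
    return (uint32_t)(h >> 40) % SIG_BITS;
}

void sig_clear(signature *s) { memset(s->bits, 0, sizeof s->bits); }

/* Record the cache line containing addr. */
void sig_insert(signature *s, uint64_t addr) {
    uint64_t line = addr >> LINE_SHIFT;
    for (uint64_t k = 1; k <= NUM_HASHES; k++) {
        uint32_t b = sig_hash(line, k);
        s->bits[b / 64] |= 1ull << (b % 64);
    }
}

/* True if addr may be in the set; false means it definitely is not. */
bool sig_member(const signature *s, uint64_t addr) {
    uint64_t line = addr >> LINE_SHIFT;
    for (uint64_t k = 1; k <= NUM_HASHES; k++) {
        uint32_t b = sig_hash(line, k);
        if (!((s->bits[b / 64] >> (b % 64)) & 1u)) return false;
    }
    return true;
}
```

Because a lookup can only err on the side of reporting a hit, a false conflict costs an unnecessary abort or NACK but never lets a real conflict go unnoticed, which matches the "performance but not correctness issue" bullet.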

[Figure: read-set signature lookup; details not recoverable]

SLIDE 13

SigTMstart

  • Read-set signature starts monitoring coherence messages
      • If hit, signature invokes the SW abort handler
      • Continuous validation of read-set

SLIDE 14

SigTMread

  • SigTMread does not need to:
      • Validate read address → continuous validation by HW signature
      • Build software read-set → just add to read-set signature

SLIDE 15

SigTMwrite

  • SigTMwrite populates the write-set signature
      • Used during SigTMcommit
  • Write-set versioning still in SW

SLIDE 16

SigTMcommit

  • Read-set signature eliminates scan of read-set to validate
  • Write-set signature eliminates locks
  • Two write-set scans instead of three
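
The SigTM start, read, write, and commit steps from the last four slides might look roughly like the software emulation below. Everything here is an assumption made for illustration: the 64-bit toy signature, the `aborted` flag that stands in for the hardware invoking the abort handler, the `fetch_exclusive` stub that stands in for acquiring exclusive ownership of a line, and all function and type names. The original code listings on these slides did not survive extraction, so this is not a reconstruction of them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WRITES 32

/* Toy 64-bit signature with one hash; stands in for the HW signatures. */
typedef struct { uint64_t bits; } sig64;
static unsigned sig_bit(const long *a) { return (unsigned)(((uintptr_t)a >> 3) % 64); }
static void sig_insert(sig64 *s, const long *a) { s->bits |= 1ull << sig_bit(a); }
static bool sig_member(const sig64 *s, const long *a) { return (s->bits >> sig_bit(a)) & 1u; }

typedef struct {
    sig64 rsig, wsig;
    bool rsig_watch;    /* read-set signature is monitoring coherence traffic */
    bool aborted;       /* would be raised by hardware on a read-set hit */
    struct { long *addr; long value; } wset[MAX_WRITES];  /* SW write buffer */
    size_t nwrites;
} sigtm_tx;

/* Stand-in for acquiring exclusive ownership of a line; here a plain load. */
static void fetch_exclusive(long *addr) { (void)*(volatile long *)addr; }

/* SigTMstart: clear signatures and start monitoring with the read-set signature. */
void sigtm_start(sigtm_tx *tx) {
    tx->rsig.bits = 0;
    tx->wsig.bits = 0;
    tx->nwrites = 0;
    tx->aborted = false;
    tx->rsig_watch = true;
}

/* SigTMread: no validation and no software read-set, only a signature insert. */
long sigtm_read(sigtm_tx *tx, long *addr) {
    if (sig_member(&tx->wsig, addr))             /* possibly written earlier */
        for (size_t i = tx->nwrites; i-- > 0; )
            if (tx->wset[i].addr == addr) return tx->wset[i].value;
    sig_insert(&tx->rsig, addr);
    return *addr;
}

/* SigTMwrite: insert into the write-set signature, buffer the value in SW. */
bool sigtm_write(sigtm_tx *tx, long *addr, long v) {
    if (tx->nwrites == MAX_WRITES) return false;
    sig_insert(&tx->wsig, addr);
    tx->wset[tx->nwrites].addr = addr;
    tx->wset[tx->nwrites].value = v;
    tx->nwrites++;
    return true;
}

/* SigTMcommit: two write-set scans, no read-set scan, no locks. */
bool sigtm_commit(sigtm_tx *tx) {
    size_t i;
    for (i = 0; i < tx->nwrites; i++)            /* scan 1: acquire lines */
        fetch_exclusive(tx->wset[i].addr);
    bool ok = !tx->aborted;
    tx->rsig_watch = false;
    for (i = 0; ok && i < tx->nwrites; i++)      /* scan 2: write back */
        *tx->wset[i].addr = tx->wset[i].value;
    tx->rsig.bits = 0;
    tx->wsig.bits = 0;
    return ok;
}
```

Compared with a software-only commit, there is no loop over the read-set and no per-word lock to acquire or release. In this emulation the conflict check is reduced to polling `aborted`, whereas the slides describe the signature invoking an abort handler as soon as a hit occurs.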


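SLIDE 17

How Much Smaller is the Overhead?

  • Measured dynamic instruction counts
      • R = # words in read-set; W = # words in write-set
  • Measured single-thread performance relative to sequential

[Table residue: the expressions 41+12W and 44+16R+31W and the values 8, 19, 2.93x, 1.25x, 0.41, 0.14, 0.81, 0.65 survive; row and column labels were lost in extraction]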

SLIDE 18

Experimental Setup

  • Execution-driven simulation to compare: SigTM, STM, HTM
  • STAMP: Stanford Transactional Apps for Multiprocessing
      • 4 benchmarks for TM research written in C
          • delaunay: Delaunay mesh generation
          • genome: gene sequencing
          • kmeans: K-means clustering
          • vacation: travel reservation system (similar to SPECjbb2000)
      • Parallelized from sequential code
          • Coarse-grain transactions (intuitive parallel programming)
          • Over 95% of time is spent in transactions
      • STM code is manually optimized (same code for SigTM)
      • HTM code has no instrumentation on reads/writes

SLIDE 19

How Fast is SigTM?

  • SigTM faster than STM but slower than HTM
  • Genome: SigTM 30% faster than STM; within 10% of HTM
  • Vacation: SigTM 2.8x faster than STM; 2x slower than HTM
      • Many non-redundant read barriers → large performance difference

SLIDE 20

How Much Hardware Does it Cost?

  • Experiment: decreased signature size, which increases false conflicts
  • Performance sensitive to read-set signature length
      • 1024 bits is recommended
  • Performance insensitive to write-set signature length
      • 128 bits is recommended
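
To see why shrinking a signature raises the false-conflict rate, one can count how many never-inserted addresses still hit in signatures of different sizes. The sketch below is only an illustration: the pseudo-random address stream, the single bit-select hash, and the function name `count_false_hits` are assumptions, and the numbers it produces say nothing about the simulated results in the talk.

```c
#include <stdint.h>

#define MAX_SIG_BITS 4096u

/* Deterministic pseudo-random line addresses (simple LCG; upper bits used). */
static uint32_t next_line(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return (*state >> 8) & 0xFFFFFu;
}

/* Inserts n_inserted line addresses into a sig_bits-wide, single-hash
   signature, then probes n_probes addresses that were never inserted and
   returns how many of them hit anyway (false conflicts).
   sig_bits must be a power of two no larger than MAX_SIG_BITS. */
unsigned count_false_hits(unsigned sig_bits, unsigned n_inserted, unsigned n_probes) {
    uint64_t bits[MAX_SIG_BITS / 64] = {0};
    uint32_t state = 1;
    unsigned hits = 0;
    for (unsigned i = 0; i < n_inserted; i++) {
        uint32_t b = next_line(&state) % sig_bits;   /* bit-select hash */
        bits[b / 64] |= 1ull << (b % 64);
    }
    for (unsigned i = 0; i < n_probes; i++) {
        /* Setting bit 20 guarantees the probe differs from every inserted line. */
        uint32_t line = next_line(&state) | 0x100000u;
        uint32_t b = line % sig_bits;
        hits += (unsigned)((bits[b / 64] >> (b % 64)) & 1u);
    }
    return hits;
}
```

Calling `count_false_hits(128, 32, 1000)` and `count_false_hits(1024, 32, 1000)` uses the same address stream for both sizes. With a bit-select hash and power-of-two sizes, every false hit in the larger signature is also a false hit in the smaller one, so the larger signature never reports more false conflicts on the same input.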

SLIDE 21

Outline

  • Introduction
  • SigTM Performance
  • SigTM Strong Isolation
  • Related Work
  • Conclusion

SLIDE 22

Example Program: Privatization

  • Two acceptable outcomes:
      • T1 commits first; T1 privatizes & uses the non-incremented value
      • T2 commits first; T1 privatizes & uses the incremented value
  • Works correctly with lock-based synchronization
      • Race-free program

[Code listing for Thread 1 and Thread 2; not recoverable from the source]
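
The privatization idiom and its two acceptable outcomes can be sketched as follows. This is a hypothetical stand-in for the lost listing, not a reconstruction of it: the `node` and `list` types, the field `x`, and the function names are assumptions. Each atomic block is modeled as an ordinary function, and `run` simply executes the two blocks in one of the two serial orders, so the sketch shows the expected outcomes rather than the STM anomaly discussed on the next slide.

```c
#include <stddef.h>

typedef struct node { int x; } node;
typedef struct { node *head; } list;

/* Thread 1's atomic block: detach the node so that it becomes private. */
node *t1_privatize(list *l) {
    node *p = l->head;
    l->head = NULL;
    return p;
}

/* Thread 2's atomic block: increment x only if the node is still shared. */
void t2_increment(list *l) {
    if (l->head != NULL)
        l->head->x++;
}

/* Runs both blocks in one serial order and returns the value Thread 1
   sees in its private node, or -1 if two consecutive reads disagree. */
int run(int t1_first) {
    node n = { 0 };
    list l = { &n };
    node *priv;
    if (t1_first) {
        priv = t1_privatize(&l);
        t2_increment(&l);          /* list is empty: no increment */
    } else {
        t2_increment(&l);
        priv = t1_privatize(&l);
    }
    int r1 = priv->x;              /* non-transactional reads of private data */
    int r2 = priv->x;
    return (r1 == r2) ? r1 : -1;
}
```

`run(1)` corresponds to T1 committing first and seeing the non-incremented value, and `run(0)` to T2 committing first and T1 seeing the incremented value. In both orders the two private reads agree, which is the property that can be violated when commits are not isolated from non-transactional accesses.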

SLIDE 23

Unpredictable Results with STM?

[Same Thread 1 / Thread 2 code listing as on the previous slide; not recoverable from the source]

  • All STMs may lead to unexpected results with this code
      • T1 may use both old & new value after privatization
  • Cause: non-transactional accesses are not instrumented
      • Non-Tx writes do not cause Tx to abort
      • Tx commit not isolated with respect to non-Tx accesses

SLIDE 24

Strong Isolation

  • Definition: transactions are isolated from non-Tx accesses
  • HTM → inherent strong isolation
      • Non-Tx accesses cause coherence messages
      • Conflict detection mechanism enforces strong isolation
  • STM → supplemented strong isolation
      • Additional barriers needed in non-Tx accesses
      • Some can be optimized but still a source of overhead
  • SigTM → inherent strong isolation
      • Without additional instrumentation or overhead

SLIDE 25

How SigTM Provides Strong Isolation

  • Non-Tx write to read-set?
      • Hits in read-set signature → transaction aborts

[Code listing not recoverable from the source]
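
The mechanism can be modeled in a few lines: any remote write, transactional or not, produces a coherence message, and the read-set signature is checked against it. The sketch below is a toy model under stated assumptions. The 64-bit signature, the 64-byte line size, the `aborted` flag that stands in for invoking the software abort handler, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t rsig;     /* toy 64-bit read-set signature */
    bool watching;     /* signature lookup enabled by software */
    bool aborted;      /* stands in for invoking the SW abort handler */
} hw_thread;

/* Assumed 64-byte lines and a single bit-select hash. */
static uint64_t line_bit(uint64_t addr) { return 1ull << ((addr >> 6) % 64); }

void tx_begin(hw_thread *t) { t->rsig = 0; t->watching = true; t->aborted = false; }
void tx_read(hw_thread *t, uint64_t addr) { t->rsig |= line_bit(addr); }
void tx_end(hw_thread *t) { t->watching = false; }

/* Coherence invalidation caused by any remote write, Tx or non-Tx. */
void snoop_remote_write(hw_thread *t, uint64_t addr) {
    if (t->watching && (t->rsig & line_bit(addr)))
        t->aborted = true;
}
```

After `tx_read(&t, 0x1000)`, a remote write to `0x1008` falls in the same line and sets `aborted`, while a write to `0x2040` maps to a different signature bit and is ignored. A write to `0x2000` maps to the same bit as `0x1000` in this toy hash, so it also aborts the transaction: a false conflict that costs performance but does not weaken isolation.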

SLIDE 26

Outline

  • Introduction
  • SigTM Performance
  • SigTM Strong Isolation
  • Related Work
  • Conclusion

SLIDE 27

SigTM and Other Hybrid TMs

  • Kumar (PPoPP’06) and HyTM (ASPLOS’06)
      • Require significant cache modifications for HTM
      • Need 2 versions of transaction code
  • HASTM (MICRO’06)
      • Requires cache modifications (expensive for nesting)
      • Cache updates from prefetching/speculation problematic
  • RTM (ISCA’07, later today)
      • Requires significant cache modifications (TMESI)
          • Cache handles common case conflict detection and buffering
      • Poor performance (slower than sequential…)

SLIDE 28

SigTM and Signature-based HTMs

  • Bulk (ISCA’06)
      • First use of signatures for TM
      • Requires additional HW for write versioning
  • LogTM-SE (HPCA’07)
      • Additional HW to implement undo log
      • Additional HW to remember recently logged lines
      • Recommended smaller signatures (32–64 bits)

SLIDE 29

Conclusions

  • SigTM is a hybrid TM that:
      • Uses minimal additional hardware
          • 1 Kbits for read-set signature; 128 bits for write-set signature
          • No modification to caches
      • Reduces the runtime overhead of SW transactions
          • Eliminates SW read-set, locks, and timestamps
          • Continuous validation of read-set by HW signatures
      • Leads to good performance
          • Outperforms STM by 30% to 280%
          • Slowdown compared to HTM is 10% to 100%
      • Delivers strong isolation for predictable behavior

SLIDE 30

Questions?

  • STAMP: a new benchmark suite designed for TM research
      • http://stamp.stanford.edu