Revisiting the Issue of Performance Enhancement of Discrete Event Simulation Software

Revisiting the Issue of Performance Enhancement of Discrete Event Simulation Software. Alex Bahouth, Steven Crites, Norman Matloff and Todd Williamson, Department of Computer Science, University of California at Davis, Davis, CA 95616 USA


slide-1
SLIDE 1

Revisiting the Issue of Performance Enhancement of Discrete Event Simulation Software (1)

Alex Bahouth, Steven Crites, Norman Matloff and Todd Williamson
Department of Computer Science
University of California at Davis
Davis, CA 95616 USA
matloff@cs.ucdavis.edu

(1) We wish to thank Victor Castillo and the Lawrence Livermore National Laboratory for supporting this research.

slide-2
SLIDE 2

This presentation is produced using C. Campani’s Beamer LaTeX class. See http://heather.cs.ucdavis.edu/~matloff/beamer.html for a quick tutorial.

Disclaimer: Our slides here won’t show off what Beamer can do. Sorry. :-)
slide-7
SLIDE 7

Issues Addressed in This Paper

Interpreted languages (Java, Python) now popular for DES.
Interpreted languages are slow.
DES literature mainly algorithm-centric.
What can be done specifically for interpreted languages?
What can be done for systems considerations, e.g. virtual memory (VM)?

slide-14
SLIDE 14

Case Study: SimPy

Our investigation took the form of a case study: enhancing the performance of the SimPy DES language.
About SimPy:
  Written by Klaus Muller and Tony Vignaux.
  I have developed an online DES course based on SimPy, available at heather.cs.ucdavis.edu/~matloff/simcourse.html.
SimPy uses Python:
  Lots of high-level Python constructs make programming much easier.
  Python generator construct used by SimPy to set up coroutines, i.e. non-preemptive threads.

slide-18
SLIDE 18

Sample SimPy Code

Machine repair, several machines.
Have class MachineClass, with member variables such as UpTime, etc.
Each class has a member function Run() which simulates one machine.

slide-21
SLIDE 21

Sample Run() Function

    def Run(self):
        while 1:
            self.StartUpTime = SimPy.Simulation.now()
            # hold for up time
            UpTime = G.Rnd.expovariate(MachineClass.UpRate)
            yield SimPy.Simulation.hold, self, UpTime
            # update up time total
            MachineClass.TotalUpTime += SimPy.Simulation.now() - self.StartUpTime
            RepairTime = G.Rnd.expovariate(MachineClass.RepairRate)
            # hold for repair time
            yield SimPy.Simulation.hold, self, RepairTime

The yield actually does yield the processor.
But yield is a coroutine release: next time this function runs, it resumes after the yield.
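To see the resume-after-yield behavior in isolation, here is a minimal sketch in plain Python, without SimPy. The names machine and trace are hypothetical, and the yielded tuples only imitate the shape of the hold requests above; this is not SimPy's scheduler.

```python
def machine(name, trace):
    """Generator standing in for a Run() method; each yield hands control back."""
    while True:
        trace.append(name + " up")
        yield ("hold", name, 1.0)   # pretend up time
        trace.append(name + " in repair")
        yield ("hold", name, 0.5)   # pretend repair time


trace = []
m = machine("M1", trace)
first = next(m)    # runs from the top until the first yield
second = next(m)   # resumes right after the first yield
third = next(m)    # loops around and stops at the first yield again
```

Each call to next() plays the role of the scheduler reactivating the process; local state survives between calls because the generator's frame is suspended rather than discarded.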

slide-25
SLIDE 25

SimPy Data Structures

Assume for simplicity no tied event times.
The Python list timestamps stores all event times in ascending order, e.g. so that the earliest scheduled event can be read from the head of the list.
A Python list is not a fixed C-style array! One may insert and delete elements, with the corresponding overhead of shifting data.
The actual events are in a Python dictionary (associative array) named events. Python dictionaries are implemented as hash tables, reasonably fast.

slide-29
SLIDE 29

SimPy Queue Operations

When a new event is created at time t, then these operations occur:
(i) add t to list timestamps
(ii) add event to dictionary events
Step (i) makes use of Python’s bisect() function, which finds the insertion point by bisection (binary search).
That search is O(log n) for an n-item event list, but because SimPy uses Python’s list structure, the insertion itself is O(n), due to right-shifting of the data.
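Steps (i) and (ii) can be sketched as follows. The list timestamps and the dictionary events are named as on the slide; the function schedule and the event payload strings are hypothetical, and this is an illustration rather than SimPy's actual source.

```python
import bisect

timestamps = []   # event times, kept in ascending order
events = {}       # event time -> event payload (assumes no tied times)


def schedule(t, event):
    """Steps (i) and (ii): record the time, then the event itself."""
    # bisect.insort finds the slot by binary search, O(log n) comparisons,
    # but the underlying list insert shifts every later element right, O(n).
    bisect.insort(timestamps, t)
    events[t] = event


for t, name in [(5.0, "repair done"), (1.5, "breakdown"), (3.2, "arrival")]:
    schedule(t, name)
```

After the loop, timestamps is sorted and timestamps[0] is the key of the earliest event in events.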

slide-32
SLIDE 32

SimPy Dequeue Operations

When the next event is executed, these operations occur:
(iii) remove head of list timestamps, time t
(iv) reactivate (invoke Python iterator for) Run() function for event of time t in dictionary events
Again, what would appear to be an O(1) operation is actually O(n), since removing the head of a Python list shifts the remaining elements left.
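A toy event loop makes steps (iii) and (iv) concrete. It is a sketch under stated assumptions: process, schedule, run and log are hypothetical names, each process simply asks to be held for a fixed delay, and event times never tie.

```python
import bisect

timestamps = []   # ascending event times
events = {}       # event time -> (name, generator); no tied times assumed
log = []          # (time, name) for every reactivation


def process(delay):
    """Hypothetical process: repeatedly asks to be held for `delay` time units."""
    while True:
        yield delay


def schedule(t, name, gen):
    bisect.insort(timestamps, t)
    events[t] = (name, gen)


def run(until):
    while timestamps and timestamps[0] <= until:
        t = timestamps.pop(0)            # step (iii): O(n), the rest shifts left
        name, gen = events.pop(t)
        log.append((t, name))
        delay = next(gen)                # step (iv): resume the generator
        schedule(t + delay, name, gen)


for name, delay in [("A", 2.0), ("B", 3.5)]:
    gen = process(delay)
    schedule(next(gen), name, gen)       # first hold starts at time 0

run(until=7.5)
```

The call timestamps.pop(0) is the costly line: the head is found in constant time, but every remaining entry is moved one slot to the left.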

slide-37
SLIDE 37

Summary of Sources of SimPy Slowness

Dictionary (smaller problem).
O(n) insert operation instead of O(log n) (big problem).
O(n) dequeue operation instead of O(1) (big problem).
Possible VM issues.

slide-42
SLIDE 42

Our Solutions

Remove dictionary entirely.
Rewrite core event-list operations in C for speed. SWIG forms the “glue.”
Rethink event-list algorithms.

slide-46
SLIDE 46

Removal of Events Dictionary

Incorporate into the timestamps list, so list elements are now of the form (time, event) instead of (time).
The bisect() operation still works!
Needed to overload Python’s < operator.
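One way the combined list might look is sketched below. The class Event and its fields are hypothetical; the point is only that once elements define <, the bisect functions can order them by time without a separate dictionary.

```python
import bisect


class Event:
    """Hypothetical (time, event) list element, ordered by time only."""

    def __init__(self, time, action):
        self.time = time
        self.action = action

    def __lt__(self, other):
        # Overloads <, which is the comparison bisect relies on.
        return self.time < other.time


event_list = []
for t, action in [(5.0, "repair done"), (1.5, "breakdown"), (3.2, "arrival")]:
    bisect.insort(event_list, Event(t, action))

head = event_list.pop(0)   # earliest event, no dictionary lookup needed
```

Comparing on time alone avoids ever comparing the event payloads, which a plain (time, event) tuple would do whenever two times were equal.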

slide-51
SLIDE 51

Rewriting Event List Ops in C for Speed

“Best of both worlds”: core runs in C, but apps programmer still writes in high-level Python.
Used SWIG Python/C “glue” tool. (Available for Java etc. too.)
SWIG very easy to learn, use.
We did have to be careful regarding reference counts.

slide-56
SLIDE 56

Rethinking Event List Algorithms

Lots of work in the past.
However, most algorithm-centric.
Typically “simulations of simulation,” not timing of actual programs.
No consideration of systems issues, e.g. VM.

slide-64
SLIDE 64

Empirical Evaluation

Tested many different modifications of SimPy:
  Original SimPy (SimPy)
  SimPy with dictionary removed, but still all-Python implementation (SimPyND)
  SimPy with original event structures retained (though no dictionary) but operations implemented in C (PQArr)
  SimPy modified to use C-language calendar queue (CQ)
  SimPy modified to use C-language splay tree (Splay)
Many others were tried but found to be noncompetitive.
Testbeds:
  Call center application. Indexed by arrival rates.
  Hold Model. Indexed by coeff. of var. of service times.
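For readers unfamiliar with the hold model, the sketch below shows its usual shape: the event list is kept at a constant length, and each operation removes the earliest event and inserts a new one at a later random time. The function name hold_model, the exponential increments, and the sorted-list implementation are assumptions for illustration; the slides do not give the distributions or code used in the experiments.

```python
import bisect
import random


def hold_model(n, ops, rng):
    """Run `ops` hold operations on an event list of constant length n."""
    event_list = sorted(rng.expovariate(1.0) for _ in range(n))
    for _ in range(ops):
        now = event_list.pop(0)                  # dequeue the earliest event
        new_time = now + rng.expovariate(1.0)    # assumed exponential increment
        bisect.insort(event_list, new_time)      # enqueue its successor
    return event_list


final = hold_model(n=100, ops=1000, rng=random.Random(0))
```

Wrapping the loop in time.perf_counter() calls and dividing by ops would give a time-per-operation figure for a given list length n; swapping the increment distribution changes the coefficient of variation.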

slide-70
SLIDE 70

Results

Summary, from fastest to slowest: CQ ≈ PQArr > SplayTree > SimPyND > SimPy

slide-71
SLIDE 71

Call Center Times Per Op, Lower Traffic

slide-72
SLIDE 72

Call Center Times Per Op, Higher Traffic

slide-73
SLIDE 73

Hold Model Times Per Op, Smaller COV

[Plot: time per operation (microseconds) vs. length of event list (100 to 900); curves: CQ, SimPy, Splay]

slide-74
SLIDE 74

Hold Model Times Per Op, Larger COV

[Plot: time per operation (microseconds) vs. length of event list (100 to 900); curves: CQ, SimPy, Splay]

slide-78
SLIDE 78

Scalability Issues

Even though CQ and PQArr were about equal in performance, PQArr appears not to scale well to larger event sets:

struct   user time   sys. time   event op. time
PQArr    79.47       4.50        57.87
CQ       33.24       3.95        12.69

slide-79
SLIDE 79

Number of Page Faults, Call Center (lower traffic)

[Plot: number of page faults vs. length of event list (150 to 1500); curves: CQ, SimPy, Splay, PQArr, SimPyND]

slide-80
SLIDE 80

Number of Page Faults, Hold Model (medium COV)

[Plot: number of page faults vs. length of event list (100 to 900); curves: CQ, SimPy, Splay]

slide-84
SLIDE 84

Discussion of VM Issues

CQ paging performance poor in our experiments, run on 32-bit PCs running Linux kernel 2.6.20.
Preliminary experiments on a 64-bit PC, same kernel, suggest greater variability.
∴ CQ may do poorly on some systems.

slide-89
SLIDE 89

Conclusions and Discussion

Hybrid interpreted/C approach “best of both worlds”: transparent to apps programmer but with better performance.
Attention to non-algorithmic issues, e.g. paging, may be worthwhile.
What about JIT? Tried Psyco but with disappointing results.