


Python and Data Structures (continued)

Tyler Moore

CSE 3353, SMU, Dallas, TX

February 5, 2013

These slides have been adapted from the slides written by Prof. Steven Skiena at SUNY Stony Brook, author of Algorithm Design Manual. For more information see http://www.cs.sunysb.edu/~skiena/

CSE Seminars

Highly recommended if you’re considering graduate school
Extra credit available (1 point on homework assignments for every CSE seminar you go to)
You may also refer to the Python code implementing selection sort at http://lyle.smu.edu/~tylerm/courses/cse8058/


Python Algorithm Development Process

1 Think hard about the problem you’re trying to solve. Specify the expected inputs for which you’d like to provide a solution, and the expected outputs.
2 Describe a method to solve the problem using English and/or pseudo-code.
3 Start coding:
    1 Development/Debugging phase
    2 Testing phase (for correctness)
    3 Evaluation phase (performance)

Let’s use selection sort as an example of the development process in Python.


Debugging in Python

1 Main strategy: run code in the interpreter to get instant feedback on errors
2 Backup: generous use of print statements
3 Once code is running in functions: pdb.pm() (Python debugger post-mortem)


Main strategy: run code in the interpreter

>>> s = [2,7,4,5,9]
>>>
>>> for i in range(s):
...     minidx = i
...     for j in range(i,len(s)):
...         if s[j]<s[minidx]:
...             minidx=i
...             s[i],s[minidx]=s[minidx],s[i]
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: range() integer end argument expected, got list.
>>> s
[2, 7, 4, 5, 9]
>>> range(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: range() integer end argument expected, got list.
>>> len(s)
5
>>> range(len(s))
[0, 1, 2, 3, 4]


Second strategy: print variables out during execution

>>> for i in range(len(s)):
...     minidx = i
...     for j in range(i,len(s)):
...         print 'list: %s, i: %i, j: %i, minidx: %i'%(s,i,j,minidx)
...         if s[j]<s[minidx]:
...             print "reassigning minidx %i < %i" %(s[j],s[minidx])
...             minidx=j
...             s[i],s[minidx]=s[minidx],s[i]
...
list: [2, 7, 4, 5, 9], i: 0, j: 0, minidx: 0
list: [2, 7, 4, 5, 9], i: 0, j: 1, minidx: 0
list: [2, 7, 4, 5, 9], i: 0, j: 2, minidx: 0
list: [2, 7, 4, 5, 9], i: 0, j: 3, minidx: 0
list: [2, 7, 4, 5, 9], i: 0, j: 4, minidx: 0
list: [2, 7, 4, 5, 9], i: 1, j: 1, minidx: 1
list: [2, 7, 4, 5, 9], i: 1, j: 2, minidx: 1
reassigning minidx 4 < 7
list: [2, 4, 7, 5, 9], i: 1, j: 3, minidx: 2
reassigning minidx 5 < 7
list: [2, 5, 7, 4, 9], i: 1, j: 4, minidx: 3
list: [2, 5, 7, 4, 9], i: 2, j: 2, minidx: 2
list: [2, 5, 7, 4, 9], i: 2, j: 3, minidx: 2
reassigning minidx 4 < 7
list: [2, 5, 4, 7, 9], i: 2, j: 4, minidx: 3
list: [2, 5, 4, 7, 9], i: 3, j: 3, minidx: 3
list: [2, 5, 4, 7, 9], i: 3, j: 4, minidx: 3
list: [2, 5, 4, 7, 9], i: 4, j: 4, minidx: 4


Second strategy: print variables out during execution

>>> for i in range(1,len(s)):
...     minidx = i
...     for j in range(i+1,len(s)):
...         print 'list: %s, i: %i, j: %i, minidx: %i'%(s,i,j,minidx)
...         if s[j]<s[minidx]:
...             print "reassigning minidx %i < %i" %(s[j],s[minidx])
...             minidx=j
...     s[i],s[minidx]=s[minidx],s[i]
...
list: [2, 7, 4, 5, 9], i: 1, j: 2, minidx: 1
reassigning minidx 4 < 7
list: [2, 7, 4, 5, 9], i: 1, j: 3, minidx: 2
list: [2, 7, 4, 5, 9], i: 1, j: 4, minidx: 2
list: [2, 4, 7, 5, 9], i: 2, j: 3, minidx: 2
reassigning minidx 5 < 7
list: [2, 4, 7, 5, 9], i: 2, j: 4, minidx: 3
list: [2, 4, 5, 7, 9], i: 3, j: 4, minidx: 3
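Once the loop behaves in the interpreter, it can be wrapped in a function. The following is a minimal Python 3 sketch, not the course's reference implementation. The name selection_sort matches the function that later slides call, but returning a sorted copy (rather than sorting in place) is an assumption made so the result can be compared with ==. The outer loop here starts at index 0 so that the first position is also filled with the smallest element.

```python
def selection_sort(items):
    """Return a new list with the elements of items in ascending order."""
    s = list(items)  # work on a copy; copying is an assumption of this sketch
    for i in range(len(s)):
        minidx = i
        # find the index of the smallest element in s[i:]
        for j in range(i + 1, len(s)):
            if s[j] < s[minidx]:
                minidx = j
        # swap once per outer iteration, after the inner scan has finished
        s[i], s[minidx] = s[minidx], s[i]
    return s


print(selection_sort([2, 7, 4, 5, 9]))  # prints [2, 4, 5, 7, 9]
```

The swap sits at the level of the outer loop, which is the indentation change that separates the two print-debugging sessions.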


Third strategy: use Python debugger

Once you’ve gotten rid of the obvious bugs, move the code to a function.
But what happens if you start getting run-time errors on different inputs?
You can copy code directly into the interpreter
Or you can run pdb.pm() to access variables in the environment at the time of the error
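In an interactive session, typing import pdb; pdb.pm() right after a traceback opens the debugger in the frame where the error occurred. The sketch below approximates the same idea in a plain Python 3 script by reading the local variables of the failing frame from the traceback. The function average and its input are made up for illustration.

```python
import sys


def average(values):
    total = sum(values)
    return total / len(values)  # raises ZeroDivisionError for an empty list


try:
    average([])
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    # walk to the innermost frame, which is where the error was raised
    while tb.tb_next is not None:
        tb = tb.tb_next
    # these are the variables a post-mortem debugger would let you inspect;
    # pdb.post_mortem(tb) would open the debugger on this same traceback
    frame_locals = dict(tb.tb_frame.f_locals)
    print(frame_locals)
```

The printed dictionary holds values and total as they were at the moment of failure, which is usually enough to see which input triggered the error.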


After debugging comes testing

While you might view them as synonyms, testing is more systematic: it checks that algorithms work for a range of inputs, not just the ones that cause obvious bugs.
Use Python’s assert statement to verify expected behavior.


assert in action

>>> s
[2, 5, 4, 7, 9]
>>> t = list(s)
>>> t.sort()
>>>
>>> assert t == s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>> t
[2, 4, 5, 7, 9]
>>> s
[2, 5, 4, 7, 9]


Using random to generate inputs

>>> import random, timeit
>>> l10=range(10)
>>> l10
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> random.shuffle(l10)
>>> l10
[4, 2, 0, 3, 8, 1, 9, 7, 6, 5]
>>> unsortl10 = list(l10)
>>> unsortl10
[4, 2, 0, 3, 8, 1, 9, 7, 6, 5]
>>> l10.sort()
>>> l10
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> unsortl10
[4, 2, 0, 3, 8, 1, 9, 7, 6, 5]
>>> assert selection_sort(unsortl10) == l10


Using assert on many inputs

#try 10 different shufflings of each list
for i in range(10):
    print 'trying %i time'%(i)
    #try all lists with 0 to 499 elements
    for j in range(500):
        l = range(j)
        random.shuffle(l)               #reorder the list
        ul = list(l)                    #make a copy of the unordered list
        l.sort()                        #do a known correct sort
        assert selection_sort(ul) == l  #compare sorts


Don’t forget to look for counterexamples

Using assert works when you have a known correct solution to compare against.
This frequently occurs when you have a known working algorithm, but you are developing a more efficient one.
While testing lots of random inputs is a good strategy, don’t forget to examine edge cases and potential counterexamples too.
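A small set of hand-picked edge cases can expose a bug that a single friendly input hides. The Python 3 sketch below is illustrative only: selection_sort_from_one is a hypothetical name for a selection sort whose outer loop starts at index 1, as in the second print-debugging session, and find_counterexample is a made-up helper that compares a sort function against the built-in sorted().

```python
def selection_sort_from_one(items):
    """Selection sort variant whose outer loop starts at index 1 (hypothetical)."""
    s = list(items)
    for i in range(1, len(s)):
        minidx = i
        for j in range(i + 1, len(s)):
            if s[j] < s[minidx]:
                minidx = j
        s[i], s[minidx] = s[minidx], s[i]
    return s


def find_counterexample(sort_fn, cases):
    """Return the first input that sort_fn sorts incorrectly, or None."""
    for case in cases:
        if sort_fn(case) != sorted(case):
            return case
    return None


# hand-picked edge cases: empty, single element, duplicates, sorted, reversed
edge_cases = [[], [1], [2, 2, 1], [1, 2, 3], [3, 2, 1]]

# this input already has its minimum in front, so it does not expose the problem
print(find_counterexample(selection_sort_from_one, [[2, 7, 4, 5, 9]]))  # prints None
print(find_counterexample(selection_sort_from_one, edge_cases))  # prints [2, 2, 1]
```

The variant never moves the smallest element into position 0, so any input whose minimum is not already first serves as a counterexample.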


Empirically evaluating performance

Once you are confident that your algorithm is correct, you can evaluate its performance empirically.
Python’s timeit module repeatedly runs code and reports the total execution time over all runs (divide by the number of runs to get an average).
timeit arguments:
    code to be executed in string form
    any setup code that needs to be run before executing the code (note: setup code is only run once)
    parameter ‘number’, which indicates the number of times to run the code (default is 1000000)


Timeit in action: timing Python’s sort function and our selection sort

#store function in file called sortfun.py
import random
def sortfun(size):
    l = range(size)
    random.shuffle(l)
    l.sort()

>>> timeit.timeit("sortfun(1000)","from sortfun import sortfun",number=100)
0.0516510009765625
>>> #here is the wrong way to test the built-in sort function:
... #setup runs only once, so l is already sorted after the first run
... timeit.timeit("l.sort()","import random; l = range(1000); random.shuffle(l)",number=100)
0.0010929107666015625
>>> #let's compare it to our selection sort
>>> timeit.timeit("selection_sort(l)","from selection_sort import selection_sort; import random; l = range(1000); random.shuffle(l)",number=100)
3.0629560947418213
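For reference, here is a hedged Python 3 sketch of one way to time a sort so that every run sees the same unsorted input. It relies on timeit.timeit accepting a callable as well as a string. The helper name sort_fresh_copy is made up for this example.

```python
import random
import timeit


def sort_fresh_copy(data):
    """Sort a copy so that every timed run starts from the same unsorted input."""
    return sorted(data)  # sorted() builds a new list; data itself is untouched


data = list(range(1000))
random.shuffle(data)

# timeit.timeit returns the TOTAL seconds taken by `number` runs
total = timeit.timeit(lambda: sort_fresh_copy(data), number=100)
print("total: %.6f s, per run: %.6f s" % (total, total / 100))
```

The timing includes the cost of copying the list, which is the price paid for not measuring an already-sorted input on runs 2 through 100.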


Homework 1

Due Feb 14 at 9:30am
You are encouraged to work in pairs
Please start on the Python coding early!


Homework 1: earliestJobFirst pseudo-code

EarliestJobFirst(I)
    Accept the earliest starting job j from I which does not overlap any previously accepted job, and repeat until no more such jobs remain


Homework 1: earliestJobFirst

def earliestJobFirst(movies):
    """
    Problem: Movie scheduling problem
    Input: dictionary mapping movie title to range of times
    Output: movie titles forming the maximal subset of non-overlapping titles
    Solution: use earliest job heuristic to (incorrectly) calculate job list
    """
    #start by using a list comprehension to sort movie titles by start date
    starttimes=[(movies[m][0],m) for m in movies]
    starttimes.sort() #sort() takes n lg n time
    titlesort=[m[1] for m in starttimes] #one more comprehension to get the titles
    joblist=[]
    #go through all jobs sorted by start date
    for jobcand in titlesort:
        overlap=False
        #check all jobs already accepted for overlap with the candidate job
        for job in joblist:
            if checkOverlap(movies[jobcand][0],movies[jobcand][1],movies[job][0],movies[job][1]):
                overlap=True
                break
        #if there's no overlap with existing jobs, add the job candidate
        if not overlap:
            joblist.append(jobcand)
    return joblist
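To see why the docstring says the heuristic calculates the job list "incorrectly", it helps to run it on a small input. The Python 3 sketch below is a compact restatement for illustration, not the homework solution: check_overlap is a hypothetical stand-in for checkOverlap (its argument order and closed-interval semantics are assumptions), and the movie data is made up.

```python
def check_overlap(start1, end1, start2, end2):
    """Hypothetical helper: True if [start1, end1] and [start2, end2] intersect."""
    return start1 <= end2 and start2 <= end1


def earliest_job_first(movies):
    """Greedy heuristic: accept jobs in order of start time whenever they fit."""
    accepted = []
    for title in sorted(movies, key=lambda m: movies[m][0]):
        start, end = movies[title]
        if not any(check_overlap(start, end, *movies[other]) for other in accepted):
            accepted.append(title)
    return accepted


# made-up data: one long job that starts first blocks two short ones
movies = {"Epic": (1, 10), "Short A": (2, 4), "Short B": (5, 7)}
print(earliest_job_first(movies))  # prints ['Epic']
```

The heuristic accepts one job here, while "Short A" and "Short B" together would give two non-overlapping jobs.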


POTD: Attempt parts (a) and (b) of Q1

Before class on Thursday, please attempt problem Q1 (a) and (b).
You won’t turn anything in on Thursday, but I want to know if you are able to successfully code this first part of the problem.
It’s OK if you can’t get a working solution. In this case, bring me your errors!
If you get stuck on an error, I want to know about it, so we can discuss with the class. There is a VERY good chance some of your classmates are experiencing similar trouble.


Dynamic Arrays

Unfortunately we cannot adjust the size of simple arrays in the middle of a program’s execution.
Compensating by allocating extremely large arrays can waste a lot of space.
With dynamic arrays we start with an array of size 1, and double its size from m to 2m each time we run out of space.
How many times will we double for n elements? Answer: Only ⌈lg n⌉.
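The ⌈lg n⌉ bound can be checked by simulating the doubling directly. This is a minimal Python 3 sketch; the function name doublings_needed is made up for the example.

```python
import math


def doublings_needed(n):
    """Count how many times a size-1 array must double to hold n elements."""
    size, doublings = 1, 0
    while size < n:
        size *= 2
        doublings += 1
    return doublings


for n in (1, 2, 5, 8, 1000):
    # math.ceil(math.log2(n)) is the closed form from the slide
    print(n, doublings_needed(n), math.ceil(math.log2(n)))
```

For n = 1000 both columns show 10, since ten doublings take the capacity from 1 to 1024.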


How Much Total Work?

The apparent waste in this procedure involves the recopying of the old contents on each expansion.
If half the elements move once, a quarter of the elements twice, and so on, the total number of movements M is given by

$$M = \sum_{i=1}^{\lg n} i \cdot \frac{n}{2^i} = n \sum_{i=1}^{\lg n} \frac{i}{2^i} \le n \sum_{i=1}^{\infty} \frac{i}{2^i} = 2n$$

Thus each of the n elements moves an average of only twice, and the total work of managing the dynamic array is the same O(n) as a simple array.
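The 2n bound on total movements can also be observed empirically. The class below is a toy Python 3 sketch of a doubling array that counts every element it recopies during an expansion; the class and attribute names are made up, and it is not meant to mirror any real library.

```python
class DynamicArray:
    """Toy doubling array that counts how many element copies it performs."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity
        self.copies = 0  # total elements moved during expansions

    def append(self, value):
        if self.size == self.capacity:
            # out of space: allocate twice as much and recopy the old contents
            new_slots = [None] * (2 * self.capacity)
            for k in range(self.size):
                new_slots[k] = self.slots[k]
                self.copies += 1
            self.slots = new_slots
            self.capacity *= 2
        self.slots[self.size] = value
        self.size += 1


n = 1000
arr = DynamicArray()
for value in range(n):
    arr.append(value)
print(arr.copies, 2 * n)  # prints 1023 2000
```

The expansions copy 1 + 2 + 4 + ... + 512 = 1023 elements in total, comfortably below 2n = 2000.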


Advantages of Linked Lists

The relative advantages of linked lists over static arrays include:

1 Overflow on linked structures can never occur unless the memory is actually full.
2 Insertions and deletions are simpler than for contiguous (array) lists.
3 With large records, moving pointers is easier and faster than moving the items themselves.

Dynamic memory allocation provides us with flexibility on how and where we use our limited storage resources.


Question

Are Python lists like dynamic arrays or linked lists?
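One way to probe this question empirically is to watch how much memory a list reports as it grows. The Python 3 sketch below assumes the CPython interpreter, where sys.getsizeof reflects the space currently allocated for the list; other implementations may behave differently, so treat the result as a hint rather than a specification.

```python
import sys

sizes = []
lst = []
for _ in range(64):
    lst.append(None)
    sizes.append(sys.getsizeof(lst))

# indices at which the reported size changed compared with the previous append
growth_points = [k for k in range(1, len(sizes)) if sizes[k] != sizes[k - 1]]
print(len(growth_points), "size changes during", len(sizes), "appends")
```

On CPython the reported size changes on only a few of the appends, which is the behavior of an array that over-allocates in steps (CPython's growth factor is smaller than the doubling described above) rather than a linked list that allocates one node per element.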


Stacks and Queues

Sometimes, the order in which we retrieve data is independent of its content, being only a function of when it arrived.
A stack supports last-in, first-out operations: push and pop.
A queue supports first-in, first-out operations: enqueue and dequeue.
Lines in banks are based on queues, while food in my refrigerator is treated as a stack.


Python lists can be treated like stacks

Push: l.append()
Pop: l.pop()
What’s missing from list’s built-in functions to make queues possible? ('append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort')
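As a point of comparison, here is a short Python 3 sketch of both access patterns. A list can emulate a queue with l.pop(0), but that shifts every remaining element; collections.deque from the standard library is one commonly used alternative, chosen here as an illustration rather than as the answer the question above is looking for.

```python
from collections import deque

stack = []
stack.append("a")   # push
stack.append("b")
print(stack.pop())  # prints b (last in, first out)

queue = deque()
queue.append("a")       # enqueue at the right end
queue.append("b")
print(queue.popleft())  # prints a (first in, first out)
```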


Dictionary

Perhaps the most important class of data structures maintains a set of items, indexed by keys.

Search(S, k): A query that, given a set S and a key value k, returns a pointer x to an element in S such that key[x] = k, or nil if no such element belongs to S.
Insert(S, x): A modifying operation that augments the set S with the element x.
Delete(S, x): Given a pointer x to an element in the set S, remove x from S. Observe we are given a pointer to an element x, not a key value.
Min(S), Max(S): Returns the element of the totally ordered set S which has the smallest (largest) key.
Next(S, x), Previous(S, x): Given an element x whose key is from a totally ordered set S, returns the next largest (smallest) element in S, or NIL if x is the maximum (minimum) element.

There are a variety of implementations of these dictionary operations, each of which yields different time bounds for various operations.


Different Ways to Implement Dictionaries

Array-based Sets: Unsorted Arrays

Operation                     Implementation                    Efficiency
Search(S, k)                  sequential search
Insert(S, x)                  place in first empty spot
Delete(S, x)                  copy nth item to the xth spot
Min(S, x), Max(S, x)          sequential search
Successor(S, x), Pred(S, x)   sequential search

Array-based Sets: Sorted Arrays

Operation                     Implementation                    Efficiency
Search(S, k)                  binary search
Insert(S, x)                  search, then move to make space
Delete(S, x)                  move to fill up the hole
Min(S, x), Max(S, x)          first or last element
Successor(S, x), Pred(S, x)   add or subtract 1 from pointer
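A few of these operations can be sketched with Python 3 lists standing in for arrays. The function names are made up for illustration, the sorted-array versions lean on the standard bisect module, and the costs noted in the comments are the standard textbook bounds rather than figures taken from the slides.

```python
import bisect


def search_unsorted(items, key):
    """Sequential search: index of key or None; O(n) comparisons."""
    for idx, item in enumerate(items):
        if item == key:
            return idx
    return None


def delete_unsorted(items, idx):
    """Copy the last item into slot idx, then drop the last slot; O(1)."""
    items[idx] = items[-1]
    items.pop()


def search_sorted(items, key):
    """Binary search on a sorted list: index of key or None; O(lg n)."""
    idx = bisect.bisect_left(items, key)
    if idx < len(items) and items[idx] == key:
        return idx
    return None


def insert_sorted(items, key):
    """Binary search for the position, then shift later items right; O(n)."""
    bisect.insort(items, key)


keys = [3, 8, 15, 23]
insert_sorted(keys, 10)
print(keys)                     # prints [3, 8, 10, 15, 23]
print(search_sorted(keys, 15))  # prints 3
print(keys[0], keys[-1])        # min and max of a sorted array: prints 3 23

bag = [15, 3, 23, 8]
delete_unsorted(bag, 1)         # remove the element stored at index 1
print(bag)                      # prints [15, 8, 23]
```

The trade-off is visible in the code: the sorted array makes searching cheap but pays for it on every insertion, while the unsorted array does the reverse.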
