Algorithm Analysis Part II Tyler Moore CSE 3353, SMU, Dallas, TX - - PowerPoint PPT Presentation


Algorithm Analysis

Part II Tyler Moore

CSE 3353, SMU, Dallas, TX

Lecture 4

Some slides created by or adapted from Dr. Kevin Wayne. For more information see http://www.cs.princeton.edu/~wayne/kleinberg-tardos. Some slides adapted from Dr. Steven Skiena. For more information see http://www.algorist.com

2 / 43

Implications of dominance

Exponential algorithms get hopeless fast. Quadratic algorithms get hopeless at or before 1,000,000. O(n log n) is possible to about one billion.
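To make these thresholds concrete, the sketch below estimates running times for a few growth rates. It is written in Python 3, and the machine speed of 10^9 basic operations per second is an assumption chosen only for illustration, not a figure from the slides.

```python
import math

OPS_PER_SECOND = 1e9  # assumed machine speed, for illustration only

# Operation counts for three growth rates
growth = {
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n ** 2,
    "2^n": lambda n: 2.0 ** n,
}

def seconds(name, n):
    """Estimated running time in seconds for input size n."""
    return growth[name](n) / OPS_PER_SECOND

for n in (10, 50, 1_000, 1_000_000, 1_000_000_000):
    row = []
    for name in growth:
        try:
            row.append(f"{name}: {seconds(name, n):.3g} s")
        except OverflowError:  # 2.0 ** n no longer fits in a float
            row.append(f"{name}: too large")
    print(f"n = {n:>13,}  " + "  ".join(row))
```

Under this assumption a quadratic algorithm already needs about a quarter of an hour at n = 1,000,000, while n log n stays under a minute at one billion.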

3 / 43

Testing dominance

Definition (dominance): g(n) dominates f(n) iff $\lim_{n \to \infty} f(n)/g(n) = 0$.

Definition (little oh notation): f(n) is o(g(n)) iff g(n) dominates f(n). In other words, little oh means “grows strictly slower than”.

Q: Is $3n$ $o(n^2)$?
A: Yes, since $\lim_{n \to \infty} 3n/n^2 = \lim_{n \to \infty} 3/n = 0$.

Q: Is $3n^2$ $o(n^2)$?
A:
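A quick numeric check can suggest what a limit will be before you work it out by hand. The helper below (the name `ratios` and the sample sizes are illustrative choices) tabulates f(n)/g(n) for growing n. A table like this is only a hint, not a proof.

```python
def ratios(f, g, sizes=(10, 1_000, 100_000, 10_000_000)):
    """Return f(n)/g(n) for each n in sizes."""
    return [f(n) / g(n) for n in sizes]

# 3n versus n^2: the ratio keeps shrinking toward 0
print(ratios(lambda n: 3 * n, lambda n: n ** 2))

# 3n^2 versus n^2: the ratio stays at the constant 3
print(ratios(lambda n: 3 * n ** 2, lambda n: n ** 2))
```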

4 / 43


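  • Proposition. If $\lim_{n \to \infty} f(n)/g(n) = c > 0$, then f(n) is Θ(g(n)).
  • Pf. By definition of the limit, there exists $n_0$ such that for all $n \ge n_0$:

$\frac{1}{2} c < \frac{f(n)}{g(n)} < 2c$

Thus, $f(n) \le 2c \, g(n)$ for all $n \ge n_0$, which implies f(n) is O(g(n)). Similarly, $f(n) \ge \frac{1}{2} c \, g(n)$ for all $n \ge n_0$, which implies f(n) is Ω(g(n)).

  • Proposition. If $\lim_{n \to \infty} f(n)/g(n) = 0$, then f(n) is O(g(n)).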

5 / 43

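  • Polynomials. Let $f(n) = a_0 + a_1 n + \dots + a_d n^d$ with $a_d > 0$. Then f(n) is $\Theta(n^d)$.

Pf. $\lim_{n \to \infty} \frac{a_0 + a_1 n + \dots + a_d n^d}{n^d} = a_d > 0$

  • Logarithms. $\Theta(\log_a n)$ is $\Theta(\log_b n)$ for any constants $a, b > 1$, so there is no need to specify the base (assuming it is a constant).

Logarithms and polynomials. For every $d > 0$, $\log n$ is $O(n^d)$.

Exponentials and polynomials. For every $r > 1$ and every $d > 0$, $n^d$ is $O(r^n)$.

Pf. $\lim_{n \to \infty} \frac{n^d}{r^n} = 0$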
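Two of these facts are easy to observe numerically. In the sketch below the bases 2 and 10, the exponent d = 3 and the rate r = 1.5 are arbitrary illustrative choices.

```python
import math

def log_ratio(n, a=2, b=10):
    """log_a(n) / log_b(n); the value is the same constant for every n > 1."""
    return math.log(n, a) / math.log(n, b)

def poly_over_exp(n, d=3, r=1.5):
    """n^d / r^n, which tends to 0 as n grows whenever r > 1."""
    return n ** d / r ** n

print(log_ratio(1_000), log_ratio(1_000_000))   # the same constant twice
print([poly_over_exp(n) for n in (10, 50, 100)])  # shrinking toward 0
```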

6 / 43

Exercises

Using the limit formula and results from earlier slides, answer the following:

Q: Is $5n^2 + 3n$ $o(n)$?
A: No, since $\lim_{n \to \infty} (5n^2 + 3n)/n = \lim_{n \to \infty} (5n + 3) = \infty$.

Q: Is $3n^3 + 5$ $\Theta(n^3)$?
A:

Q: Is $n \log n + n^2$ $O(n^3)$?
A:

7 / 43

  • Linear time. Running time is proportional to input size.

Computing the maximum. Compute maximum of n numbers a_1, …, a_n.

max ← a_1
for i = 2 to n {
   if (a_i > max)
      max ← a_i
}
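The same single pass can be written in Python 3. The function name `find_max` is an illustrative choice; in practice the built-in `max()` does this job.

```python
def find_max(numbers):
    """Return the largest element in one pass over the list."""
    if not numbers:
        raise ValueError("find_max() needs at least one number")
    largest = numbers[0]
    for value in numbers[1:]:      # n - 1 comparisons in total
        if value > largest:
            largest = value
    return largest

print(find_max([3, 9, 2]))
```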

8 / 43

  • Merge. Combine two sorted lists into a sorted whole.
  • Claim. Merging two lists of size n takes O(n) time.
  • Pf. After each compare, the length of the output list increases by 1.

i = 1, j = 1
while (both lists are nonempty) {
   if (a_i ≤ b_j) append a_i to output list and increment i
   else (a_i > b_j) append b_j to output list and increment j
}
append remainder of nonempty list to output list
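A Python 3 sketch of the merge step, using 0-based indices instead of the 1-based ones in the pseudocode:

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list in O(len(a) + len(b)) time."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):   # both lists still have elements
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])                  # at most one of these two is nonempty
    out.extend(b[j:])
    return out

print(merge([1, 4, 9], [2, 3, 10, 11]))
```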

9 / 43

  • O(n log n) time. Arises in divide-and-conquer algorithms.
  • Sorting. Mergesort and heapsort are sorting algorithms that perform O(n log n) compares.

Largest empty interval. Given n time-stamps on which copies of a file arrive at a server, what is the largest interval when no copies of the file arrive?

O(n log n) solution. Sort the time-stamps. Scan the sorted list in order, identifying the maximum gap between successive time-stamps.
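The sort-then-scan idea can be sketched in Python 3 as follows. Plain numbers stand in for time-stamps here, and the function name is an illustrative choice.

```python
def largest_empty_interval(timestamps):
    """Return (gap, start, end) for the widest gap between successive time-stamps."""
    if len(timestamps) < 2:
        raise ValueError("need at least two time-stamps")
    ordered = sorted(timestamps)                       # O(n log n)
    best = (ordered[1] - ordered[0], ordered[0], ordered[1])
    for earlier, later in zip(ordered, ordered[1:]):   # O(n) scan of neighbours
        gap = later - earlier
        if gap > best[0]:
            best = (gap, earlier, later)
    return best

print(largest_empty_interval([5, 1, 12, 3]))
```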

10 / 43

  • Quadratic time. Enumerate all pairs of elements.

Closest pair of points. Given a list of n points in the plane (x_1, y_1), …, (x_n, y_n), find the pair that is closest.

O(n^2) solution. Try all pairs of points.

  • Remark. Ω(n^2) seems inevitable, but this is just an illusion. [see Chapter 5]

min ← (x_1 - x_2)^2 + (y_1 - y_2)^2
for i = 1 to n {
   for j = i+1 to n {
      d ← (x_i - x_j)^2 + (y_i - y_j)^2
      if (d < min)
         min ← d
   }
}
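A brute-force Python 3 version of the all-pairs scan. Like the pseudocode, it compares squared distances, so no square root is needed. Points are assumed to be (x, y) tuples.

```python
from itertools import combinations

def closest_pair(points):
    """Return (squared_distance, p, q) for the closest of all O(n^2) pairs."""
    if len(points) < 2:
        raise ValueError("need at least two points")

    def sq_dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    return min((sq_dist(p, q), p, q) for p, q in combinations(points, 2))

print(closest_pair([(0, 0), (5, 5), (1, 1), (9, 0)]))
```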

11 / 43

  • Cubic time. Enumerate all triples of elements.

Set disjointness. Given n sets S_1, …, S_n each of which is a subset of 1, 2, …, n, is there some pair of these which are disjoint?

O(n^3) solution. For each pair of sets, determine if they are disjoint.

foreach set S_i {
   foreach other set S_j {
      foreach element p of S_i {
         determine whether p also belongs to S_j
      }
      if (no element of S_i belongs to S_j)
         report that S_i and S_j are disjoint
   }
}
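In Python 3 the three nested loops might look like the sketch below. It assumes the inputs are Python `set` objects, whose membership test takes constant time on average, which is what keeps the total at O(n^3).

```python
def disjoint_pairs(sets):
    """Return index pairs (i, j), i < j, whose sets share no element."""
    found = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            # innermost loop: does any element p of sets[i] also belong to sets[j]?
            if not any(p in sets[j] for p in sets[i]):
                found.append((i, j))
    return found

print(disjoint_pairs([{1, 2}, {2, 3}, {3, 4}, {1, 4}]))
```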

12 / 43

  • Independent set of size k. Given a graph, are there k nodes such that no two are joined by an edge?

O(n^k) solution. Enumerate all subsets of k nodes.

foreach subset S of k nodes {
   check whether S is an independent set
   if (S is an independent set)
      report S is an independent set
}

Checking whether S is an independent set takes O(k^2) time.

Number of k-element subsets = $\binom{n}{k} = \frac{n(n-1)(n-2) \cdots (n-k+1)}{k(k-1)(k-2) \cdots 1} \le \frac{n^k}{k!}$

Since k is a constant, the total O(k^2 n^k / k!) is O(n^k): poly-time for k = 17, but not practical.
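A Python 3 sketch of the enumeration, using `itertools.combinations` to generate the k-node subsets. The graph representation (a set of `frozenset` edges) and the function names are assumptions made for this example.

```python
from itertools import combinations

def is_independent(nodes, edges):
    """True if no two of the given nodes are joined by an edge (O(k^2) pairs)."""
    return all(frozenset(pair) not in edges for pair in combinations(nodes, 2))

def independent_set_of_size(k, vertices, edges):
    """Return some independent set of k vertices, or None if there is none."""
    for subset in combinations(vertices, k):   # at most n^k / k! subsets
        if is_independent(subset, edges):
            return subset
    return None

# Path graph 0 - 1 - 2 - 3
path_edges = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
print(independent_set_of_size(2, range(4), path_edges))
print(independent_set_of_size(3, range(4), path_edges))
```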

13 / 43

  • Independent set. Given a graph, what is the maximum cardinality of an independent set?

O(n^2 2^n) solution. Enumerate all subsets.

S* ← ∅
foreach subset S of nodes {
   check whether S is an independent set
   if (S is largest independent set seen so far)
      update S* ← S
}
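The exhaustive search over all 2^n subsets can be sketched in Python 3 as below. As an assumption for this example, edges are stored as a set of `frozenset` pairs.

```python
from itertools import combinations

def max_independent_set(vertices, edges):
    """Try every subset of vertices; return a largest independent one."""
    vertices = list(vertices)
    best = ()
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            independent = all(frozenset(pair) not in edges
                              for pair in combinations(subset, 2))
            if independent and len(subset) > len(best):
                best = subset
    return best

# Star graph: node 0 is joined to nodes 1, 2 and 3
star_edges = {frozenset((0, 1)), frozenset((0, 2)), frozenset((0, 3))}
print(max_independent_set(range(4), star_edges))
```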

14 / 43

  • Search in a sorted array. Given a sorted array A of n numbers, is a given number x in the array?

O(log n) solution. Binary search.

lo ← 1, hi ← n
while (lo ≤ hi) {
   mid ← (lo + hi) / 2
   if (x < A[mid]) hi ← mid - 1
   else if (x > A[mid]) lo ← mid + 1
   else return yes
}
return no
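A Python 3 version of the same loop, with 0-based indices and integer division for the midpoint. The standard library's `bisect` module offers a ready-made alternative.

```python
def contains(sorted_values, x):
    """Binary search: True if x occurs in the sorted list, using O(log n) compares."""
    lo, hi = 0, len(sorted_values) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if x < sorted_values[mid]:
            hi = mid - 1
        elif x > sorted_values[mid]:
            lo = mid + 1
        else:
            return True
    return False

print(contains([1, 3, 5, 7, 9], 7), contains([1, 3, 5, 7, 9], 4))
```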

15 / 43

Common algorithm dominance classes

Dominance class   Example problem types
1                 Operations independent of input size (e.g., addition, min(x,y), etc.)
log n             Binary search
n                 Operating on every element in an array
n log n           Quicksort, mergesort
n^2               Operating on every pair of items
n^3               Operating on every triple of items
2^n               Enumerating all subsets of n items
n!                Enumerating all orderings of n items

16 / 43


Homework 1

Due at the beginning of class one week from today.
You are encouraged to work in pairs.
Please start on the Python coding early!

18 / 43

Selecting the Right Jobs

A movie star wants to select the maximum number of starring roles such that no two jobs require his presence at the same time.

[Figure: candidate films drawn as intervals on a timeline: Tarjan of the Jungle, The Four Volume Problem, The President’s Algorist, Steiner’s Tree, Halting State, "Discreet" Mathematics, Calculated Bets, Process Terminated, Programming Challenges]

18 / 43

The Movie Star Scheduling Problem

Input: A set I of n intervals on the line.
Output: What is the largest subset of mutually non-overlapping intervals which can be selected from I?

Give an algorithm to solve the problem!

19 / 43

Brute-force movie-scheduling pseudo-code

ExhaustiveScheduling(I)
   j = 0
   Smax = {}
   For each of the 2^n subsets Si of intervals I
      If (Si is mutually non-overlapping) and (size(Si) > j)
         then j = size(Si) and Smax = Si
   Return Smax
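A Python 3 sketch of the exhaustive search. Two things here are assumptions rather than part of the slide: intervals are (start, end) tuples, and both endpoints are inclusive, so two jobs that merely touch count as overlapping.

```python
from itertools import combinations

def overlaps(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

def exhaustive_scheduling(intervals):
    """Examine subsets of every size; return a largest non-overlapping one."""
    best = []
    for size in range(1, len(intervals) + 1):
        for subset in combinations(intervals, size):
            if all(not overlaps(a, b) for a, b in combinations(subset, 2)):
                best = list(subset)
                break              # one subset of this size is enough
    return best

print(exhaustive_scheduling([(1, 10), (2, 3), (4, 5), (6, 7)]))
```

This is only practical for small n, since the number of subsets doubles with every added interval.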

20 / 43

Earliest Job First

Start working as soon as there is work available:

EarliestJobFirst(I)
   Accept the earliest starting job j from I which does not overlap any previously accepted job, and repeat until no more such jobs remain.

21 / 43

Earliest Job First is Wrong!

The first job might be so long (War and Peace) that it prevents us from taking any other job.

22 / 43

Shortest Job First

Always take the shortest possible job, so you spend the least time working (and thus unavailable).

ShortestJobFirst(I)
   While (I ≠ ∅) do
      Accept the shortest possible job j from I.
      Delete j, and intervals which intersect j from I.

23 / 43

Shortest Job First is Wrong!

Taking the shortest job can prevent us from taking two longer jobs which barely overlap it.
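The failure is easy to reproduce. The Python 3 sketch below implements the shortest-job-first heuristic on (start, end) tuples with inclusive endpoints (an assumption of this example) and runs it on a three-job instance made up for illustration.

```python
def overlaps(a, b):
    """True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def shortest_job_first(jobs):
    """Repeatedly accept the shortest remaining job and drop everything it hits."""
    remaining = list(jobs)
    accepted = []
    while remaining:
        shortest = min(remaining, key=lambda job: job[1] - job[0])
        accepted.append(shortest)
        # the accepted job overlaps itself, so it is removed here as well
        remaining = [job for job in remaining if not overlaps(job, shortest)]
    return accepted

# The short middle job barely overlaps both long ones
print(shortest_job_first([(0, 5), (4, 7), (6, 11)]))
```

The heuristic returns only the middle job, although the two longer jobs (0, 5) and (6, 11) could both have been taken.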


24 / 43


First Job to Complete

Take the job with the earliest completion date:

OptimalScheduling(I)
   While (I ≠ ∅) do
      Accept job j with the earliest completion date.
      Delete j, and whatever intersects j from I.

[Figure: the candidate films drawn as intervals on a timeline]

25 / 43

First Job to Complete is Optimal!

Why should you believe me?

  • Other jobs may well have started before the first to complete (x), but all must at least partially overlap each other.
  • Thus we can select at most one from the group.
  • The first of these jobs to complete is x, so the rest can only block out more opportunities to the right of x.
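One way to code the first-to-complete rule in Python 3 is to sort by completion time and keep every job that starts after the last accepted job ends. This is a sketch under the assumption that jobs are (start, end) tuples with inclusive endpoints. It is equivalent to the delete-what-intersects loop, but runs in O(n log n).

```python
def first_to_complete(jobs):
    """Greedy schedule: always accept the compatible job that finishes earliest."""
    accepted = []
    for job in sorted(jobs, key=lambda job: job[1]):       # earliest completion first
        if not accepted or job[0] > accepted[-1][1]:       # starts after the last one ends
            accepted.append(job)
    return accepted

print(first_to_complete([(0, 5), (4, 7), (6, 11)]))    # defeats shortest job first
print(first_to_complete([(0, 100), (1, 2), (3, 4)]))   # defeats earliest job first
```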

26 / 43

Homework 1: Python data structures to represent intervals

Starter code: http://lyle.smu.edu/~tylerm/courses/cse3353/code/job_selection_starter.txt

import datetime

# maps each movie title to its (start date, end date)
movieTimes = {
    "Tarjan of the Jungle":
        (datetime.date(2013, 3, 1), datetime.date(2013, 10, 15)),
    "The President's Algorist":
        (datetime.date(2013, 1, 1), datetime.date(2013, 7, 15)),
    "'Discreet' Mathematics":
        (datetime.date(2013, 1, 15), datetime.date(2013, 5, 15)),
    "Halting State":
        (datetime.date(2013, 7, 1), datetime.date(2013, 11, 30)),
    "Steiner's Tree":
        (datetime.date(2013, 9, 1), datetime.date(2014, 1, 15)),
    "The Four Volume Problem":
        (datetime.date(2013, 12, 15), datetime.date(2014, 6, 30)),
    "Programming Challenges":
        (datetime.date(2014, 2, 1), datetime.date(2014, 6, 15)),
    "Process Terminated":
        (datetime.date(2014, 5, 1), datetime.date(2014, 10, 15)),
    "Calculated Bets":
        (datetime.date(2014, 6, 25), datetime.date(2014, 11, 15))
}

27 / 43

Homework 1: earliestJobFirst pseudo-code

EarliestJobFirst(I)
   Accept the earliest starting job j from I which does not overlap any previously accepted job, and repeat until no more such jobs remain.

28 / 43

Homework 1: earliestJobFirst Python code

def earliestJobFirst(movies):
    """
    Problem: Movie scheduling problem
    Input: dictionary mapping movie title to range of times
    Output: movie titles that returns the maximal subset of non-overlapping titles
    Solution: use earliest job heuristic to (incorrectly) calculate job list
    """
    # start by using a list comprehension to sort movie titles by start date
    starttimes = [(movies[m][0], m) for m in movies]
    starttimes.sort()  # sort() takes n lg n time
    titlesort = [m[1] for m in starttimes]  # one more comprehension to get the titles only
    joblist = []
    # go through all jobs sorted by start date
    for jobcand in titlesort:
        overlap = False
        # check all jobs already accepted for overlap with the candidate job
        for job in joblist:
            if checkOverlap(movies[jobcand][0], movies[jobcand][1],
                            movies[job][0], movies[job][1]):
                overlap = True
                break
        # if there's no overlap with existing jobs, add the job candidate
        if not overlap:
            joblist.append(jobcand)
    return joblist
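The slide calls checkOverlap without showing its body. A hypothetical version is sketched below. The argument order (start1, end1, start2, end2) is inferred from the call above, and treating both end dates as inclusive is an assumption. The course's starter code may define the helper differently.

```python
import datetime

def checkOverlap(start1, end1, start2, end2):
    """Hypothetical helper: True if [start1, end1] and [start2, end2] share a day."""
    return start1 <= end2 and start2 <= end1

# Dates taken from the movieTimes dictionary
tarjan = (datetime.date(2013, 3, 1), datetime.date(2013, 10, 15))
algorist = (datetime.date(2013, 1, 1), datetime.date(2013, 7, 15))
four_volume = (datetime.date(2013, 12, 15), datetime.date(2014, 6, 30))

print(checkOverlap(tarjan[0], tarjan[1], algorist[0], algorist[1]))
print(checkOverlap(tarjan[0], tarjan[1], four_volume[0], four_volume[1]))
```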

29 / 43

Your task: Attempt parts (a) and (b) of Q1

Before next class, please attempt problem Q1 (a) and (b).
You won’t turn anything in next class, but I want to know if you are able to successfully code this first part of the problem.
It’s OK if you can’t get a working solution. In this case, bring me your errors!
If you get stuck on an error, I want to know about it, so we can discuss with the class. There is a VERY good chance some of your classmates are experiencing similar trouble.

30 / 43

Python Algorithm Development Process

1. Think hard about the problem you’re trying to solve. Specify the expected inputs for which you’d like to provide a solution, and the expected outputs.
2. Describe a method to solve the problem using English and/or pseudo-code.
3. Start coding:
   1. Development/Debugging phase
   2. Testing phase (for correctness)
   3. Evaluation phase (performance)

Let’s use selection sort as an example of the development process in Python.

31 / 43

Debugging in Python

1. Main strategy: run code in the interpreter to get instant feedback on errors
2. Backup: Generous use of print statements
3. Once code is running in functions: pdb.pm() (Python debugger post-mortem)

32 / 43

Main strategy: run code in the interpreter

>>> s = [2,7,4,5,9]
>>>
>>> for i in range(s):
...     minidx = i
...     for j in range(i,len(s)):
...         if s[j]<s[minidx]:
...             minidx=i
...             s[i],s[minidx]=s[minidx],s[i]
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: range() integer end argument expected, got list.
>>> s
[2, 7, 4, 5, 9]
>>> range(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: range() integer end argument expected, got list.
>>> len(s)
5
>>> range(len(s))
[0, 1, 2, 3, 4]

33 / 43

Second strategy: print variables out during execution

>>> for i in range(len(s)):
...     minidx = i
...     for j in range(i,len(s)):
...         print 'list: %s, i: %i, j: %i, minidx: %i'%(s,i,j,minidx)
...         if s[j]<s[minidx]:
...             print "reassigning minidx %i < %i" %(s[j],s[minidx])
...             minidx=j
...             s[i],s[minidx]=s[minidx],s[i]
...
list: [2, 7, 4, 5, 9], i: 0, j: 0, minidx: 0
list: [2, 7, 4, 5, 9], i: 0, j: 1, minidx: 0
list: [2, 7, 4, 5, 9], i: 0, j: 2, minidx: 0
list: [2, 7, 4, 5, 9], i: 0, j: 3, minidx: 0
list: [2, 7, 4, 5, 9], i: 0, j: 4, minidx: 0
list: [2, 7, 4, 5, 9], i: 1, j: 1, minidx: 1
list: [2, 7, 4, 5, 9], i: 1, j: 2, minidx: 1
reassigning minidx 4 < 7
list: [2, 4, 7, 5, 9], i: 1, j: 3, minidx: 2
reassigning minidx 5 < 7
list: [2, 5, 7, 4, 9], i: 1, j: 4, minidx: 3
list: [2, 5, 7, 4, 9], i: 2, j: 2, minidx: 2
list: [2, 5, 7, 4, 9], i: 2, j: 3, minidx: 2
reassigning minidx 4 < 7
list: [2, 5, 4, 7, 9], i: 2, j: 4, minidx: 3
list: [2, 5, 4, 7, 9], i: 3, j: 3, minidx: 3
list: [2, 5, 4, 7, 9], i: 3, j: 4, minidx: 3
list: [2, 5, 4, 7, 9], i: 4, j: 4, minidx: 4

34 / 43

Second strategy: print variables out during execution

>>> for i in range(1,len(s)):
...     minidx = i
...     for j in range(i+1,len(s)):
...         print 'list: %s, i: %i, j: %i, minidx: %i'%(s,i,j,minidx)
...         if s[j]<s[minidx]:
...             print "reassigning minidx %i < %i" %(s[j],s[minidx])
...             minidx=j
...     s[i],s[minidx]=s[minidx],s[i]
...
list: [2, 7, 4, 5, 9], i: 1, j: 2, minidx: 1
reassigning minidx 4 < 7
list: [2, 7, 4, 5, 9], i: 1, j: 3, minidx: 2
list: [2, 7, 4, 5, 9], i: 1, j: 4, minidx: 2
list: [2, 4, 7, 5, 9], i: 2, j: 3, minidx: 2
reassigning minidx 5 < 7
list: [2, 4, 7, 5, 9], i: 2, j: 4, minidx: 3
list: [2, 4, 5, 7, 9], i: 3, j: 4, minidx: 3

35 / 43

Third strategy: use Python debugger

Once you’ve gotten rid of the obvious bugs, move the code to a function.
But what happens if you start getting run-time errors on different inputs?
You can copy code directly into the interpreter.
Or you can run pdb.pm() to access variables in the environment at the time of the error.
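Moved into a function, the debugged loop might look like the Python 3 sketch below. Two choices here are assumptions rather than the slides' code: the outer loop starts at index 0 so that the first position is also handled, and the function returns a sorted copy, which is what an expression such as `selection_sort(x) == y` requires.

```python
def selection_sort(items):
    """Return a new sorted list; the input list is left unchanged."""
    s = list(items)
    for i in range(len(s)):
        minidx = i
        for j in range(i + 1, len(s)):    # find the smallest remaining element
            if s[j] < s[minidx]:
                minidx = j
        s[i], s[minidx] = s[minidx], s[i]  # swap once per outer iteration
    return s

print(selection_sort([2, 7, 4, 5, 9]))
```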

36 / 43

After debugging comes testing

While you might view them as synonyms, testing is more systematic checking that algorithms work for a range of inputs, not just the ones that cause obvious bugs.
Use Python assert command to verify expected behavior.

37 / 43

assert in action

>>> s
[2, 5, 4, 7, 9]
>>> t = list(s)
>>> t.sort()
>>>
>>> assert t == s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>> t
[2, 4, 5, 7, 9]
>>> s
[2, 5, 4, 7, 9]

38 / 43

Using random to generate inputs

>>> import random, timeit
>>> l10=range(10)
>>> l10
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> random.shuffle(l10)
>>> l10
[4, 2, 0, 3, 8, 1, 9, 7, 6, 5]
>>> unsortl10 = list(l10)
>>> unsortl10
[4, 2, 0, 3, 8, 1, 9, 7, 6, 5]
>>> l10.sort()
>>> l10
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> unsortl10
[4, 2, 0, 3, 8, 1, 9, 7, 6, 5]
>>> assert selection_sort(unsortl10) == l10

39 / 43

Using assert on many inputs

#try 10 different shufflings of each list
for i in range(10):
    print 'trying %i time'%(i)
    #try all list lengths from 0 to 499 elements
    for j in range(500):
        l = range(j)
        random.shuffle(l)  #reorder the list
        ul = list(l)  #make a copy of the unordered list
        l.sort()  #do a known correct sort
        assert selection_sort(ul) == l  #compare sorts
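The same idea can be packaged as a reusable Python 3 helper. The name `check_sort`, its parameters and the fixed seed are illustrative assumptions. Instead of stopping at the first failed assert, it returns the offending input so you can inspect it.

```python
import random

def check_sort(sort_fn, trials=10, max_len=100, seed=0):
    """Compare sort_fn with sorted() on shuffled lists.

    Returns the first input on which they disagree, or None if all agree.
    """
    rng = random.Random(seed)          # fixed seed makes failures reproducible
    for _ in range(trials):
        for n in range(max_len):
            data = list(range(n))
            rng.shuffle(data)
            if sort_fn(list(data)) != sorted(data):
                return data
    return None

print(check_sort(sorted))                          # a correct sort: no counterexample
print(check_sort(lambda xs: sorted(xs)[:-1]))      # a broken sort that drops one element
```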

40 / 43

Don’t forget to look for counterexamples

Using assert works when you have a known correct solution to compare against.
This frequently occurs when you have a known working algorithm, but you are developing a more efficient one.
While testing lots of random inputs is a good strategy, don’t forget to examine edge cases and potential counterexamples too.

41 / 43

Empirically evaluating performance

Once you are confident that your algorithm is correct, you can evaluate its performance empirically.
Python’s timeit package repeatedly runs code and reports the total execution time over all runs.

timeit arguments:
1. code to be executed in string form
2. any setup code that needs to be run before executing the code (note: setup code is only run once)
3. parameter ‘number’, which indicates the number of times to run the code (default is 1000000)

42 / 43

Timeit in action: timing Python’s sort function and our selection sort

#store function in file called sortfun.py
import random
def sortfun(size):
    l = range(size)
    random.shuffle(l)
    l.sort()

>>> timeit.timeit("sortfun(1000)","from sortfun import sortfun",number=100)
0.0516510009765625
>>> #here is the wrong way to test the built-in sort function
... timeit.timeit("l.sort()","import random; l = range(1000); random.shuffle(l)",number=100)
0.0010929107666015625
>>> #let's compare it to our selection sort
>>> timeit.timeit("selection_sort(l)","from selection_sort import selection_sort; import random; l = range(1000); random.shuffle(l)",number=100)
3.0629560947418213
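The second measurement is misleading because the setup string runs only once: after the first call the list is already sorted, so the other 99 calls time a sort of sorted data. The Python 3 sketch below shuffles inside the timed function instead and divides by `number` to get a per-call figure. The helper names are illustrative, and `globals=` is the `timeit.timeit` keyword (available since Python 3.5) for passing names in without an import string.

```python
import random
import timeit

def sort_shuffled(size):
    """Build, shuffle and sort a fresh list so every timed run does real work."""
    data = list(range(size))
    random.shuffle(data)
    data.sort()

def time_per_call(stmt, number=100, **names):
    """Average seconds per execution of stmt; timeit itself returns the total."""
    total = timeit.timeit(stmt, globals=names, number=number)
    return total / number

per_call = time_per_call("sort_shuffled(1000)", sort_shuffled=sort_shuffled)
print(f"{per_call:.6f} seconds per call")
```

Note that this figure includes the cost of building and shuffling the list, just as the sortfun measurement in the slide does.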

43 / 43